From b11f87f1929ddfbba2abed3bb4d819556f8ae8f3 Mon Sep 17 00:00:00 2001
From: "Myles C. Maxfield"
Date: Thu, 15 Jun 2023 19:23:01 -0700
Subject: [PATCH 1/3] Add word-break:auto
---
css-text-4/Overview.bs | 178 +++++++----------------------------------
1 file changed, 29 insertions(+), 149 deletions(-)
diff --git a/css-text-4/Overview.bs b/css-text-4/Overview.bs
index 30557e5f228..c2a44426890 100644
--- a/css-text-4/Overview.bs
+++ b/css-text-4/Overview.bs
@@ -992,7 +992,7 @@ Detecting Word Boundaries: the 'word-boundary-detection' property
Name: word-boundary-detection
- Value: normal | manual | auto(<>)
+ Value: normal | manual
Initial: normal
Applies to: text
Inherited: yes
@@ -1110,151 +1110,8 @@ Detecting Word Boundaries: the 'word-boundary-detection' property
when a text run is composed of Khmer characters (U+1780 to U+17FF)
if the user agent does not know how to determine
word boundaries in Khmer.
-
- auto(<>)
-
- This value directs the user agent to perform language-specific content analysis
- to determine where to insert [=virtual word boundaries=].
-
- <> must be a valid CSS <> or <>.
- It represents an IETF BCP 47 language range
- (see [[BCP47]]).
- If the UA does not support word-boundary detection
- for all languages represented by the specified range,
- that specified value is invalid
- (and will cause the declaration to be ignored).
-
-
- word-boundary/word-boundary-105.html
- word-boundary/word-boundary-106.html
- word-boundary/word-boundary-107.html
- word-boundary/word-boundary-108.html
- word-boundary/word-boundary-109.html
- word-boundary/word-boundary-110.html
- word-boundary/word-boundary-111.html
- word-boundary/word-boundary-112.html
- word-boundary/word-boundary-113.html
- word-boundary/word-boundary-114.html
- word-boundary/word-boundary-115.html
- word-boundary/word-boundary-116.html
- word-boundary/word-boundary-117.html
- word-boundary/word-boundary-118.html
- word-boundary/word-boundary-119.html
- word-boundary/word-boundary-120.html
- word-boundary/word-boundary-121.html
- word-boundary/word-boundary-122.html
- word-boundary/word-boundary-123.html
- word-boundary/word-boundary-124.html
- word-boundary/word-boundary-125.html
- word-boundary/word-boundary-126.html
- word-boundary/word-boundary-127.html
-
-
- Note: Wildcards in the language subtag would imply
- support for detecting word boundaries in an undefined and effectively unlimited set of languages.
- As this is not possible,
- wildcards in the language subtag always result in the declaration
- being treated as invalid.
-
- Note: Whether a word boundary detection system designed for one language
- is suitable for some or all dialects of that language is somewhat subjective,
- and this specifications leaves it at the discretion of the user agent.
- Even if a detection system is not able to cope with all nuances of a particular dialect,
- it may be reasonable to claim support
- if the detection correctly recognizes word boundaries most of the time.
- However, the user agent would do a disservice to authors and users
- if it claimed support for languages
- where it fails to detect most word boundaries
- or has a high error rate.
-
- If the element’s [=content language=],
- as represented in BCP 47 syntax [[BCP47]],
- does not match the language range described by the computed value's <>
- in an extended filtering operation
- per [[RFC4647]] Matching of Language Tags (section 3.3.2)
- with both the [=content language=] and <>
- then the [=used value=] is ''word-boundary-detection/normal'',
- and this property has no effect on this element.
- Otherwise,
- the user agent must insert a [=virtual word boundary=]
- at each detected word boundary
- within the [=text sequence=] children of this element.
- Within the constraints set by this specification,
- the specific algorithm used is UA-dependent.
-
-
- word-boundary/word-boundary-105.html
- word-boundary/word-boundary-106.html
- word-boundary/word-boundary-107.html
- word-boundary/word-boundary-108.html
- word-boundary/word-boundary-109.html
- word-boundary/word-boundary-110.html
- word-boundary/word-boundary-111.html
- word-boundary/word-boundary-112.html
-
-
- Note: This is the same matching logic as the one used for the '':lang()'' selector.
-
- If a user agent has a word-boundary detection system for Cantonese
- that is not suitable for the broader set of Chinese languages,
- it is expected to accept ''auto(yue)'', ''auto(zh-yue)'', or ''auto(zh-HK)'',
- but not ''auto(zh)'' or ''auto(zh-Hant)''.
-
- However, if the user agent supports a generic word-boundary detection system
- that is suitable for Chinese in general,
- it is expected to accept the broad ''auto(zh)'' characterization,
- as well as any more specific ones,
- such as ''auto(zh-yue)'', ''auto(zh-Hant-HK)'', ''auto(zh-Hans-SG)'', or ''auto(zh-hak)''.
-
-
-
- Specifying the language for which the word boundary detection is to be performed
- and making unsupported language ranges invalid
- is required in order to make this feature meaningfully testable with ''@supports''.
-
- For example, Japanese text normally allows line breaking between letters of a word
- (see ''word-break: normal'').
- The following code disables that in h1 elements,
- and only allows line breaking at autodetected word boundaries instead,
- without requiring the author to manually indicate word boundaries in the markup.
- However, if word boundary detection is not supported for Japanese,
- this change is not applied,
- as ''word-break: keep-all'' could remove all [=soft wrap opportunities=] from the element,
- and risk causing overflow.
-
- @supports (word-boundary-detection: auto(ja)) {
- h1:lang(ja) {
- word-boundary-detection: auto(ja);
- word-break: keep-all;
- }
- }
-
-
-
- User agents may activate
- language-specific content analysis
- in response to user preferences.
- User agents with this behavior must do this
- by setting the [=declared value=] of 'word-boundary-detection' to ''word-boundary-detection/auto(<>)''
- in the [=User Origin=].
- User agents that do not support the [=User Origin=]
- may use the [=User-Agent Origin=] instead.
-
-
- Manual analysis of the content can be more reliable than UA heuristics.
- For best results, authors who can perform this analysis are encouraged to markup their documents
- using <{wbr}> or U+200B
- to exhaustively indicate word boundaries.
-
- Authors who prepare their content in this manner
- should not rely on the initial value, and
- should explicitly specify ''word-boundary-detection: manual'' on the relevant parts of the content,
- in order to override a potential ''word-boundary-detection: auto(<>)''
- in the [=User Origin=] or [=User-Agent Origin=].
-
-
[=Virtual word boundary=] insertion happens before [[CSS-TEXT-3#white-space-phase-1]]
and before [[#word-boundary-expansion]].
Later operations
@@ -1344,10 +1201,6 @@ Detecting Word Boundaries: the 'word-boundary-detection' property
word-boundary/word-boundary-118.html
- Note: This implies that for languages such as English
- where words are separated by spaces or other separating characters,
- ''word-boundary-detection/auto()'' has no effect.
-
between characters that compose a single [=typographic character unit=].
@@ -4334,7 +4187,7 @@ Breaking Rules for Letters: the 'word-break' property
Name: word-break
- Value: normal | keep-all | break-all | break-word
+ Value: normal | keep-all | break-all | break-word | auto
Initial: normal
Applies to: text
Inherited: yes
@@ -4603,8 +4456,35 @@ Breaking Rules for Letters: the 'word-break' property
(which uses [=spaces=] between words),
and is also useful for mixed-script text where CJK snippets are mixed
into another language that uses [=spaces=] for separation.
+
+
auto
+
+ This value directs the user agent to perform language-specific content analysis
+ to determine where to insert [=virtual word boundaries=].
+
+ User agents may activate
+ language-specific content analysis
+ in response to user preferences.
+ User agents with this behavior must do this
+ by setting the [=declared value=] of 'word-break' to ''word-break/auto''
+ in the [=User Origin=].
+ User agents that do not support the [=User Origin=]
+ should use the [=User-Agent Origin=] instead.
+
+ Manual analysis of the content can be more reliable than UA heuristics.
+ For best results, authors who can perform this analysis are encouraged to markup their documents
+ using <{wbr}> or U+200B
+ to exhaustively indicate word boundaries.
+
+ Authors who prepare their content in this manner
+ should not rely on the initial value, and
+ should explicitly specify ''word-break'' on the relevant parts of the content,
+ in order to override a potential ''word-break: auto''
+ in the [=User Origin=] or [=User-Agent Origin=].
+
+
Symbols that line-break the same way as letters of a particular category
are affected the same way as those letters.
From 488bc99bdafbcebaa35512c86305b8771d642937 Mon Sep 17 00:00:00 2001
From: "Myles C. Maxfield"
Date: Sat, 17 Jun 2023 14:55:22 -0700
Subject: [PATCH 2/3] Address @kojiishi's review feedback
---
css-text-4/Overview.bs | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/css-text-4/Overview.bs b/css-text-4/Overview.bs
index c2a44426890..3d9dadce101 100644
--- a/css-text-4/Overview.bs
+++ b/css-text-4/Overview.bs
@@ -4187,7 +4187,7 @@ Breaking Rules for Letters: the 'word-break' property
Name: word-break
- Value: normal | keep-all | break-all | break-word | auto
+ Value: normal | keep-all | break-all | break-word | phrase
Initial: normal
Applies to: text
Inherited: yes
@@ -4457,19 +4457,25 @@ Breaking Rules for Letters: the 'word-break' property
and is also useful for mixed-script text where CJK snippets are mixed
into another language that uses [=spaces=] for separation.
-
auto
+ phrase
This value directs the user agent to perform language-specific content analysis
- to determine where to insert [=virtual word boundaries=].
+ to prioritize keeping natural phrases (of multiple words) together
+ to determine [=soft wrap opportunities=]
+ and [=virtual word boundaries=].
User agents may activate
language-specific content analysis
in response to user preferences.
User agents with this behavior must do this
- by setting the [=declared value=] of 'word-break' to ''word-break/auto''
+ by setting the [=declared value=] of 'word-break' to ''word-break/phrase''
in the [=User Origin=].
User agents that do not support the [=User Origin=]
should use the [=User-Agent Origin=] instead.
+
+ Note: This means that web content can detect whether or not this feature is
+ enabled by calling getComputedStyle(), even if the user agent enabled this
+ feature by default.
From 25877e0351d825dd7602f852b43e7fa6a0d7b33c Mon Sep 17 00:00:00 2001
From: Koji Ishii
Date: Tue, 22 Aug 2023 17:49:14 +0900
Subject: [PATCH 3/3] Address WG resolutions and fantasai's feedback
Addresses @fantasai's feedback at:
https://github.com/w3c/csswg-drafts/pull/8974
and the WG resolutions at:
https://github.com/w3c/csswg-drafts/issues/7193#issuecomment-1663101848
---
css-text-4/Overview.bs | 76 ++++++++++++++++++++++++++----------------
1 file changed, 48 insertions(+), 28 deletions(-)
diff --git a/css-text-4/Overview.bs b/css-text-4/Overview.bs
index 3d9dadce101..97f4785dd0c 100644
--- a/css-text-4/Overview.bs
+++ b/css-text-4/Overview.bs
@@ -4187,7 +4187,7 @@ Breaking Rules for Letters: the 'word-break' property
Name: word-break
- Value: normal | keep-all | break-all | break-word | phrase
+ Value: normal | keep-all | break-all | break-word | auto-phrase
Initial: normal
Applies to: text
Inherited: yes
@@ -4457,39 +4457,44 @@ Breaking Rules for Letters: the 'word-break' property
and is also useful for mixed-script text where CJK snippets are mixed
into another language that uses [=spaces=] for separation.
-
phrase
+ auto-phrase
- This value directs the user agent to perform language-specific content analysis
+ This value directs the user agent
+ to perform language-specific content analysis
to prioritize keeping natural phrases (of multiple words) together
to determine [=soft wrap opportunities=]
and [=virtual word boundaries=].
- User agents may activate
- language-specific content analysis
- in response to user preferences.
- User agents with this behavior must do this
- by setting the [=declared value=] of 'word-break' to ''word-break/phrase''
- in the [=User Origin=].
- User agents that do not support the [=User Origin=]
- should use the [=User-Agent Origin=] instead.
-
- Note: This means that web content can detect whether or not this feature is
- enabled by calling getComputedStyle(), even if the user agent enabled this
- feature by default.
-
+ If the user agent does not support this content analysis for the [=content language=],
+ the [=used value=] must fall back to ''word-break: normal''.
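The content-language fallback can be modeled as a used-value resolution step. The TypeScript below is a hypothetical sketch, not part of the specification: `usedWordBreak`, `primarySubtag`, and `SUPPORTED_PHRASE_LANGUAGES` are invented names, the supported-language list is assumed, and comparing only the primary language subtag is an assumption, because this text does not define how the content language is matched.

```typescript
// Hypothetical model of the content-language fallback for word-break: auto-phrase.
type WordBreak = "normal" | "keep-all" | "break-all" | "break-word" | "auto-phrase";

// Assumed, UA-specific list of languages with phrase analysis support;
// the spec text does not say which languages a UA supports.
const SUPPORTED_PHRASE_LANGUAGES: ReadonlySet<string> = new Set(["ja"]);

// Assumption: only the primary language subtag ("ja" in "ja-JP") is compared.
function primarySubtag(langTag: string): string {
  return langTag.split("-")[0].toLowerCase();
}

function usedWordBreak(computed: WordBreak, contentLanguage: string | null): WordBreak {
  if (computed !== "auto-phrase") {
    return computed; // this fallback rule only affects auto-phrase
  }
  // Assumption: an unknown content language (null) counts as unsupported.
  if (contentLanguage === null || !SUPPORTED_PHRASE_LANGUAGES.has(primarySubtag(contentLanguage))) {
    return "normal";
  }
  return "auto-phrase";
}
```

With this model, `usedWordBreak("auto-phrase", "ko")` resolves to `"normal"` because `"ko"` is not in the assumed list, while `"ja-JP"` keeps `"auto-phrase"`.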
-
- Manual analysis of the content can be more reliable than UA heuristics.
- For best results, authors who can perform this analysis are encouraged to markup their documents
- using <{wbr}> or U+200B
- to exhaustively indicate word boundaries.
-
- Authors who prepare their content in this manner
- should not rely on the initial value, and
- should explicitly specify ''word-break'' on the relevant parts of the content,
- in order to override a potential ''word-break: auto''
- in the [=User Origin=] or [=User-Agent Origin=].
-
+ Manually inserted [=soft wrap opportunities=] must be honored,
+ such as those created by <{wbr}> or by characters of the
+ ZW
+ line breaking class [[!UAX14]].
+ Manual suppression of [=soft wrap opportunities=] must also be honored,
+ such as by the 'white-space' property or by characters of the
+ GL or
+ ZWJ
+ line breaking classes [[!UAX14]].
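A minimal sketch of how a user agent might reconcile its phrase analysis with these manual controls, assuming break positions are UTF-16 offsets and that <{wbr}> has already been mapped to U+200B. The function `reconcileBreaks` is hypothetical. The GL set below is only a subset of that class, the 'white-space' property is not modeled, and several UAX #14 exceptions (for example around spaces) are ignored. Breaks are refused on both sides of ZWJ because UAX #14 attaches ZWJ to the preceding character and forbids breaking after it.

```typescript
// Hypothetical sketch: reconciling UA phrase-analysis break offsets with
// manually inserted and manually suppressed soft wrap opportunities.
// Offset i means "may break between text[i - 1] and text[i]".
const ZWSP = "\u200B"; // line breaking class ZW
const ZWJ = "\u200D"; // line breaking class ZWJ
// A subset of line breaking class GL (glue), not the full class.
const GLUE = new Set(["\u00A0", "\u2007", "\u2011", "\u202F"]);

function reconcileBreaks(text: string, phraseBreaks: number[]): number[] {
  const result = new Set<number>();
  for (const offset of phraseBreaks) {
    if (offset <= 0 || offset >= text.length) continue; // interior positions only
    const before = text[offset - 1];
    const after = text[offset];
    // Manually suppressed: no break on either side of glue or of ZWJ.
    if (GLUE.has(before) || GLUE.has(after) || before === ZWJ || after === ZWJ) continue;
    result.add(offset);
  }
  // Manually inserted: always allow a break after U+200B (applied last so it wins).
  for (let i = 0; i < text.length - 1; i++) {
    if (text[i] === ZWSP) result.add(i + 1);
  }
  return Array.from(result).sort((a, b) => a - b);
}
```

For `"ab\u200Bcd\u00A0ef"` with proposed breaks `[1, 5, 6]`, the sketch drops 5 and 6 (next to the no-break space) and adds 3 (after the zero width space), returning `[1, 3]`.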
+
+ When a phrase is too long to fit on a line and would overflow,
+ the [=used value=] must fall back to ''word-break: normal'' for that phrase.
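The overflow fallback can be illustrated with a greedy line filler. This hypothetical sketch (`layoutPhrases` is an invented name) assumes every character is one unit wide and that, under ''word-break: normal'', the CJK text may break between any two characters. It moves an overlong phrase to a fresh line before breaking it; this text does not say whether a user agent should do that or start the phrase on the partially filled line.

```typescript
// Hypothetical sketch of the overflow fallback for word-break: auto-phrase.
// Assumptions: each character is one unit wide, and under word-break: normal
// the (CJK) text may break between any two characters.
function layoutPhrases(phrases: string[], lineWidth: number): string[] {
  if (lineWidth < 1) throw new RangeError("lineWidth must be at least 1");
  const lines: string[] = [];
  let current = "";
  for (const phrase of phrases) {
    if (current.length + phrase.length <= lineWidth) {
      current += phrase; // the whole phrase fits on the current line
      continue;
    }
    if (current !== "") {
      lines.push(current); // wrap before the phrase, keeping it together
      current = "";
    }
    if (phrase.length <= lineWidth) {
      current = phrase;
      continue;
    }
    // The phrase alone would overflow: fall back to word-break: normal
    // and break it wherever the line is full.
    let rest = phrase;
    while (rest.length > lineWidth) {
      lines.push(rest.slice(0, lineWidth));
      rest = rest.slice(lineWidth);
    }
    current = rest;
  }
  if (current !== "") lines.push(current);
  return lines;
}
```

With a width of 5, `["今日は", "とても", "いい天気です"]` lays out as `["今日は", "とても", "いい天気で", "す"]`: the first two phrases stay whole, and only the six-character phrase is broken.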
+
+
+ Manual analysis of the content can be
+ more reliable than UA heuristics.
+ For best results, authors who can perform this analysis
+ are encouraged to mark up their documents
+ using <{wbr}> or U+200B ZERO WIDTH SPACE
+ to exhaustively indicate phrase boundaries.
+
+
+ Note: User agents may activate this value
+ in response to user preferences.
+ See Accessibility Features
+ for Phrase Line Breaking for more details.
+
Symbols that line-break the same way as letters of a particular category
are affected the same way as those letters.
@@ -4632,6 +4637,21 @@ Breaking Rules for Letters: the 'word-break' property
white-space/break-spaces-with-ideographic-space-009.html
+
+Accessibility Features for Phrase Line Breaking
+
+ User agents may activate
+ language-specific content analysis
+ described in ''word-break/auto-phrase''
+ in response to user preferences.
+ User agents with this behavior must do this
+ by setting the [=declared value=] of 'word-break' to ''word-break/auto-phrase''
+ in the [=User Origin=].
+
+ Note: This means that web content can detect whether or not this feature is
+ enabled by calling getComputedStyle(), even if the user agent enabled this
+ feature by default.
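For example, a page script might check the computed value on the root element. The sketch below uses a structural stand-in for `CSSStyleDeclaration` so it can run outside a browser; `isAutoPhraseComputed` is a hypothetical helper. Because normal author declarations take precedence over normal User Origin declarations in the cascade, a result of `false` only means the value is not in effect at that element, not that the user has no such preference.

```typescript
// Structural stand-in for CSSStyleDeclaration so the sketch runs outside a browser.
interface WordBreakStyle {
  wordBreak: string;
}

// True when the computed value of word-break is auto-phrase, as a User Origin
// declaration would produce unless an author declaration overrides it.
function isAutoPhraseComputed(style: WordBreakStyle): boolean {
  return style.wordBreak === "auto-phrase";
}

// In a browser (not run here):
//   isAutoPhraseComputed(getComputedStyle(document.documentElement));
```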
+
Line Breaking Strictness: the 'line-break' property