From b11f87f1929ddfbba2abed3bb4d819556f8ae8f3 Mon Sep 17 00:00:00 2001
From: "Myles C. Maxfield"
Date: Thu, 15 Jun 2023 19:23:01 -0700
Subject: [PATCH 1/3] Add word-break:auto

---
 css-text-4/Overview.bs | 178 +++++++---------------------------------
 1 file changed, 29 insertions(+), 149 deletions(-)

diff --git a/css-text-4/Overview.bs b/css-text-4/Overview.bs
index 30557e5f228..c2a44426890 100644
--- a/css-text-4/Overview.bs
+++ b/css-text-4/Overview.bs
@@ -992,7 +992,7 @@ Detecting Word Boundaries: the 'word-boundary-detection' property
 	Name: word-boundary-detection
-	Value: normal | manual | auto(<>)
+	Value: normal | manual
 	Initial: normal
 	Applies to: text
 	Inherited: yes
@@ -1110,151 +1110,8 @@ Detecting Word Boundaries: the 'word-boundary-detection' property
 			when a text run is composed of Khmer characters (U+1780 to U+17FF)
 			if the user agent does not know how to determine
 			word boundaries in Khmer.
-
-		
auto(<>) -
- This value directs the user agent to perform language-specific content analysis - to determine where to insert [=virtual word boundaries=]. - - <> must be a valid CSS <> or <>. - It represents an IETF BCP 47 language range - (see [[BCP47]]). - If the UA does not support word-boundary detection - for all languages represented by the specified range, - that specified value is invalid - (and will cause the declaration to be ignored). - - - word-boundary/word-boundary-105.html - word-boundary/word-boundary-106.html - word-boundary/word-boundary-107.html - word-boundary/word-boundary-108.html - word-boundary/word-boundary-109.html - word-boundary/word-boundary-110.html - word-boundary/word-boundary-111.html - word-boundary/word-boundary-112.html - word-boundary/word-boundary-113.html - word-boundary/word-boundary-114.html - word-boundary/word-boundary-115.html - word-boundary/word-boundary-116.html - word-boundary/word-boundary-117.html - word-boundary/word-boundary-118.html - word-boundary/word-boundary-119.html - word-boundary/word-boundary-120.html - word-boundary/word-boundary-121.html - word-boundary/word-boundary-122.html - word-boundary/word-boundary-123.html - word-boundary/word-boundary-124.html - word-boundary/word-boundary-125.html - word-boundary/word-boundary-126.html - word-boundary/word-boundary-127.html - - - Note: Wildcards in the language subtag would imply - support for detecting word boundaries in an undefined and effectively unlimited set of languages. - As this is not possible, - wildcards in the language subtag always result in the declaration - being treated as invalid. - - Note: Whether a word boundary detection system designed for one language - is suitable for some or all dialects of that language is somewhat subjective, - and this specifications leaves it at the discretion of the user agent. 
- Even if a detection system is not able to cope with all nuances of a particular dialect, - it may be reasonable to claim support - if the detection correctly recognizes word boundaries most of the time. - However, the user agent would do a disservice to authors and users - if it claimed support for languages - where it fails to detect most word boundaries - or has a high error rate. - - If the element’s [=content language=], - as represented in BCP 47 syntax [[BCP47]], - does not match the language range described by the computed value's <> - in an extended filtering operation - per [[RFC4647]] Matching of Language Tags (section 3.3.2) - with both the [=content language=] and <> - then the [=used value=] is ''word-boundary-detection/normal'', - and this property has no effect on this element. - Otherwise, - the user agent must insert a [=virtual word boundary=] - at each detected word boundary - within the [=text sequence=] children of this element. - Within the constraints set by this specification, - the specific algorithm used is UA-dependent. - - - word-boundary/word-boundary-105.html - word-boundary/word-boundary-106.html - word-boundary/word-boundary-107.html - word-boundary/word-boundary-108.html - word-boundary/word-boundary-109.html - word-boundary/word-boundary-110.html - word-boundary/word-boundary-111.html - word-boundary/word-boundary-112.html - - - Note: This is the same matching logic as the one used for the '':lang()'' selector. -
- If a user agent has a word-boundary detection system for Cantonese - that is not suitable for the broader set of Chinese languages, - it is expected to accept ''auto(yue)'', ''auto(zh-yue)'', or ''auto(zh-HK)'', - but not ''auto(zh)'' or ''auto(zh-Hant)''. - - However, if the user agent supports a generic word-boundary detection system - that is suitable for Chinese in general, - it is expected to accept the broad ''auto(zh)'' characterization, - as well as any more specific ones, - such as ''auto(zh-yue)'', ''auto(zh-Hant-HK)'', ''auto(zh-Hans-SG)'', or ''auto(zh-hak)''. -
- -
- Specifying the language for which the word boundary detection is to be performed - and making unsupported language ranges invalid - is required in order to make this feature meaningfully testable with ''@supports''. - - For example, Japanese text normally allows line breaking between letters of a word - (see ''word-break: normal''). - The following code disables that in h1 elements, - and only allows line breaking at autodetected word boundaries instead, - without requiring the author to manually indicate word boundaries in the markup. - However, if word boundary detection is not supported for Japanese, - this change is not applied, - as ''word-break: keep-all'' could remove all [=soft wrap opportunities=] from the element, - and risk causing overflow. -

-		@supports (word-boundary-detection: auto(ja)) {
-			h1:lang(ja) {
-				word-boundary-detection: auto(ja);
-				word-break: keep-all;
-			}
-		}
-		
-
- - User agents may activate - language-specific content analysis - in response to user preferences. - User agents with this behavior must do this - by setting the [=declared value=] of 'word-boundary-detection' to ''word-boundary-detection/auto(<>)'' - in the [=User Origin=]. - User agents that do not support the [=User Origin=] - may use the [=User-Agent Origin=] instead. - -
- Manual analysis of the content can be more reliable than UA heuristics. - For best results, authors who can perform this analysis are encouraged to markup their documents - using <{wbr}> or U+200B - to exhaustively indicate word boundaries. - - Authors who prepare their content in this manner - should not rely on the initial value, and - should explicitly specify ''word-boundary-detection: manual'' on the relevant parts of the content, - in order to override a potential ''word-boundary-detection: auto(<>)'' - in the [=User Origin=] or [=User-Agent Origin=]. -
- [=Virtual word boundary=] insertion happens before [[CSS-TEXT-3#white-space-phase-1]] and before [[#word-boundary-expansion]]. Later operations
@@ -1344,10 +1201,6 @@ Detecting Word Boundaries: the 'word-boundary-detection' property
word-boundary/word-boundary-118.html
- Note: This implies that for languages such as English
- where words are separated by spaces or other separating characters,
- ''word-boundary-detection/auto()'' has no effect.
-
  • between characters that compose a single [=typographic character unit=]. @@ -4334,7 +4187,7 @@ Breaking Rules for Letters: the 'word-break' property
     	Name: word-break
    -	Value: normal | keep-all | break-all | break-word
    +	Value: normal | keep-all | break-all | break-word | auto
     	Initial: normal
     	Applies to: text
     	Inherited: yes
    @@ -4603,8 +4456,35 @@ Breaking Rules for Letters: the 'word-break' property
     			(which uses [=spaces=] between words),
     			and is also useful for mixed-script text where CJK snippets are mixed
     			into another language that uses [=spaces=] for separation.
    +
    +		
    auto +
    + This value directs the user agent to perform language-specific content analysis + to determine where to insert [=virtual word boundaries=]. + + User agents may activate + language-specific content analysis + in response to user preferences. + User agents with this behavior must do this + by setting the [=declared value=] of 'word-break' to ''word-break/auto'' + in the [=User Origin=]. + User agents that do not support the [=User Origin=] + should use the [=User-Agent Origin=] instead. +
    + Manual analysis of the content can be more reliable than UA heuristics. + For best results, authors who can perform this analysis are encouraged to markup their documents + using <{wbr}> or U+200B + to exhaustively indicate word boundaries. + + Authors who prepare their content in this manner + should not rely on the initial value, and + should explicitly specify ''word-break'' on the relevant parts of the content, + in order to override a potential ''word-break: auto'' + in the [=User Origin=] or [=User-Agent Origin=]. +
    + Symbols that line-break the same way as letters of a particular category are affected the same way as those letters.

From 488bc99bdafbcebaa35512c86305b8771d642937 Mon Sep 17 00:00:00 2001
From: "Myles C. Maxfield"
Date: Sat, 17 Jun 2023 14:55:22 -0700
Subject: [PATCH 2/3] Address @kojiishi's review feedback

---
 css-text-4/Overview.bs | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/css-text-4/Overview.bs b/css-text-4/Overview.bs
index c2a44426890..3d9dadce101 100644
--- a/css-text-4/Overview.bs
+++ b/css-text-4/Overview.bs
@@ -4187,7 +4187,7 @@ Breaking Rules for Letters: the 'word-break' property
     	Name: word-break
    -	Value: normal | keep-all | break-all | break-word | auto
    +	Value: normal | keep-all | break-all | break-word | phrase
     	Initial: normal
     	Applies to: text
     	Inherited: yes
    @@ -4457,19 +4457,25 @@ Breaking Rules for Letters: the 'word-break' property
     			and is also useful for mixed-script text where CJK snippets are mixed
     			into another language that uses [=spaces=] for separation.
     
    -		
    auto +
    phrase
    This value directs the user agent to perform language-specific content analysis - to determine where to insert [=virtual word boundaries=]. + to prioritize keeping natural phrases (of multiple words) together + to determine [=soft wrap opportunities=] + and [=virtual word boundaries=]. User agents may activate language-specific content analysis in response to user preferences. User agents with this behavior must do this - by setting the [=declared value=] of 'word-break' to ''word-break/auto'' + by setting the [=declared value=] of 'word-break' to ''word-break/phrase'' in the [=User Origin=]. User agents that do not support the [=User Origin=] should use the [=User-Agent Origin=] instead. + + Note: This means that web content can detect whether or not this feature is + enabled by calling getComputedStyle(), even if the user agent enabled this + feature by default.
From 25877e0351d825dd7602f852b43e7fa6a0d7b33c Mon Sep 17 00:00:00 2001
From: Koji Ishii
Date: Tue, 22 Aug 2023 17:49:14 +0900
Subject: [PATCH 3/3] Address WG resolutions and fantasai's feedback

Addresses @fantasai's feedback at:
https://github.com/w3c/csswg-drafts/pull/8974
and the WG resolutions at:
https://github.com/w3c/csswg-drafts/issues/7193#issuecomment-1663101848

---
 css-text-4/Overview.bs | 76 ++++++++++++++++++++++++++----------------
 1 file changed, 48 insertions(+), 28 deletions(-)

diff --git a/css-text-4/Overview.bs b/css-text-4/Overview.bs
index 3d9dadce101..97f4785dd0c 100644
--- a/css-text-4/Overview.bs
+++ b/css-text-4/Overview.bs
@@ -4187,7 +4187,7 @@ Breaking Rules for Letters: the 'word-break' property
     	Name: word-break
    -	Value: normal | keep-all | break-all | break-word | phrase
    +	Value: normal | keep-all | break-all | break-word | auto-phrase
     	Initial: normal
     	Applies to: text
     	Inherited: yes
    @@ -4457,39 +4457,44 @@ Breaking Rules for Letters: the 'word-break' property
     			and is also useful for mixed-script text where CJK snippets are mixed
     			into another language that uses [=spaces=] for separation.
     
    -		
    phrase +
    auto-phrase
    - This value directs the user agent to perform language-specific content analysis + This value directs the user agent + to perform language-specific content analysis to prioritize keeping natural phrases (of multiple words) together to determine [=soft wrap opportunities=] and [=virtual word boundaries=]. - User agents may activate - language-specific content analysis - in response to user preferences. - User agents with this behavior must do this - by setting the [=declared value=] of 'word-break' to ''word-break/phrase'' - in the [=User Origin=]. - User agents that do not support the [=User Origin=] - should use the [=User-Agent Origin=] instead. - - Note: This means that web content can detect whether or not this feature is - enabled by calling getComputedStyle(), even if the user agent enabled this - feature by default. - + If the user agent does not support the [=content language=], + the [=used value=] must fall back to ''word-break: normal''.
    - Manual analysis of the content can be more reliable than UA heuristics. - For best results, authors who can perform this analysis are encouraged to markup their documents - using <{wbr}> or U+200B - to exhaustively indicate word boundaries. - - Authors who prepare their content in this manner - should not rely on the initial value, and - should explicitly specify ''word-break'' on the relevant parts of the content, - in order to override a potential ''word-break: auto'' - in the [=User Origin=] or [=User-Agent Origin=]. -
    + Manually inserted [=soft wrap opportunities=] must be honored, + such as those created by <{wbr}>, or the + ZW + line breaking class [[!UAX14]]. + Manually forbidden [=soft wrap opportunities=] must be honored too, + such as those suppressed by the 'white-space' property, or by the + GL or + ZWJ + line breaking classes [[!UAX14]]. + + When a phrase is too long to fit in a line and triggers overflow, + its [=used value=] must fall back to ''word-break: normal''. + +
    + Manual analysis of the content can be + more reliable than UA heuristics. + For best results, authors who can perform this analysis + are encouraged to mark up their documents + using <{wbr}> or U+200B ZERO WIDTH SPACE + to exhaustively indicate phrase boundaries. +
    + + Note: User agents may activate this value + in response to user preferences. + See Accessibility Features + for Phrase Line Breaking for more details. + Symbols that line-break the same way as letters of a particular category are affected the same way as those letters.
@@ -4632,6 +4637,21 @@ Breaking Rules for Letters: the 'word-break' property
white-space/break-spaces-with-ideographic-space-009.html +

    +Accessibility Features for Phrase Line Breaking

    + + User agents may activate + language-specific content analysis + described in ''word-break/auto-phrase'' + in response to user preferences. + User agents with this behavior must do this + by setting the [=declared value=] of 'word-break' to ''word-break/auto-phrase'' + in the [=User Origin=]. + + Note: This means that web content can detect whether or not this feature is + enabled by calling getComputedStyle(), even if the user agent enabled this + feature by default. +

    Line Breaking Strictness: the 'line-break' property
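The final ''word-break/auto-phrase'' definition in PATCH 3/3 adds two used-value fallbacks: an unsupported [=content language=] falls back to ''word-break: normal'', and so does a phrase too long to fit on a line. The sketch below is a hypothetical model of only those two rules, not a browser implementation. The names `LineContext`, `usedWordBreak`, `primarySubtag`, and the `supportedLanguages` set are illustrative assumptions; the patch does not define how "supports the content language" is matched, so matching on the primary BCP 47 subtag is also an assumption.

```typescript
// Hypothetical model of the auto-phrase used-value fallbacks in PATCH 3/3.
// Not spec text and not any user agent's actual algorithm.

type WordBreak =
  | "normal"
  | "keep-all"
  | "break-all"
  | "break-word"
  | "auto-phrase";

interface LineContext {
  /** Computed value of 'word-break' on the element. */
  computed: WordBreak;
  /** Content language as a BCP 47 tag, e.g. "ja" or "zh-Hant-TW". */
  contentLanguage: string;
  /** Assumption: primary language subtags the UA can analyze for phrases. */
  supportedLanguages: ReadonlySet<string>;
  /** Whether the phrase currently being placed is too long for the line. */
  phraseOverflows: boolean;
}

// BCP 47 tags are case-insensitive; the primary subtag precedes the first "-".
function primarySubtag(tag: string): string {
  return tag.toLowerCase().split("-")[0] ?? "";
}

function usedWordBreak(ctx: LineContext): WordBreak {
  // This model only covers auto-phrase; other values pass through unchanged.
  if (ctx.computed !== "auto-phrase") {
    return ctx.computed;
  }
  // Fallback 1: the UA does not support the content language.
  if (!ctx.supportedLanguages.has(primarySubtag(ctx.contentLanguage))) {
    return "normal";
  }
  // Fallback 2: the phrase is too long to fit and would overflow.
  if (ctx.phraseOverflows) {
    return "normal";
  }
  return "auto-phrase";
}
```

The model deliberately leaves out the requirement to honor manually inserted and manually forbidden soft wrap opportunities (<{wbr}>, ZW, GL, ZWJ, 'white-space'), which would sit below this per-phrase decision. Both fallbacks act on the used value, so, as we read the accessibility note, getComputedStyle() would still report ''auto-phrase'' for an element whose phrases actually fell back to ''normal''.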