Closed
204 changes: 55 additions & 149 deletions css-text-4/Overview.bs
@@ -992,7 +992,7 @@ Detecting Word Boundaries: the 'word-boundary-detection' property</h4>

<pre class="propdef">
Name: word-boundary-detection
Value: normal | manual | auto(<<lang>>)
Value: normal | manual
Initial: normal
Applies to: text
Inherited: yes
@@ -1110,151 +1110,8 @@ Detecting Word Boundaries: the 'word-boundary-detection' property</h4>
when a text run is composed of Khmer characters (U+1780 to U+17FF)
if the user agent does not know how to determine
word boundaries in Khmer.

<dt><dfn>auto(<<lang>>)</dfn>
<dd>
This value directs the user agent to perform language-specific content analysis
to determine where to insert [=virtual word boundaries=].

<dfn dfn-type=type><<lang>></dfn> must be a valid CSS <<ident>> or <<string>>.
It represents an IETF BCP 47 language range
(see [[BCP47]]).
If the UA does not support word-boundary detection
for <em>all</em> languages represented by the specified range,
that specified value is invalid
(and will cause the declaration to be ignored).

<wpt>
word-boundary/word-boundary-105.html
word-boundary/word-boundary-106.html
word-boundary/word-boundary-107.html
word-boundary/word-boundary-108.html
word-boundary/word-boundary-109.html
word-boundary/word-boundary-110.html
word-boundary/word-boundary-111.html
word-boundary/word-boundary-112.html
word-boundary/word-boundary-113.html
word-boundary/word-boundary-114.html
word-boundary/word-boundary-115.html
word-boundary/word-boundary-116.html
word-boundary/word-boundary-117.html
word-boundary/word-boundary-118.html
word-boundary/word-boundary-119.html
word-boundary/word-boundary-120.html
word-boundary/word-boundary-121.html
word-boundary/word-boundary-122.html
word-boundary/word-boundary-123.html
word-boundary/word-boundary-124.html
word-boundary/word-boundary-125.html
word-boundary/word-boundary-126.html
word-boundary/word-boundary-127.html
</wpt>

Note: Wildcards <em>in the language subtag</em> would imply
support for detecting word boundaries in an undefined and effectively unlimited set of languages.
As this is not possible,
wildcards in the language subtag always result in the declaration
being treated as invalid.

Note: Whether a word boundary detection system designed for one language
is suitable for some or all dialects of that language is somewhat subjective,
and this specification leaves it to the discretion of the user agent.
Even if a detection system is not able to cope with all nuances of a particular dialect,
it may be reasonable to claim support
if the detection correctly recognizes word boundaries most of the time.
However, the user agent would do a disservice to authors and users
if it claimed support for languages
where it fails to detect most word boundaries
or has a high error rate.

If the element’s [=content language=],
as represented in BCP 47 syntax [[BCP47]],
does <em>not</em> match the language range described by the computed value's <<lang>>
in an extended filtering operation
per [[RFC4647]] <cite>Matching of Language Tags</cite> (section 3.3.2),
taking the [=content language=] as the language tag and <<lang>> as the language range,
then the [=used value=] is ''word-boundary-detection/normal'',
and this property has no effect on this element.
Otherwise,
the user agent must insert a [=virtual word boundary=]
at each detected word boundary
within the [=text sequence=] children of this element.
Within the constraints set by this specification,
the specific algorithm used is UA-dependent.

<wpt>
word-boundary/word-boundary-105.html
word-boundary/word-boundary-106.html
word-boundary/word-boundary-107.html
word-boundary/word-boundary-108.html
word-boundary/word-boundary-109.html
word-boundary/word-boundary-110.html
word-boundary/word-boundary-111.html
word-boundary/word-boundary-112.html
</wpt>

Note: This is the same matching logic as the one used for the '':lang()'' selector.
</dl>

<div class=example>
If a user agent has a word-boundary detection system for Cantonese
that is not suitable for the broader set of Chinese languages,
it is expected to accept ''auto(yue)'', ''auto(zh-yue)'', or ''auto(zh-HK)'',
but not ''auto(zh)'' or ''auto(zh-Hant)''.

However, if the user agent supports a generic word-boundary detection system
that is suitable for Chinese in general,
it is expected to accept the broad ''auto(zh)'' characterization,
as well as any more specific ones,
such as ''auto(zh-yue)'', ''auto(zh-Hant-HK)'', ''auto(zh-Hans-SG)'', or ''auto(zh-hak)''.
</div>
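
The language matching and validity rules above can be sketched in TypeScript as follows.
<code>extendedFilterMatch</code> follows the steps of extended filtering in [[RFC4647]] section 3.3.2.
<code>isValidAutoLang</code> is a hypothetical model, not something this specification defines:
it assumes the UA describes its detection support as a list of language ranges,
and treats a specified range as supported when, read as a language tag, it falls within one of them.
All function names are illustrative.

```typescript
// Sketch of the language matching used by word-boundary-detection: auto(lang).
// Function names are illustrative; the list of "supported" ranges is a
// hypothetical input, since each UA decides which languages it can analyze.

// Extended filtering, following the steps of RFC 4647 section 3.3.2:
// does the language tag `tag` fall within the extended language range `range`?
function extendedFilterMatch(range: string, tag: string): boolean {
  const r = range.toLowerCase().split("-");
  const t = tag.toLowerCase().split("-");

  // The first subtags must match (case-insensitively, or via a "*" wildcard).
  if (r[0] !== "*" && r[0] !== t[0]) return false;

  let ri = 1;
  let ti = 1;
  while (ri < r.length) {
    if (r[ri] === "*") {
      ri++; // a wildcard after the first subtag is simply skipped
    } else if (ti >= t.length) {
      return false; // the tag has run out of subtags
    } else if (r[ri] === t[ti]) {
      ri++; // matching subtags: advance in both lists
      ti++;
    } else if (t[ti].length === 1) {
      return false; // never skip past a singleton such as "x"
    } else {
      ti++; // skip a tag subtag the range does not mention (e.g. a script)
    }
  }
  return true; // every subtag of the range was accounted for
}

// Hypothetical model of the parse-time check: auto(lang) is valid only if the
// specified range, read as a tag, falls within a range the UA supports.
function isValidAutoLang(lang: string, supported: readonly string[]): boolean {
  // A wildcard language subtag would claim support for every language.
  if (lang.split("-")[0] === "*") return false;
  return supported.some((s) => extendedFilterMatch(s, lang));
}

// Used value: auto(lang) only takes effect when the element's content
// language matches lang, as with the :lang() selector.
function usedWordBoundaryDetection(lang: string, contentLanguage: string): string {
  return extendedFilterMatch(lang, contentLanguage) ? `auto(${lang})` : "normal";
}
```

With the hypothetical supported list <code>["yue", "zh-yue", "zh-HK"]</code>,
<code>isValidAutoLang</code> accepts <code>yue</code>, <code>zh-yue</code> and <code>zh-HK</code>
but rejects <code>zh</code> and <code>zh-Hant</code>, as in the Cantonese case of the example above;
with <code>["zh"]</code> it also accepts <code>zh-yue</code>, <code>zh-Hant-HK</code>, <code>zh-Hans-SG</code> and <code>zh-hak</code>.
This simplified model gives no special meaning to wildcards after the first subtag of the specified range.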

<div class=example>
Specifying the language for which the word boundary detection is to be performed
and making unsupported language ranges invalid
is required in order to make this feature meaningfully testable with ''@supports''.

For example, Japanese text normally allows line breaking between letters of a word
(see ''word-break: normal'').
The following code disables that in <code>h1</code> elements,
and only allows line breaking at autodetected word boundaries instead,
without requiring the author to manually indicate word boundaries in the markup.
However, if word boundary detection is not supported for Japanese,
this change is not applied,
as ''word-break: keep-all'' could remove all [=soft wrap opportunities=] from the element,
and risk causing overflow.
<pre><code class=lang-css>
@supports (word-boundary-detection: auto(ja)) {
  h1:lang(ja) {
    word-boundary-detection: auto(ja);
    word-break: keep-all;
  }
}
</code></pre>
</div>

User agents may activate
language-specific content analysis
in response to user preferences.
User agents with this behavior must do this
by setting the [=declared value=] of 'word-boundary-detection' to ''word-boundary-detection/auto(<<lang>>)''
in the [=User Origin=].
User agents that do not support the [=User Origin=]
may use the [=User-Agent Origin=] instead.

<div class=advisement>
Manual analysis of the content can be more reliable than UA heuristics.
For best results, authors who can perform this analysis are encouraged to mark up their documents
using <{wbr}> or U+200B
to exhaustively indicate word boundaries.

Authors who prepare their content in this manner
should not rely on the initial value, and
should explicitly specify ''word-boundary-detection: manual'' on the relevant parts of the content,
in order to override a potential ''word-boundary-detection: auto(<<lang>>)''
in the [=User Origin=] or [=User-Agent Origin=].
</div>

[=Virtual word boundary=] insertion happens before [[CSS-TEXT-3#white-space-phase-1]]
and before [[#word-boundary-expansion]].
Later operations
@@ -1344,10 +1201,6 @@ Detecting Word Boundaries: the 'word-boundary-detection' property</h4>
word-boundary/word-boundary-118.html
</wpt>

Note: This implies that for languages such as English
where words are separated by spaces or other separating characters,
''word-boundary-detection/auto(<<lang>>)'' has no effect.

<li>
between characters that compose a single [=typographic character unit=].

@@ -4334,7 +4187,7 @@ Breaking Rules for Letters: the 'word-break' property</h3>

<pre class="propdef">
Name: word-break
Value: normal | keep-all | break-all | break-word
Value: normal | keep-all | break-all | break-word | auto-phrase
Initial: normal
Applies to: text
Inherited: yes
@@ -4603,6 +4456,44 @@ Breaking Rules for Letters: the 'word-break' property</h3>
(which uses [=spaces=] between words),
and is also useful for mixed-script text where CJK snippets are mixed
into another language that uses [=spaces=] for separation.

<dt><dfn>auto-phrase</dfn>
<dd>
This value directs the user agent
to perform language-specific content analysis
when determining [=soft wrap opportunities=]
and [=virtual word boundaries=],
prioritizing keeping natural phrases (of multiple words) together.

If the user agent does not support this analysis for the element's [=content language=],
the [=used value=] must fall back to ''word-break: normal''.
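
As a minimal sketch of this fallback, the used value can be computed
from the specified value and the element's [=content language=].
The sketch assumes, purely for illustration, that the UA's support
is described by a hypothetical list of primary language subtags;
the <code>usedWordBreak</code> function and <code>WordBreak</code> type are not part of any API.

```typescript
// Keywords of the word-break property as listed in this draft.
type WordBreak = "normal" | "keep-all" | "break-all" | "break-word" | "auto-phrase";

// Hypothetical model: `phraseLanguages` lists the primary language subtags
// (such as "ja") for which the UA can perform phrase analysis.
function usedWordBreak(
  specified: WordBreak,
  contentLanguage: string,
  phraseLanguages: readonly string[],
): WordBreak {
  if (specified !== "auto-phrase") return specified;
  const primary = contentLanguage.split("-")[0].toLowerCase();
  // Unsupported (or unknown) content language: fall back to normal.
  return phraseLanguages.includes(primary) ? "auto-phrase" : "normal";
}
```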

Manually inserted [=soft wrap opportunities=] must be honored,
such as those created by <{wbr}>
or by characters of the
<a href="https://unicode.org/reports/tr14/#ZW"><code>ZW</code></a>
line breaking class [[!UAX14]].
Manual prohibitions of [=soft wrap opportunities=] must be honored too,
such as those imposed by the 'white-space' property
or by characters of the
<a href="https://unicode.org/reports/tr14/#GL"><code>GL</code></a> or
<a href="https://unicode.org/reports/tr14/#ZWJ"><code>ZWJ</code></a>
line breaking classes [[!UAX14]].
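
The following sketch shows one way to combine the boundaries proposed by phrase analysis
with the manual overrides listed above.
It is a simplified model, not a reproduction of every [[!UAX14]] rule:
it represents boundaries as offsets between code points,
handles only U+200B ZERO WIDTH SPACE (class ZW), U+00A0 NO-BREAK SPACE (one GL character)
and U+200D ZERO WIDTH JOINER (class ZWJ),
ignores spaces following a ZW character, 'white-space', and <{wbr}>,
and uses a hypothetical function name.

```typescript
// A few characters with the UAX #14 line breaking classes named above:
const ZWSP = "\u200B"; // ZW: a break is allowed after it, but not before it
const NBSP = "\u00A0"; // GL: no break after it, and normally none before it
const ZWJ = "\u200D"; // ZWJ: no break after it; it stays attached to the
                      // character before it

// `phraseBreaks` holds offsets k, each meaning "break between code point k - 1
// and code point k", as proposed by a (hypothetical) phrase analysis step.
function applyManualBreaks(text: string, phraseBreaks: readonly number[]): number[] {
  const chars = Array.from(text); // split into code points
  const breaks = new Set(phraseBreaks);

  // Remove proposed breaks that the text manually forbids.
  chars.forEach((c, i) => {
    if (c === ZWSP) breaks.delete(i); // never break just before ZW
    if (c === NBSP || c === ZWJ) {
      breaks.delete(i); // before the character
      breaks.delete(i + 1); // after the character
    }
  });

  // Add breaks after ZW last, so they are kept even next to GL or ZWJ.
  chars.forEach((c, i) => {
    if (c === ZWSP) breaks.add(i + 1);
  });

  // Keep only offsets strictly inside the text, in ascending order.
  return Array.from(breaks)
    .filter((k) => k > 0 && k < chars.length)
    .sort((a, b) => a - b);
}
```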

When a phrase is too long to fit on a line and would cause overflow,
its [=used value=] must fall back to ''word-break: normal''.
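
A hypothetical greedy line-filling sketch makes this overflow fallback concrete.
It assumes the phrases have already been identified (the input array stands in for the UA's analysis),
measures line width in code points instead of actual glyph advances (with a width of at least 1),
and approximates ''word-break: normal'' for CJK text as allowing a break between any two characters,
ignoring line-breaking strictness rules and the manual overrides described above.
The function name is illustrative.

```typescript
// Hypothetical model: `phrases` stands in for the output of a UA's phrase
// analysis, and `maxChars` is the line width measured in code points.
function wrapAutoPhrase(phrases: readonly string[], maxChars: number): string[] {
  const len = (s: string): number => Array.from(s).length;
  const lines: string[] = [];
  let current = "";

  for (const phrase of phrases) {
    if (len(current) + len(phrase) <= maxChars) {
      current += phrase; // the whole phrase fits on the current line
    } else if (len(phrase) <= maxChars) {
      // Wrap before the phrase so that it stays together on the next line.
      if (current !== "") lines.push(current);
      current = phrase;
    } else {
      // The phrase cannot fit on any line: fall back to normal-style
      // breaking, modeled here as a break allowed between any two characters.
      for (const ch of Array.from(phrase)) {
        if (len(current) === maxChars) {
          lines.push(current);
          current = "";
        }
        current += ch;
      }
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}
```

For example, with a width of 5 the phrases <code>今日は</code>, <code>良い</code>, <code>天気です</code>
are laid out as <code>今日は良い</code> and <code>天気です</code>:
the last phrase moves to the next line as a unit rather than being split.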

<div class=advisement>
Manual analysis of the content can be
more reliable than UA heuristics.
For best results, authors who can perform this analysis
are encouraged to mark up their documents
using <{wbr}> or U+200B ZERO WIDTH SPACE
to exhaustively indicate phrase boundaries.
</div>
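
For text whose phrase boundaries have already been identified,
for example by an editor or an offline tool,
the markup recommended above can be generated mechanically.
This sketch assumes the phrases arrive as an array of strings;
both helper names are hypothetical.

```typescript
const ZERO_WIDTH_SPACE = "\u200B";

// Escape the characters that are significant in HTML text content.
function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Plain-text variant: mark every phrase boundary with U+200B.
function joinWithZeroWidthSpace(phrases: readonly string[]): string {
  return phrases.join(ZERO_WIDTH_SPACE);
}

// HTML variant: mark every phrase boundary with a wbr element.
function toHtmlWithWbr(phrases: readonly string[]): string {
  return phrases.map(escapeHtml).join("<wbr>");
}
```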

Note: User agents may activate this value
in response to user preferences.
See <a href="#phrase-ax">Accessibility Features
for Phrase Line Breaking</a> for more details.
</dl>

Symbols that line-break the same way as letters of a particular category
@@ -4746,6 +4637,21 @@ Breaking Rules for Letters: the 'word-break' property</h3>
white-space/break-spaces-with-ideographic-space-009.html
</wpt>

<h4 id="phrase-ax">
Accessibility Features for Phrase Line Breaking</h4>

User agents may activate
language-specific content analysis
described in ''word-break/auto-phrase''
in response to user preferences.
User agents with this behavior must do this
by setting the [=declared value=] of 'word-break' to ''word-break/auto-phrase''
in the [=User Origin=].

Note: This means that web content can detect whether this feature is
enabled by calling <code>getComputedStyle()</code>, even if the user agent enabled it
by default.
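
A page could perform this check along the following lines.
The helper name is hypothetical, and it accepts any object with a <code>wordBreak</code> string
so that the logic can be exercised outside a browser.
Because author declarations take precedence over normal declarations in the [=User Origin=],
the computed value can reflect the user preference only where the page's own styles do not override it.

```typescript
// The only part of a computed style this check needs. In a browser, the
// CSSStyleDeclaration returned by getComputedStyle(element) has this shape.
interface WordBreakStyle {
  wordBreak: string;
}

// True if the computed value of word-break is auto-phrase.
function hasAutoPhrase(style: WordBreakStyle): boolean {
  return style.wordBreak === "auto-phrase";
}

// Assumed browser usage:
// const enabled = hasAutoPhrase(getComputedStyle(document.documentElement));
```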

<h3 id="line-break-property">
Line Breaking Strictness: the 'line-break' property</h3>
