@@ -23,6 +23,8 @@ spec: css-text-3; type: property
2323 text: word-spacing
2424spec: css-text-3; type: dfn
2525 text: forced line break
26+ text: word-separator character
27+ text: other space separator
2628</pre>
2729
2830<pre class=biblio>
@@ -92,74 +94,71 @@ Detecting Word Boundaries: the 'word-boundary-detection' property</h4>
9294 Animation type : discrete
9395 </pre>
9496
97+ This property allows the author to decide
98+ whether and how
99+ the User Agent must analyse the content
100+ to determine where word boundaries are,
101+ and to insert [=virtual word boundaries=] accordingly.
102+
103+ A <dfn data-dfn-for='' data-dfn-type=dfn>virtual word boundary</dfn> is similar to the presence
104+ of the ZERO WIDTH SPACE (U+200B) character:
105+ it introduces a [=soft wrap opportunity=]
106+ and is affected by the 'word-boundary-expansion' property;
107+ its presence has no effect on text shaping,
108+ nor on 'word-spacing' .
109+ However, its insertion must have no effect on the underlying content,
110+ and must not affect the content of a plain text copy & paste operation.
111+
95112 <dl dfn-type=value dfn-for=word-boundary-detection>
96113 <dt> <dfn>manual</dfn>
97114 <dd>
98- This property has no effect .
115+ The User Agent must not insert [=virtual word boundaries=] .
99116
100117 <dt> <dfn>auto(<<lang>>)</dfn>
101118 <dd>
102119
103120 This value directs the User Agent to perform language-specific content analysis
104- to determine where word boundaries are.
105- The specific algorithm to be used is UA-dependent.
106- However, inline element boundaries and out-of-flow elements must be ignored when determining word boundaries.
121+ to determine where to insert [=virtual word boundaries=] .
107122
108123 <dfn dfn-type=type><<lang>></dfn> must be a valid CSS <<ident>> or <<string>> .
109124 It represents an IETF BCP 47 language range
110- (See [[BCP47]] ).
111- If the UA does not support word-boundary detection for <em> all</em> languages represented by the specified range,
125+ (see [[BCP47]] ).
126+ If the UA does not support word-boundary detection
127+ for <em> all</em> languages represented by the specified range,
112128 it must reject that value at parse-time.
113129
114- <div class=example>
115- If a User Agent has a word-boundary detection system for Cantonese
116- that is not suitable for the broader set of Chinese languages,
117- it must accept ''lang(yue)'' , ''lang(zh-yue)'' , or ''lang(zh-HK)'' ,
118- but not ''lang(zh)'' or ''lang(zh-Hant)'' .
119-
120- However, if the User Agent supports a generic word-boundary detection system
121- that is suitable for Chinese in general,
122- it should accept the broad ''lang(zh)'' characterization,
123- as well as any more specific ones,
124- such as ''lang(zh-yue)'' , ''lang(zh-Hant-HK)'' , ''lang(zh-Hans-SG)'' , or ''lang(zh-hak).
125- </div>
130+ Note: Wildcards <em> in the language subtag</em> would imply
131+ support for detecting word boundaries in an undefined and effectively unlimited set of languages.
132+ As this is not possible,
133+ wildcards in the language subtag always result in the declaration
134+ being treated as invalid.
126135
127136 Note: Whether a word boundary detection system designed for one language
128137 is suitable for some or all dialects of that language is somewhat subjective,
129- and this specifications leaves it at the appreciation of the User Agent.
138+ and this specification leaves it to the discretion of the User Agent.
130139 Even if a detection system is not able to cope with all nuances of a particular dialect,
131140 it may be reasonable to claim support
132141 if the detection correctly recognizes word boundaries most of the time.
133142 However, the User Agent would do a disservice to authors and users
134143 if it claimed support for languages
135- where it fails to detect most word boundaries.
136-
137- Note: Wildcards <em>in the language subtag</em> would imply
138- support for detecting word boundaries in an undefined and effectively unlimited set of languages.
139- As this this is not possible,
140- wildcards in the language subtag must always be treated as invalid.
144+ where it fails to detect most word boundaries
145+ or has a high error rate.
141146
142147 If the element’s [=content language=] ,
143148 as represented in BCP 47 syntax [[BCP47]] ,
144- does <em>not</em> matches the language range described by the computed value's <<lang>>
149+ does <em> not</em> match the language range described by the computed value's <<lang>>
145150 in an extended filtering operation
146151 per [[RFC4647]] <cite> Matching of Language Tags</cite> (section 3.3.2),
147- then the [=used value=] is set to '' word-boundary-detection/manual'',
152+ then the [=used value=] is ''word-boundary-detection/manual'' ,
148153 and this property has no effect on this element.
149- <span class=note>(This is the same maching logic as the one used for the '' :lang()'' selector, negated.)</span>
150154 Otherwise,
151- the User Agent must insert the ZERO WIDTH SPACE (U+200B) character
155+ the User Agent must insert a [=virtual word boundary=]
152156 at each detected word boundary
153157 within the [=text run=] children of this element.
154- However, the UA must not insert U+200B:
155- * at the beginning or end of a [=block container=]
156- * at the beginning or end of an [=inline box=] whose parent box has a [=used value=] of '' word-boundary-detection/manual''
158+ Within the constraints set by this specification,
159+ the specific algorithm used is UA-dependent.
157160
158- The insertion happens before layout,
159- so all layout operations that depend on the characters in the content
160- (such as [[CSS-TEXT-3#white-space-rules]], [=line breaking=], or [=intrinsic sizing=])
161- must take the presence of that character into account.
162- [=Selectors=] are not affected.
161+ Note: This is the same matching logic as the one used for the '':lang()'' selector.
163162
164163 Issue: Should we allow, or require, Canonicalization of language tags and ranges,
165164 as per [[RFC5646]] section 4.5,
@@ -172,11 +171,24 @@ Detecting Word Boundaries: the 'word-boundary-detection' property</h4>
172171 for such mappings.
173172 </dl>
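
The language-range check described for ''auto(<<lang>>)'' relies on the extended filtering operation of [[RFC4647]] (section 3.3.2). As a non-normative illustration, that matching step could be sketched in Python as follows. The function name `extended_filter_matches` is hypothetical, and the sketch assumes well-formed input: it performs no validation or canonicalization of tags.

```python
def extended_filter_matches(language_range: str, language_tag: str) -> bool:
    """Sketch of RFC 4647 extended filtering (section 3.3.2).

    Returns True when `language_tag` (e.g. an element's content language)
    matches `language_range` (e.g. the <lang> argument of auto()).
    """
    # Subtags are compared case-insensitively.
    range_subtags = language_range.lower().split("-")
    tag_subtags = language_tag.lower().split("-")

    # Step 1: the first subtags must match, unless the range starts with "*".
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False

    r, t = 1, 1
    # Step 2: walk the remaining range subtags.
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            # A wildcard matches any sequence of subtags, including none.
            r += 1
        elif t >= len(tag_subtags):
            # The tag ran out of subtags before the range did.
            return False
        elif range_subtags[r] == tag_subtags[t]:
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:
            # Singletons (such as "x") may not be skipped over.
            return False
        else:
            # Skip a non-matching, non-singleton subtag in the tag.
            t += 1

    # Step 3: every range subtag was consumed, so the tag matches.
    return True
```

Under this sketch, a range of `zh` matches a content language of `zh-Hant-HK`, while a range of `yue` does not match `zh-yue`. When the function returns `False` for an element's content language, the used value would be ''word-boundary-detection/manual''.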
174173
175- Note: Specifying the language for which the word boundary detection is to be performed
176- is required in order to make this feature meaningfully testable with '' @supports''.
174+ <div class=example>
175+ If a User Agent has a word-boundary detection system for Cantonese
176+ that is not suitable for the broader set of Chinese languages,
177+ it is expected to accept ''auto(yue)'' , ''auto(zh-yue)'' , or ''auto(zh-HK)'' ,
178+ but not ''auto(zh)'' or ''auto(zh-Hant)'' .
179+
180+ However, if the User Agent supports a generic word-boundary detection system
181+ that is suitable for Chinese in general,
182+ it is expected to accept the broad ''auto(zh)'' characterization,
183+ as well as any more specific ones,
184+ such as ''auto(zh-yue)'' , ''auto(zh-Hant-HK)'' , ''auto(zh-Hans-SG)'' , or ''auto(zh-hak)''.
185+ </div>
177186
178187 <div class=example>
179- Japanese text normally allows line breaking between letters of a word
188+ Specifying the language for which the word boundary detection is to be performed
189+ is required in order to make this feature meaningfully testable with '' @supports''.
190+
191+ For example, Japanese text normally allows line breaking between letters of a word
180192 (see '' word-break: normal'').
181193 The following code disables that in <code>h1</code> elements,
182194 and only allows line breaking at autodetected word boundaries instead,
@@ -195,6 +207,66 @@ Detecting Word Boundaries: the 'word-boundary-detection' property</h4>
195207 </code></pre>
196208 </div>
197209
210+ [=Virtual word boundary=] insertion happens before [[CSS-TEXT-3#white-space-phase-1]]
211+ and before [[#word-boundary-expansion]].
212+ Later operations
213+ (including [[CSS-TEXT-3#white-space-rules]], [=line breaking=], and [=intrinsic sizing=])
214+ must take the presence of the [=virtual word boundary=] into account.
215+ [=Selectors=] are not affected.
216+
217+ Inline box boundaries
218+ and out-of-flow elements must be ignored
219+ when determining word boundaries.
220+
221+ If a word boundary is found at the same position as
222+ one or more inline box boundaries,
223+ the [=virtual word boundary=] must be inserted
224+ in the outermost element that participates in this inline box boundary.
225+
226+ <div class=example>
227+ In the following example,
228+ the red “<code><span style="color:red">|</span></code>” indicates
229+ reasonable positions for a User Agent to insert virtual word boundaries:
230+ <pre><code highlight=html>กรุงเทพ<span style="color:red">|</span>คือ<span style="color:red">|</span>สวยงาม</code></pre>
231+ If that sentence had contained some inline markup,
232+ the following example shows the correct position to insert the virtual word boundaries:
233+ <pre><code highlight=html>กรุงเทพ<span style="color:red">|</span>คือ<span style="color:red">|</span><em>สวยงาม</em></code></pre>
234+ The following example shows <em>incorrect</em> positions:
235+ <pre><code highlight=html>กรุงเทพ<span style="color:red">|</span>คือ<em><span style="color:red">|</span>สวยงาม</em></code></pre>
236+ The following shows the correct positions in a more contrived situation:
237+ <pre><code highlight=html>กรุงเทพ<span style="color:red">|</span><b><u>คือ</u><span style="color:red">|</span><em>สวยงาม</em></b></code></pre>
238+ </div>
239+
240+ The User Agent may tailor its word boundary detection algorithm
241+ depending on whether 'line-break' is
242+ '' loose''/'' line-break/normal''/'' line-break/strict''.
243+
244+ The User Agent must not insert a [=virtual word boundary=]:
245+ <ul>
246+ <li>
247+ at the beginning or end of any box
248+ (including [=inline boxes=])
249+ whose parent box has a [=used value=]
250+ of '' word-boundary-detection/manual''.
251+
252+ <li>
253+ immediately adjacent to a [=word-separator character=],
254+ or an [=other space separator=],
255+ or a ZERO WIDTH SPACE (U+200B) character.
256+
257+ Note: This implies that for languages such as English
258+ where words are separated by spaces or other separating characters,
259+ '' word-boundary-detection/auto(<lang> )'' has no effect.
260+
261+ <li>
262+ between a [=typographic letter unit=]
263+ and a subsequent [=typographic character unit=] from the [[!UNICODE]] Pe or Pf classes,
264+ or between a [=typographic letter unit=]
265+ and a preceding [=typographic character unit=] from the [[!UNICODE]] Ps or Pi classes,
266+ or between a [=typographic letter unit=]
267+ and an adjacent [=typographic character unit=] from the [[!UNICODE]] Pc or Pd or Po classes.
268+ </ul>
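
The character-based exclusions in the list above (adjacency to separators, and letter/punctuation pairs) can be illustrated with a small Python filter applied to candidate boundary positions. This is a rough, non-normative sketch under several assumptions: individual code points stand in for typographic character units, a typographic letter unit is approximated as a code point in the Unicode L* or N* general categories, `WORD_SEPARATORS` is only a partial and illustrative set, and the Zs general category stands in for other space separators. The names `boundary_allowed`, `is_separator`, and `is_letter` are hypothetical. The box-edge rule in the first list item is not modelled.

```python
import unicodedata

ZWSP = "\u200b"
# Assumption: a partial, illustrative set of word-separator characters.
WORD_SEPARATORS = {"\u0020", "\u00a0", "\u1361"}
CLOSING = {"Pe", "Pf"}         # close / final-quote punctuation
OPENING = {"Ps", "Pi"}         # open / initial-quote punctuation
EITHER_SIDE = {"Pc", "Pd", "Po"}  # connector, dash, other punctuation


def is_separator(ch: str) -> bool:
    # Assumption: Zs approximates "other space separator".
    return ch == ZWSP or ch in WORD_SEPARATORS or unicodedata.category(ch) == "Zs"


def is_letter(ch: str) -> bool:
    # Assumption: letter units approximated by the L* and N* categories.
    return unicodedata.category(ch)[0] in ("L", "N")


def boundary_allowed(text: str, i: int) -> bool:
    """Check a candidate boundary between text[i - 1] and text[i]."""
    if not 0 < i < len(text):
        raise ValueError("only interior positions are handled by this sketch")
    before, after = text[i - 1], text[i]

    # No boundary immediately adjacent to a separator or U+200B.
    if is_separator(before) or is_separator(after):
        return False

    cat_before = unicodedata.category(before)
    cat_after = unicodedata.category(after)

    # Letter followed by closing punctuation, or by Pc/Pd/Po.
    if is_letter(before) and cat_after in CLOSING | EITHER_SIDE:
        return False
    # Letter preceded by opening punctuation, or by Pc/Pd/Po.
    if is_letter(after) and cat_before in OPENING | EITHER_SIDE:
        return False
    return True
```

A User Agent's language-specific detector would propose candidate positions, and a filter of this kind would then discard the ones the rules above prohibit. Real implementations would operate on typographic character units rather than code points.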
269+
198270<h4 id=word-boundary-expansion>
199271Making Word Boundaries Visible: the 'word-boundary-expansion' property</h4>
200272
@@ -221,26 +293,33 @@ Makig Word Boundaries Visible: the 'word-boundary-expansion' property</h4>
221293 into other word-separating characters,
222294 to accommodate variant typesetting styles.
223295
224- <dl dfn-for="zero-width-space -expansion" dfn-type="value">
296+ <dl dfn-for="word-boundary -expansion" dfn-type="value">
225297 <dt><dfn>none</dfn>
226298 <dd>This property has no effect.
227299
228300 <dt><dfn>space</dfn>
229301 <dd>
230- All instances of U+200B ZERO WIDTH SPACE
302+ Instances of U+200B ZERO WIDTH SPACE
231303 within the [=text run=] children of this element
232304 are replaced by U+0020 SPACE.
233305
234306 <dt><dfn>ideographic-space</dfn>
235307 <dd>
236- All instances of U+200B ZERO WIDTH SPACE
308+ Instances of U+200B ZERO WIDTH SPACE
237309 within the [=text run=] children of this element
238310 are replaced by U+3000 IDEOGRAPHIC SPACE.
239311 </dl>
240312
313+ The User Agent must not replace
314+ instances of U+200B immediately preceding or following
315+ a [=forced line break=]
316+ (ignoring any intervening inline box boundaries,
317+ and associated 'margin'/'border'/'padding').
318+
241319 Instances of <{wbr}> are considered equivalent to U+200B,
242320 and are also replaced,
243- as are U+200B inserted by '' word-boundary-detection: auto()''.
321+ as are [=virtual word boundaries=] inserted by 'word-boundary-detection'.
322+
244323 Unlike 'text-transform',
245324 this substitution happens before [[CSS-TEXT-3#white-space-phase-1]]
246325 so that later operations that depend on the characters in the content