
Commit b3bb0ed

[css-text-3] Undefine segment break transformation rules in Level 3. #5086
1 parent d3e4c4a commit b3bb0ed

1 file changed

+35
-27
lines changed

css-text-3/Overview.bs

@@ -2048,13 +2048,18 @@ Order of Operations</h4>
20482048
white-space-processing-018.xht
20492049
</wpt>
20502050

2051-
<p>For other values of 'white-space', <a>segment breaks</a> are <a>collapsible</a>.
2052-
Any collapsible <a>segment break</a> immediately following another collapsible <a>segment break</a>
2051+
<p>For other values of 'white-space', <a>segment breaks</a> are <a>collapsible</a>,
2052+
and are collapsed as follows:
2053+
2054+
<ol>
2055+
<li>First, any collapsible <a>segment break</a> immediately following another collapsible <a>segment break</a>
20532056
is removed.
2054-
Then any remaining <a>segment break</a> is
2057+
<li>Then any remaining <a>segment break</a> is
20552058
either transformed into a space (U+0020) or removed
2056-
depending on the context before and after the break:
2059+
depending on the context before and after the break.
2060+
The rules for this operation are UA-defined in this level.
20572061

2062+
<!-- CUT SEGMENT BREAK TRANSFORM
20582063
<wpt pathprefix="/css/vendor-imports/mozilla/mozilla-central-reftests/text3/">
20592064
segment-break-transformation-removable-2.html
20602065
segment-break-transformation-removable-4.html
@@ -2082,7 +2087,7 @@ Order of Operations</h4>
20822087
<li>Otherwise, if both the characters before and after the [=segment break=]
20832088
belong to the [=space-discarding character set=] (see [[#space-discard-set]]),
20842089
then the [=segment break=] is removed.
2085-
<!--
2090+
20862091
<li>Otherwise, if the <a>East Asian Width property</a> [[!UAX11]] of both
20872092
the character before and after the [=segment break=] is
20882093
<code>Fullwidth</code>, <code>Wide</code>, or <code>Halfwidth</code>
@@ -2170,7 +2175,6 @@ Order of Operations</h4>
21702175
<wpt>
21712176
writing-system/writing-system-segment-break-001.html
21722177
</wpt>
2173-
-->
21742178
<li>Otherwise, the [=segment break=] is converted to a space (U+0020).
21752179

21762180
<wpt>
@@ -2183,18 +2187,25 @@ Order of Operations</h4>
21832187
</wpt>
21842188

21852189
</ul>
2186-
<!--
21872190
<p>
21882191
For this purpose,
21892192
Emoji (Unicode property <code>Emoji</code>)
21902193
with an <a>East Asian Width property</a> of
21912194
<code>Wide</code> or <code>Neutral</code>
21922195
are treated as having an <a>East Asian Width property</a> of
21932196
<code>Ambiguous</code>.
2194-
-->
2195-
Note: The white space processing rules have already
2197+
2198+
2199+
ISSUE(5086): Should space-discarding punctuation have a stronger influence over mismatched before/after contexts?
2200+
2201+
ISSUE(5017): Should we classify punctuation and/or symbols as a category of space-ambiguous characters? (Currently spaces are discarded only if both sides are space-discarding; ambiguous characters would defer to the other side.)
2202+
2203+
CUT SEGMENT BREAK TRANSFORM -->
2204+
2205+
Note: The white space processing rules have already
21962206
removed any [=tabs=] and [=spaces=] around the [=segment break=]
2197-
before these checks take place.
2207+
before this context is evaluated.
2208+
</ol>
21982209

21992210
<div class="example">
22002211
The purpose of the segment break transformation rules
@@ -2210,9 +2221,10 @@ Order of Operations</h4>
22102221
Here is an English paragraph
22112222
that is broken into multiple lines
22122223
in the source code so that it can
2213-
more easily read in a text editor.
2224+
be more easily read and edited
2225+
in a text editor.
22142226
</pre>
2215-
<p>Here is an English paragraph that is broken into multiple lines in the source code so that it can be more easily read in a text editor.</p>
2227+
<p>Here is an English paragraph that is broken into multiple lines in the source code so that it can be more easily read and edited in a text editor.</p>
22162228
<figcaption>
22172229
Eliminating a line break in English requires maintaining a [=space=] in its place.
22182230
</figcaption>
@@ -2233,21 +2245,16 @@ Order of Operations</h4>
22332245
</figcaption>
22342246
</figure>
22352247

2236-
The segment break transformation rules thus use adjacent context
2248+
The segment break transformation rules can use adjacent context
22372249
to either transform the segment break into a space
22382250
or eliminate it entirely.
22392251
</div>
22402252

2241-
<p class="feedback issue">Comments on how well these rules would work in practice would
2242-
be very much appreciated, particularly from people who work with
2243-
Thai and similar scripts.
2244-
Note that browser implementations do not currently follow these rules consistently
2245-
(although IE does in some cases transform the break,
2246-
and Firefox follows the first two bullet points).</p>
2247-
2248-
ISSUE(5086): Should space-discarding punctuation have a stronger influence over mismatched before/after contexts?
2249-
2250-
ISSUE(5017): Should we classify punctuation and/or symbols as a category of space-ambiguous characters? (Currently spaces are discarded only if both sides are space-discarding; ambiguous characters would defer to the other side.)
2253+
Note: Historically, HTML and CSS have unconditionally converted [=segment breaks=] to spaces,
2254+
which has prevented content authored in languages such as Chinese
2255+
from being able to break lines within the source.
2256+
Thus UA heuristics need to be conservative about where they discard [=segment breaks=]
2257+
even as they strive to improve support for such languages.
22512258

22522259
<h3 id="tab-size-property" caniuse="css3-tabsize" oldids="tab-size">
22532260
Tab Character Size: the 'tab-size' property</h3>
@@ -5921,6 +5928,7 @@ Characters and Properties</h2>
59215928
but take their other properties from the first combining character in the sequence.
59225929
</ul>
59235930

5931+
<!-- CUT SEGMENT BREAK TRANSFORM
59245932
<h2 id="space-discard-set" class="no-num">Appendix F.
59255933
Space-Discarding Unicode Characters</h2>
59265934

@@ -6069,15 +6077,15 @@ Space-Discarding Unicode Characters</h2>
60696077
the Unicode Consortium will recognize the need for an “unbreaking” algorithm
60706078
and take over maintenance of such.
60716079

6072-
<!-- things that could use an unbreaking algorithm:
6080+
things that could use an unbreaking algorithm:
60736081
* HTML/CSS
60746082
* Markdown
60756083
* TeX
60766084
* text editors' “unbreak lines” commands
6077-
-->
60786085
</details>
6086+
CUT SEGMENT BREAK TRANSFORM -->
60796087

6080-
<h2 id="script-tagging" class="no-num">Appendix G.
6088+
<h2 id="script-tagging" class="no-num">Appendix F.
60816089
Identifying the Content Writing System</h2>
60826090

60836091
<p><em>This appendix is normative.</em></p>
@@ -6187,7 +6195,7 @@ Identifying the Content Writing System</h2>
61876195
Note: Mere omission of the [=writing system=] information when the [=content language=] is declared
61886196
means that the [=writing system=] is implied, not unknown.
61896197

6190-
<h2 id="small-kana" class=no-num>Appendix H.
6198+
<h2 id="small-kana" class=no-num>Appendix G.
61916199
Small Kana Mappings</h2>
61926200
<style>
61936201
.pairs-table th {
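
Reviewer note: the two-step collapsing order introduced in the first hunk (remove repeated segment breaks, then turn each remaining break into a space or remove it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the specification's algorithm: the function names, the `keep_space` callback, and the use of `"\n"` as the segment break are all hypothetical, and the `drop_between_wide` policy is only a simplified stand-in for one possible UA-defined heuristic, loosely modelled on the East Asian Width condition that this commit comments out.

```python
import re
import unicodedata
from typing import Callable

# Hypothetical policy signature: given the characters before and after a
# segment break, return True to convert the break to a space (U+0020),
# or False to remove the break entirely.
Policy = Callable[[str, str], bool]


def always_space(before: str, after: str) -> bool:
    """Historical behaviour: every remaining segment break becomes a space."""
    return True


def drop_between_wide(before: str, after: str) -> bool:
    """Illustrative heuristic (an assumption, not the spec's rule set):
    remove the break only when both neighbours have an East Asian Width
    of Fullwidth, Wide, or Halfwidth; otherwise keep a space."""
    wide = {"F", "W", "H"}
    if not before or not after:
        # Break at the start or end of the text: stay conservative.
        return True
    both_wide = (
        unicodedata.east_asian_width(before) in wide
        and unicodedata.east_asian_width(after) in wide
    )
    return not both_wide


def collapse_segment_breaks(text: str, keep_space: Policy = always_space) -> str:
    """Assumes tabs and spaces around segment breaks were already removed."""
    # Step 1: a segment break immediately following another one is removed.
    text = re.sub(r"\n{2,}", "\n", text)

    # Step 2: each remaining break becomes a space or is removed,
    # depending on the context before and after it.
    out = []
    for i, ch in enumerate(text):
        if ch != "\n":
            out.append(ch)
            continue
        before = text[i - 1] if i > 0 else ""
        after = text[i + 1] if i + 1 < len(text) else ""
        if keep_space(before, after):
            out.append(" ")
    return "".join(out)
```

Passing the context decision in as a callback mirrors the commit's intent: the order of operations stays fixed, while the choice between a space and removal is left to the UA.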
