[css-text-3] Undefine segment break transformation rules in Level 3. #5086

fantasai · fantasai · commit b3bb0ed18b31 · 2020-11-17T21:03:01.000-08:00
diff --git a/css-text-3/Overview.bs b/css-text-3/Overview.bs
@@ -2048,13 +2048,18 @@ Order of Operations</h4>
         white-space-processing-018.xht
       </wpt>
 
-      <p>For other values of 'white-space', <a>segment breaks</a> are <a>collapsible</a>.
-        Any collapsible <a>segment break</a> immediately following another collapsible <a>segment break</a>
+      <p>For other values of 'white-space', <a>segment breaks</a> are <a>collapsible</a>,
+      and are collapsed as follows:
+
+      <ol>
+        <li>First, any collapsible <a>segment break</a> immediately following another collapsible <a>segment break</a>
         is removed.
-        Then any remaining <a>segment break</a> is
+        <li>Then any remaining <a>segment break</a> is
         either transformed into a space (U+0020) or removed
-        depending on the context before and after the break:
+        depending on the context before and after the break.
+        The rules for this operation are UA-defined in this level.
 
+<!-- CUT SEGMENT BREAK TRANSFORM
         <wpt pathprefix="/css/vendor-imports/mozilla/mozilla-central-reftests/text3/">
           segment-break-transformation-removable-2.html
           segment-break-transformation-removable-4.html
@@ -2082,7 +2087,7 @@ Order of Operations</h4>
         <li>Otherwise, if both the characters before and after the [=segment break=]
           belong to the [=space-discarding character set=] (see [[#space-discard-set]]),
           then the [=segment break=] is removed.
-<!--
+
         <li>Otherwise, if the <a>East Asian Width property</a> [[!UAX11]] of both
           the character before and after the [=segment break=] is
           <code>Fullwidth</code>, <code>Wide</code>, or <code>Halfwidth</code>
@@ -2170,7 +2175,6 @@ Order of Operations</h4>
           <wpt>
             writing-system/writing-system-segment-break-001.html
           </wpt>
--->
         <li>Otherwise, the [=segment break=] is converted to a space (U+0020).
 
           <wpt>
@@ -2183,18 +2187,25 @@ Order of Operations</h4>
           </wpt>
 
       </ul>
-<!--
       <p>
         For this purpose,
         Emoji (Unicode property <code>Emoji</code>)
         with an <a>East Asian Width property</a> of
         <code>Wide</code> or <code>Neutral</code>
         are treated as having an <a>East Asian Width property</a> of
         <code>Ambiguous</code>.
--->
-      Note: The white space processing rules have already
+
+
+  ISSUE(5086): Should space-discarding punctuation have a stronger influence over mismatched before/after contexts?
+
+  ISSUE(5017): Should we classify punctuation and/or symbols as a category of space-ambiguous characters? (Currently spaces are discarded only if both sides are space-discarding; ambiguous characters would defer to the other side.)
+
+CUT SEGMENT BREAK TRANSFORM -->
+
+        Note: The white space processing rules have already
         removed any [=tabs=] and [=spaces=] around the [=segment break=]
-        before these checks take place.
+        before this context is evaluated.
+      </ol>
 
       <div class="example">
         The purpose of the segment break transformation rules
@@ -2210,9 +2221,10 @@ Order of Operations</h4>
             Here is an English paragraph
             that is broken into multiple lines
             in the source code so that it can
-            more easily read in a text editor.
+            be more easily read and edited
+            in a text editor.
           </pre>
-          <p>Here is an English paragraph that is broken into multiple lines in the source code so that it can be more easily read in a text editor.</p>
+          <p>Here is an English paragraph that is broken into multiple lines in the source code so that it can be more easily read and edited in a text editor.</p>
           <figcaption>
             Eliminating a line break in English requires maintaining a [=space=] in its place.
           </figcaption>
@@ -2233,21 +2245,16 @@ Order of Operations</h4>
           </figcaption>
         </figure>
 
-        The segment break transformation rules thus use adjacent context
+        The segment break transformation rules can use adjacent context
         to either transform the segment break into a space
         or eliminate it entirely.
       </div>
 
-      <p class="feedback issue">Comments on how well these rules would work in practice would
-        be very much appreciated, particularly from people who work with
-        Thai and similar scripts.
-        Note that browser implementations do not currently follow these rules consistently
-        (although IE does in some cases transform the break,
-        and Firefox follows the first two bullet points).</p>
-
-  ISSUE(5086): Should space-discarding punctuation have a stronger influence over mismatched before/after contexts?
-
-  ISSUE(5017): Should we classify punctuation and/or symbols as a category of space-ambiguous characters? (Currently spaces are discarded only if both sides are space-discarding; ambiguous characters would defer to the other side.)
+      Note: Historically, HTML and CSS have unconditionally converted [=segment breaks=] to spaces,
+      which has prevented content authored in languages such as Chinese
+      from being able to break lines within the source.
+      Thus UA heuristics need to be conservative about where they discard [=segment breaks=]
+      even as they strive to improve support for such languages.
 
   <h3 id="tab-size-property" caniuse="css3-tabsize" oldids="tab-size">
     Tab Character Size: the 'tab-size' property</h3>
@@ -5921,6 +5928,7 @@ Characters and Properties</h2>
     but take their other properties from the first combining character in the sequence.
   </ul>
 
+<!-- CUT SEGMENT BREAK TRANSFORM
 <h2 id="space-discard-set" class="no-num">Appendix F.
 Space-Discarding Unicode Characters</h2>
 
@@ -6069,15 +6077,15 @@ Space-Discarding Unicode Characters</h2>
     the Unicode Consortium will recognize the need for an “unbreaking” algorithm
     and take over maintenance of such.
 
-    <!-- things that could use an unbreaking algorithm:
+    things that could use an unbreaking algorithm:
       * HTML/CSS
       * Markdown
       * TeX
       * text editors' “unbreak lines” commands
-    -->
   </details>
+CUT SEGMENT BREAK TRANSFORM -->
 
-<h2 id="script-tagging" class="no-num">Appendix G.
+<h2 id="script-tagging" class="no-num">Appendix F.
 Identifying the Content Writing System</h2>
 
 	<p><em>This appendix is normative.</em></p>
@@ -6187,7 +6195,7 @@ Identifying the Content Writing System</h2>
 		Note: Mere omission of the [=writing system=] information when the [=content language=] is declared
 		means the that the [=writing system=] is implied, not unknown.
 
-<h2 id="small-kana" class=no-num>Appendix H.
+<h2 id="small-kana" class=no-num>Appendix G.
 Small Kana Mappings</h2>
 <style>
 .pairs-table th {