@@ -1825,6 +1825,11 @@ Text Processing</h3>
1825
1825
segment-break-transformation-removable-1.html
1826
1826
segment-break-transformation-removable-3.html
1827
1827
</wpt>
1828
+
1829
+ <li> Otherwise, if both the characters before and after the [=segment break=]
1830
+ belong to the [=space-discarding character set=] (see [[#space-discard-set]]),
1831
+ then the [=segment break=] is removed.
1832
+ <!--
1828
1833
<li> Otherwise, if the <a>East Asian Width property</a> [[!UAX11]] of both
1829
1834
the character before and after the [=segment break=] is
1830
1835
<code> Fullwidth</code> , <code> Wide</code> , or <code> Halfwidth</code>
@@ -1912,17 +1917,18 @@ Text Processing</h3>
1912
1917
<wpt>
1913
1918
writing-system/writing-system-segment-break-001.html
1914
1919
</wpt>
1920
+ -->
1915
1921
<li> Otherwise, the [=segment break=] is converted to a space (U+0020).
1916
1922
</ul>
1917
-
1923
+ <!--
1918
1924
<p>
1919
1925
For this purpose,
1920
1926
Emoji (Unicode property <code> Emoji</code> )
1921
1927
with an <a>East Asian Width property</a> of
1922
1928
<code> Wide</code> or <code> Neutral</code>
1923
1929
are treated as having an <a>East Asian Width property</a> of
1924
1930
<code> Ambiguous</code> .
1925
-
1931
+ -->
1926
1932
<p class="note"> Note: The white space processing rules have already
1927
1933
removed any [=tabs=] and [=spaces=] after the [=segment break=] before these checks
1928
1934
take place.</p>
@@ -5244,7 +5250,144 @@ Characters and Properties</h2>
5244
5250
but take their other properties from the first combining character in the sequence.
5245
5251
</ul>
5246
5252
5247
- <h2 id="script-tagging" class="no-num">Appendix F.
5253
+ <h2 id="space-discard-set" class="no-num">Appendix F.
5254
+ Space-Discarding Unicode Characters</h2>
5255
+
5256
+ <p><em> This appendix is normative.</em></p>
5257
+
5258
+ Characters from the following blocks in Unicode 13.0 [[UNICODE]]
5259
+ are considered part of the <dfn>space-discarding character set</dfn>
5260
+ for the purpose of [[#line-break-transform]]:
5261
+
5262
+ <table class=data>
5263
+ <caption> Space-discarding Unicode Blocks</caption>
5264
+ <thead>
5265
+ <tr>
5266
+ <th> Codepoint Range
5267
+ <th> Block Name
5268
+ <tbody>
5269
+ <tr>
5270
+ <td> U+2E80..U+2EFF
5271
+ <td> CJK Radicals Supplement
5272
+ <tr>
5273
+ <td> U+2F00..U+2FDF
5274
+ <td> Kangxi Radicals
5275
+ <tr>
5276
+ <td> U+2FF0..U+2FFF
5277
+ <td> Ideographic Description Characters
5278
+ <tr>
5279
+ <td> U+3000..U+303F
5280
+ <td> CJK Symbols and Punctuation
5281
+ <tr>
5282
+ <td> U+3040..U+309F
5283
+ <td> Hiragana
5284
+ <tr>
5285
+ <td> U+30A0..U+30FF
5286
+ <td> Katakana
5287
+ <tr>
5288
+ <td> U+3130..U+318F
5289
+ <td> Hangul Compatibility Jamo
5290
+ <tr>
5291
+ <td> U+3100..U+312F
5292
+ <td> Bopomofo
5293
+ <tr>
5294
+ <td> U+3190..U+319F
5295
+ <td> Kanbun
5296
+ <tr>
5297
+ <td> U+31C0..U+31EF
5298
+ <td> CJK Strokes
5299
+ <tr>
5300
+ <td> U+31F0..U+31FF
5301
+ <td> Katakana Phonetic Extensions
5302
+ <tr>
5303
+ <td> U+3200..U+32FF
5304
+ <td> Enclosed CJK Letters and Months
5305
+ <tr>
5306
+ <td> U+3300..U+33FF
5307
+ <td> CJK Compatibility
5308
+ <tr>
5309
+ <td> U+3400..U+4DBF
5310
+ <td> CJK Unified Ideographs Extension A
5311
+ <tr>
5312
+ <td> U+4DC0..U+4DFF
5313
+ <td> Yijing Hexagram Symbols
5314
+ <tr>
5315
+ <td> U+4E00..U+9FFF
5316
+ <td> CJK Unified Ideographs
5317
+ <tr>
5318
+ <td> U+A000..U+A48F
5319
+ <td> Yi Syllables
5320
+ <tr>
5321
+ <td> U+A490..U+A4CF
5322
+ <td> Yi Radicals
5323
+ <tr>
5324
+ <td> U+F900..U+FAFF
5325
+ <td> CJK Compatibility Ideographs
5326
+ <tr>
5327
+ <td> U+FE10..U+FE1F
5328
+ <td> Vertical Forms
5329
+ <tr>
5330
+ <td> U+FE30..U+FE4F
5331
+ <td> CJK Compatibility Forms
5332
+ <tr>
5333
+ <td> U+FE50..U+FE6F
5334
+ <td> Small Form Variants
5335
+ <tr>
5336
+ <td> U+FF00..U+FFEF
5337
+ <td> Halfwidth and Fullwidth Forms
5338
+ <tr>
5339
+ <td> U+1B000..U+1B0FF
5340
+ <td> Kana Supplement
5341
+ <tr>
5342
+ <td> U+1B100..U+1B12F
5343
+ <td> Kana Extended-A
5344
+ <tr>
5345
+ <td> U+1B130..U+1B16F
5346
+ <td> Small Kana Extension
5347
+ <tr>
5348
+ <td> U+1D300..U+1D35F
5349
+ <td> Tai Xuan Jing Symbols
5350
+ <tr>
5351
+ <td> U+1D360..U+1D37F
5352
+ <td> Counting Rod Numerals
5353
+ <tr>
5354
+ <td> U+1F200..U+1F2FF
5355
+ <td> Enclosed Ideographic Supplement
5356
+ <tr>
5357
+ <td> U+20000..U+2A6DF
5358
+ <td> CJK Unified Ideographs Extension B
5359
+ <tr>
5360
+ <td> U+2A700..U+2B73F
5361
+ <td> CJK Unified Ideographs Extension C
5362
+ <tr>
5363
+ <td> U+2B740..U+2B81F
5364
+ <td> CJK Unified Ideographs Extension D
5365
+ <tr>
5366
+ <td> U+2B820..U+2CEAF
5367
+ <td> CJK Unified Ideographs Extension E
5368
+ <tr>
5369
+ <td> U+2CEB0..U+2EBEF
5370
+ <td> CJK Unified Ideographs Extension F
5371
+ <tr>
5372
+ <td> U+2F800..U+2FA1F
5373
+ <td> CJK Compatibility Ideographs Supplement
5374
+ <tr>
5375
+ <td> U+30000..U+3134F
5376
+ <td> CJK Unified Ideographs Extension G
5377
+ </table>
5378
+
5379
+ ISSUE: Do we include Bopomofo?
5380
+
5381
+ ISSUE: Do we include enclosed ideographs?
5382
+
5383
+ ISSUE: Do we include symbol sets like Yijing Hexagrams / Counting Rods / etc.?
5384
+
5385
+ For future revisions of [[UNICODE]],
5386
+ any new block whose contents comprise at least 50% codepoints belonging to the
5387
+ Han, Hiragana, Katakana, or Yi script
5388
+ shall also be considered part of the [=space-discarding character set=].
5389
+
5390
+ <h2 id="script-tagging" class="no-num">Appendix G.
5248
5391
Tagging Content by Writing System</h2>
5249
5392
5250
5393
<p><em> This appendix is normative.</em></p>
@@ -5339,7 +5482,7 @@ Tagging Content by Writing System</h2>
5339
5482
<a href="https://www.w3.org/International/articles/language-tags/">“Language tags in HTML and XML”</a>
5340
5483
and <a href="https://www.w3.org/International/questions/qa-choosing-language-tags">“Choosing a Language Tag”</a> .
5341
5484
5342
- <h2 id="small-kana" class=no-num>Appendix G .
5485
+ <h2 id="small-kana" class=no-num>Appendix H .
5343
5486
Small Kana Mappings</h2>
5344
5487
<style>
5345
5488
.pairs-table th {
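The rule added in the first hunk, together with the Appendix F table, can be sketched as a range lookup followed by a two-way decision. This is only an illustrative sketch, not the spec's algorithm: the names `SPACE_DISCARDING_RANGES`, `is_space_discarding`, and `transform_segment_break` are hypothetical, the range list is an excerpt of the table rather than the full set, and the earlier list items of the segment break transformation rules (which this diff does not show) are assumed to have been handled already.

```python
from bisect import bisect_right

# Excerpt of the (first, last) code point ranges from the Appendix F table,
# sorted by starting code point. Assumption: a real implementation would
# carry the full table; this subset is only for illustration.
SPACE_DISCARDING_RANGES = [
    (0x2E80, 0x2EFF),    # CJK Radicals Supplement
    (0x3000, 0x303F),    # CJK Symbols and Punctuation
    (0x3040, 0x309F),    # Hiragana
    (0x30A0, 0x30FF),    # Katakana
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xA000, 0xA48F),    # Yi Syllables
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0xFF00, 0xFFEF),    # Halfwidth and Fullwidth Forms
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
]
_STARTS = [first for first, _ in SPACE_DISCARDING_RANGES]


def is_space_discarding(ch: str) -> bool:
    """Return True if the single character falls in one of the listed ranges."""
    cp = ord(ch)
    # Find the last range whose start is <= cp, then check its upper bound.
    i = bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= SPACE_DISCARDING_RANGES[i][1]


def transform_segment_break(before: str, after: str) -> str:
    """Join the text on either side of one segment break.

    Assumes tabs and spaces around the break were already removed, as the
    note in the diff says, and covers only the two rules visible here.
    """
    if before and after and is_space_discarding(before[-1]) \
            and is_space_discarding(after[0]):
        return before + after       # segment break is removed
    return before + " " + after     # segment break becomes U+0020
```

For example, `transform_segment_break("日本", "語")` joins the two runs with nothing in between, while `transform_segment_break("hello", "world")` inserts a single space. A sorted range list with binary search keeps the lookup cheap even once the full table is included.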