Commit a7b7029

fantasai, Jordan Taylor authored and Jordan Taylor committed
[css-text-3] Initial draft of Unicode Block-based segment break transformation. w3c#337
1 parent b5d604c commit a7b7029

1 file changed, +147 -4 lines changed

css-text-3/Overview.bs

+147-4
Original file line number | Diff line number | Diff line change
@@ -1825,6 +1825,11 @@ Text Processing</h3>
18251825
segment-break-transformation-removable-1.html
18261826
segment-break-transformation-removable-3.html
18271827
</wpt>
1828+
1829+
<li>Otherwise, if both the characters before and after the [=segment break=]
1830+
belong to the [=space-discarding character set=] (see [[#space-discard-set]]),
1831+
then the [=segment break=] is removed.
1832+
<!--
18281833
<li>Otherwise, if the <a>East Asian Width property</a> [[!UAX11]] of both
18291834
the character before and after the [=segment break=] is
18301835
<code>Fullwidth</code>, <code>Wide</code>, or <code>Halfwidth</code>
@@ -1912,17 +1917,18 @@ Text Processing</h3>
19121917
<wpt>
19131918
writing-system/writing-system-segment-break-001.html
19141919
</wpt>
1920+
-->
19151921
<li>Otherwise, the [=segment break=] is converted to a space (U+0020).
19161922
</ul>
1917-
1923+
<!--
19181924
<p>
19191925
For this purpose,
19201926
Emoji (Unicode property <code>Emoji</code>)
19211927
with an <a>East Asian Width property</a> of
19221928
<code>Wide</code> or <code>Neutral</code>
19231929
are treated as having an <a>East Asian Width property</a> of
19241930
<code>Ambiguous</code>.
1925-
1931+
-->
19261932
<p class="note">Note: The white space processing rules have already
19271933
removed any [=tabs=] and [=spaces=] after the [=segment break=] before these checks
19281934
take place.</p>
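
The two rules this hunk leaves active (remove the segment break between two space-discarding characters, otherwise convert it to U+0020) can be sketched roughly as follows. This is an illustrative sketch only, not the specification's algorithm: the function names are hypothetical, the range list is a three-block excerpt of Appendix F, and the earlier list items and the white space collapsing steps that precede these checks are not modeled.

```python
# Excerpt only: three blocks from the Appendix F table (assumption: enough for a demo).
SPACE_DISCARDING_EXCERPT = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]


def in_excerpt(ch: str) -> bool:
    """True if the character falls inside one of the excerpted blocks."""
    cp = ord(ch)
    return any(first <= cp <= last for first, last in SPACE_DISCARDING_EXCERPT)


def transform_segment_break(before: str, after: str) -> str:
    """Return the text that replaces a segment break between two characters."""
    if in_excerpt(before) and in_excerpt(after):
        return ""   # both neighbors are space-discarding: the break is removed
    return " "      # fallback rule: the break becomes a space (U+0020)


def join_lines(lines):
    """Join lines, applying the rule at each segment break (hypothetical helper)."""
    out = ""
    for line in lines:
        if out and line:
            out += transform_segment_break(out[-1], line[0])
        out += line
    return out
```

Under these assumptions, a break between two ideographs disappears, while a break next to a Latin letter still yields a space.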
@@ -5244,7 +5250,144 @@ Characters and Properties</h2>
52445250
but take their other properties from the first combining character in the sequence.
52455251
</ul>
52465252

5247-
<h2 id="script-tagging" class="no-num">Appendix F.
5253+
<h2 id="space-discard-set" class="no-num">Appendix F.
5254+
Space-Discarding Unicode Characters</h2>
5255+
5256+
<p><em>This appendix is normative.</em></p>
5257+
5258+
Characters from the following blocks in Unicode 13.0 [[UNICODE]]
5259+
are considered part of the <dfn>space-discarding character set</dfn>
5260+
for the purpose of [[#line-break-transform]]:
5261+
5262+
<table class=data>
5263+
<caption>Space-discarding Unicode Blocks</caption>
5264+
<thead>
5265+
<tr>
5266+
<th>Codepoint Range
5267+
<th>Block Name
5268+
<tbody>
5269+
<tr>
5270+
<td>U+2E80..U+2EFF
5271+
<td>CJK Radicals Supplement
5272+
<tr>
5273+
<td>U+2F00..U+2FDF
5274+
<td>Kangxi Radicals
5275+
<tr>
5276+
<td>U+2FF0..U+2FFF
5277+
<td>Ideographic Description Characters
5278+
<tr>
5279+
<td>U+3000..U+303F
5280+
<td>CJK Symbols and Punctuation
5281+
<tr>
5282+
<td>U+3040..U+309F
5283+
<td>Hiragana
5284+
<tr>
5285+
<td>U+30A0..U+30FF
5286+
<td>Katakana
5287+
<tr>
5288+
<td>U+3100..U+312F
5289+
<td>Bopomofo
5290+
<tr>
5291+
<td>U+3190..U+319F
5292+
<td>Kanbun
5293+
<tr>
5294+
<td>U+31A0..U+31BF
5295+
<td>Bopomofo Extended
5296+
<tr>
5297+
<td>U+31C0..U+31EF
5298+
<td>CJK Strokes
5299+
<tr>
5300+
<td>U+31F0..U+31FF
5301+
<td>Katakana Phonetic Extensions
5302+
<tr>
5303+
<td>U+3200..U+32FF
5304+
<td>Enclosed CJK Letters and Months
5305+
<tr>
5306+
<td>U+3300..U+33FF
5307+
<td>CJK Compatibility
5308+
<tr>
5309+
<td>U+3400..U+4DBF
5310+
<td>CJK Unified Ideographs Extension A
5311+
<tr>
5312+
<td>U+4DC0..U+4DFF
5313+
<td>Yijing Hexagram Symbols
5314+
<tr>
5315+
<td>U+4E00..U+9FFF
5316+
<td>CJK Unified Ideographs
5317+
<tr>
5318+
<td>U+A000..U+A48F
5319+
<td>Yi Syllables
5320+
<tr>
5321+
<td>U+A490..U+A4CF
5322+
<td>Yi Radicals
5323+
<tr>
5324+
<td>U+F900..U+FAFF
5325+
<td>CJK Compatibility Ideographs
5326+
<tr>
5327+
<td>U+FE10..U+FE1F
5328+
<td>Vertical Forms
5329+
<tr>
5330+
<td>U+FE30..U+FE4F
5331+
<td>CJK Compatibility Forms
5332+
<tr>
5333+
<td>U+FE50..U+FE6F
5334+
<td>Small Form Variants
5335+
<tr>
5336+
<td>U+FF00..U+FFEF
5337+
<td>Halfwidth and Fullwidth Forms
5338+
<tr>
5339+
<td>U+1B000..U+1B0FF
5340+
<td>Kana Supplement
5341+
<tr>
5342+
<td>U+1B100..U+1B12F
5343+
<td>Kana Extended-A
5344+
<tr>
5345+
<td>U+1B130..U+1B16F
5346+
<td>Small Kana Extension
5347+
<tr>
5348+
<td>U+1D300..U+1D35F
5349+
<td>Tai Xuan Jing Symbols
5350+
<tr>
5351+
<td>U+1D360..U+1D37F
5352+
<td>Counting Rod Numerals
5353+
<tr>
5354+
<td>U+1F200..U+1F2FF
5355+
<td>Enclosed Ideographic Supplement
5356+
<tr>
5357+
<td>U+20000..U+2A6DF
5358+
<td>CJK Unified Ideographs Extension B
5359+
<tr>
5360+
<td>U+2A700..U+2B73F
5361+
<td>CJK Unified Ideographs Extension C
5362+
<tr>
5363+
<td>U+2B740..U+2B81F
5364+
<td>CJK Unified Ideographs Extension D
5365+
<tr>
5366+
<td>U+2B820..U+2CEAF
5367+
<td>CJK Unified Ideographs Extension E
5368+
<tr>
5369+
<td>U+2CEB0..U+2EBEF
5370+
<td>CJK Unified Ideographs Extension F
5371+
<tr>
5372+
<td>U+2F800..U+2FA1F
5373+
<td>CJK Compatibility Ideographs Supplement
5374+
<tr>
5375+
<td>U+30000..U+3134F
5376+
<td>CJK Unified Ideographs Extension G
5377+
</table>
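
Because the table is a sorted list of non-overlapping code point ranges, membership in the set can be tested with a binary search. The sketch below is one possible approach, not something the draft prescribes: `BLOCKS` holds only a subset of the rows above (a complete implementation would list all of them), and `is_space_discarding` is a hypothetical name.

```python
from bisect import bisect_right

# Subset of rows from the table, as inclusive (first, last) pairs sorted by first.
# Assumption: a full implementation would transcribe every row.
BLOCKS = [
    (0x2E80, 0x2EFF),    # CJK Radicals Supplement
    (0x3000, 0x303F),    # CJK Symbols and Punctuation
    (0x3040, 0x309F),    # Hiragana
    (0x30A0, 0x30FF),    # Katakana
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xA000, 0xA48F),    # Yi Syllables
    (0xFF00, 0xFFEF),    # Halfwidth and Fullwidth Forms
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
]
_STARTS = [first for first, _ in BLOCKS]


def is_space_discarding(cp: int) -> bool:
    """True if code point cp lies inside one of the listed blocks."""
    # Find the last block whose first code point is <= cp, then check its end.
    i = bisect_right(_STARTS, cp) - 1
    return i >= 0 and cp <= BLOCKS[i][1]
```

Note that the test is by block range, so unassigned code points inside a listed block are treated the same as assigned ones in this sketch.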
5378+
5379+
ISSUE: Do we include Bopomofo?
5380+
5381+
ISSUE: Do we include enclosed ideographs?
5382+
5383+
ISSUE: Do we include symbol sets like Yijing Hexagrams / Counting Rods / etc.?
5384+
5385+
For future revisions of [[UNICODE]],
5386+
any new block whose contents comprise at least 50% codepoints belonging to the
5387+
Han, Hiragana, Katakana, or Yi script
5388+
shall also be considered part of the [=space-discarding character set=].
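
The 50% rule for future blocks could be checked along these lines. This is a sketch under stated assumptions: the caller supplies one Unicode Script property value per code point counted (Python's standard library does not expose the Script property), and the text does not say whether unassigned code points count toward the total, so that choice is left to the caller. `block_qualifies` is a hypothetical name.

```python
# Scripts named in the rule for future Unicode revisions.
SPACE_DISCARDING_SCRIPTS = {"Han", "Hiragana", "Katakana", "Yi"}


def block_qualifies(scripts_in_block) -> bool:
    """Decide whether a new block meets the 50% threshold.

    scripts_in_block: iterable of Script property values, one per code point
    counted (assumption: the caller decides which code points to count).
    """
    scripts = list(scripts_in_block)
    if not scripts:
        return False  # an empty block cannot meet the threshold
    matching = sum(1 for s in scripts if s in SPACE_DISCARDING_SCRIPTS)
    # "At least 50%" expressed with integers to avoid floating-point rounding.
    return 2 * matching >= len(scripts)
```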
5389+
5390+
<h2 id="script-tagging" class="no-num">Appendix G.
52485391
Tagging Content by Writing System</h2>
52495392

52505393
<p><em>This appendix is normative.</em></p>
@@ -5339,7 +5482,7 @@ Tagging Content by Writing System</h2>
53395482
<a href="https://www.w3.org/International/articles/language-tags/">“Language tags in HTML and XML”</a>
53405483
and <a href="https://www.w3.org/International/questions/qa-choosing-language-tags">“Choosing a Language Tag”</a>.
53415484

5342-
<h2 id="small-kana" class=no-num>Appendix G.
5485+
<h2 id="small-kana" class=no-num>Appendix H.
53435486
Small Kana Mappings</h2>
53445487
<style>
53455488
.pairs-table th {
