Commit 1449eb8

[css-text] Make control characters visible. <http://lists.w3.org/Archives/Public/www-style/2014Oct/0259.html> Also clean up some Unicode property linking.
--HG-- extra : rebase_source : 8b8831e64c46a13adba42d21ef846646f34f9343
1 parent 2945eb6 commit 1449eb8

2 files changed

Lines changed: 135 additions & 110 deletions

css-text/Overview.bs

Lines changed: 38 additions & 29 deletions
@@ -269,7 +269,7 @@ Languages and Typesetting</h4>
 In CSS, language-specific typographic tailorings
 are only applied when the content language is known (declared).

-<strong class="advisement">Authors should tag their content accurately for the best typographic behavior.</strong>
+<strong class="advisement">Authors should language-tag their content accurately for the best typographic behavior.</strong>

 <h2 id="transforming">
 Transforming Text</h2>
@@ -591,11 +591,17 @@ Languages and Typesetting</h4>
 See [[CSS21]] section
 <a href="http://www.w3.org/TR/CSS21/visuren.html#anonymous">9.2.2.1</a></p>

-<p>Control characters (Unicode class Cc) other than tab (U+0009), line feed
-(U+000A), and carriage return (U+000D)
-are ignored for the purpose of rendering.
-(As required by [[!UNICODE]],
-unsupported Default_ignorable characters must also be ignored for rendering.)
+<p>
+Control characters (<i>Unicode category</i> <code>Cc</code>)
+other than tab (U+0009), line feed (U+000A), and carriage return (U+000D)
+must be rendered as a visible glyph
+and otherwise treated as any other character of the Symbols (<code>S</code>) <i>general category</i> and Common <i title="Unicode script">script</i>.
+The UA may use a glyph provided by a font specifically for the control character,
+substitute the glyphs provided for the corresponding symbol in the Control Pictures block,
+generate a visual representation of its codepoint value,
+or use some other method to provide an appropriate visible glyph.
+As required by [[!UNICODE]],
+unsupported <code>Default_ignorable</code> characters must be ignored for rendering.

 <h3 id="white-space-rules">
 The White Space Processing Rules</h3>
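
Note on the hunk above: the new control-character rule can be illustrated with a small sketch. It assumes Python's standard `unicodedata` module for the General_Category lookup; the function names `is_visible_control` and `visible_form` are hypothetical and not part of the spec. It shows two of the permitted strategies: substituting the Control Pictures symbol where one exists, and otherwise generating a representation of the codepoint value. The bracketed `[U+XXXX]` form is an arbitrary choice for illustration, and `Default_ignorable` handling is not covered.

```python
import unicodedata

# Tab, line feed, and carriage return are excluded from the rule.
EXEMPT = {0x0009, 0x000A, 0x000D}


def is_visible_control(ch: str) -> bool:
    """True if ch has General_Category Cc and is not tab, LF, or CR."""
    return unicodedata.category(ch) == "Cc" and ord(ch) not in EXEMPT


def visible_form(ch: str) -> str:
    """Return a visible stand-in for ch, or ch itself if the rule does not apply."""
    if not is_visible_control(ch):
        return ch
    cp = ord(ch)
    if cp <= 0x1F:
        # C0 controls map onto U+2400..U+241F in the Control Pictures block.
        return chr(0x2400 + cp)
    if cp == 0x7F:
        # DELETE has its own symbol, U+2421.
        return "\u2421"
    # C1 controls (U+0080..U+009F) have no Control Pictures symbol,
    # so fall back to a representation of the codepoint value.
    return f"[U+{cp:04X}]"
```

A real UA would first check whether the font supplies a glyph for the control character itself, which this sketch does not attempt.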
@@ -697,8 +703,8 @@ Languages and Typesetting</h4>
 <li>If the character immediately before or immediately after the segment
 break is the zero-width space character (U+200B), then the break
 is removed, leaving behind the zero-width space.
-<li>Otherwise, if the East Asian Width property [[!UAX11]] of both
-the character before and after the line feed is F, W, or H (not A),
+<li>Otherwise, if the <i>East Asian Width property</i> [[!UAX11]] of both
+the character before and after the line feed is <code>F</code>, <code>W</code>, or <code>H</code> (not <code>A</code>),
 and neither side is Hangul, then the segment break is removed.
 <li>Otherwise, the segment break is converted to a space (U+0020).
 </ul>
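
Note on the hunk above: the three list items visible in this hunk can be sketched as a single decision function. This is a rough illustration only. It assumes `before` and `after` are each exactly one character, uses Python's `unicodedata.east_asian_width` for the East Asian Width property, and approximates "is Hangul" by looking for `HANGUL` in the character name, because the standard library does not expose the Script property. The names `transform_segment_break` and `is_hangul` are hypothetical, and any list items above the visible part of the hunk are not modeled.

```python
import unicodedata

ZWSP = "\u200b"
# East Asian Width values that allow the break to be removed (A is excluded).
WIDE_VALUES = {"F", "W", "H"}


def is_hangul(ch: str) -> bool:
    """Approximate the Hangul script test via the character name (an assumption)."""
    return "HANGUL" in unicodedata.name(ch, "")


def transform_segment_break(before: str, after: str) -> str:
    """Return what replaces the segment break: '' if removed, ' ' if converted."""
    # A zero-width space on either side: the break is removed.
    if before == ZWSP or after == ZWSP:
        return ""
    # Both neighbours are F, W, or H and neither is Hangul: the break is removed.
    if (
        unicodedata.east_asian_width(before) in WIDE_VALUES
        and unicodedata.east_asian_width(after) in WIDE_VALUES
        and not is_hangul(before)
        and not is_hangul(after)
    ):
        return ""
    # Otherwise the break becomes a space (U+0020).
    return " "
```

A production implementation would look up the Script property (including ScriptExtensions.txt assignments) from the Unicode Character Database instead of relying on character names.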
@@ -1121,10 +1127,10 @@ Line Breaking Details</h3>
 &#xFF01;&nbsp;U+FF01, &#xFF1F;&nbsp;U+FF1F
 <li>breaks before suffixes:<br>
 Characters with the Unicode Line Break property <code>PO</code>
-and the East Asian Width property [[!UAX11]] <code>A</code>, <code>F</code>, or <code>W</code>.
+and the <i>East Asian Width property</i> [[!UAX11]] <code>A</code>, <code>F</code>, or <code>W</code>.
 <li>breaks after prefixes:<br>
 Characters with the Unicode Line Break property <code>PR</code>
-and the East Asian Width property [[!UAX11]] <code>A</code>, <code>F</code>, or <code>W</code>.
+and the <i>East Asian Width property</i> [[!UAX11]] <code>A</code>, <code>F</code>, or <code>W</code>.
 </ul>
 </ul>

@@ -2580,22 +2586,25 @@ Appendix C: Scripts and Spacing</h2>
 <p><em>This appendix is normative.</em></p>

 <p>Typographic behavior varies somewhat by language, but varies drastically by writing system.
-This appendix categorizes some common scripts in Unicode 6.0 according to their justification and spacing behavior.
+This appendix categorizes some common <i title="Unicode script">scripts</i> in Unicode 6.0
+according to their justification and spacing behavior.
 Category descriptions are descriptive, not prescriptive;
 the determining factor is the prioritization of <i>justification opportunities</i>.

 <dl>
 <dt><dfn>block scripts</dfn></dt>
 <dd>CJK and by extension all Wide characters (see [[!UAX11]].)
-The following scripts are included: Bopomofo, Han, Hangul, Hiragana, Katakana, and Yi.
-Characters of the East Asian Width property [[!UAX11]] W and F are also included,
-but A are included only if the <i>content language</i> is Chinese, Korean, or Japanese.
+The following <i>Unicode scripts</i> are included:
+Bopomofo, Han, Hangul, Hiragana, Katakana, and Yi.
+Characters of the <i>East Asian Width property</i> <code>W</code> and <code>F</code> are also included,
+but <code>A</code> characters are included only if the <i>content language</i> is Chinese, Korean, or Japanese.
 <dt><dfn>clustered scripts</dfn></dt>
-<dd>Scripts that have discrete units,
-break only at word boundaries,
+<dd>Clustered scripts have discrete units
+and break only at word boundaries,
 but do not use visible word separators.
-They comfortably admit inter-character spacing for justification.
-The following scripts are included:
+They prioritize stretching spaces,
+but comfortably admit inter-character spacing for justification.
+The clustered scripts include, but are not limited to, the following <i>Unicode scripts</i>:
 Khmer,
 Lao,
 Myanmar,
@@ -2605,8 +2614,8 @@ Appendix C: Scripts and Spacing</h2>
 Tai Viet,
 Thai
 <dt><dfn title="cursive script">cursive scripts</dfn>
-<dd>The following scripts in Unicode 6 are considered to be cursive scripts,
-and do not admit gaps between their letters for either justification or 'letter-spacing':
+<dd>Cursive scripts do not admit gaps between their letters for either justification or 'letter-spacing'.
+The following <i>Unicode scripts</i> are included:
 Arabic,
 Mandaic,
 Mongolian,
@@ -2627,17 +2636,17 @@ Characters and Properties</h2>

 <p>Unicode defines three codepoint-level properties that are referenced
 in CSS Text:
-<dl>
-<dt><a href="http://www.unicode.org/reports/tr11/#Definitions">East Asian width</a>
-<dd>Defined in [[!UAX11]] and given as the East_Asian_Width property
+<dl export>
+<dt><dfn title="Unicode East Asian Width|East Asian Width property"><a href="http://www.unicode.org/reports/tr11/#Definitions">East Asian width property</a></dfn>
+<dd>Defined in [[!UAX11]] and given as the <code>East_Asian_Width</code> property
+in the Unicode Character Database [[!UAX44]].
+<dt><dfn title="Unicode General Category|Unicode category|General Category"><a href="http://www.unicode.org/reports/tr44/#General_Category_Values">general category</a></dfn>
+<dd>Defined in [[!UAX44]] and given as the <code>General_Category</code> property
 in the Unicode Character Database [[!UAX44]].
-<dt><a href="http://www.unicode.org/reports/tr44/#General_Category_Values">General Category</a>
-<dd>Defined in [[!UAX44]] and given as the General_Category property
+<dt><dfn title="Unicode Script|Script property"><a href="http://www.unicode.org/reports/tr24/#Values">script property</a></dfn>
+<dd>Defined in [[!UAX24]] and given as the <code>Script</code> property
 in the Unicode Character Database [[!UAX44]].
-<dt><a href="http://www.unicode.org/reports/tr24/#Values">Script property</a>
-<dd>Defined in [[!UAX24]] and given as the Script property
-in the Unicode Character Database [[!UAX44]]. (UAs should
-include any ScriptExtensions.txt assignments in this mapping.)
+(UAs must include any ScriptExtensions.txt assignments in this mapping.)
 </dl>

 <p>Unicode defines properties for individual codepoints, but sometimes
