css-text-3/Overview.bs
41 additions & 7 deletions
@@ -181,6 +181,16 @@ Languages and Typesetting</h3>
181
181
the universal <code>xml:lang</code> attribute in XML,
182
182
and the HTTP <code>Content-Language</code> header for content served over HTTP.
183
183
184
+
The [=content language=] an element is declared to be in
185
+
also identifies the specific written form of that language used in that element,
186
+
known as the <dfn export>writing system</dfn>.
187
+
188
+
Note: Depending on the [=document language=]'s facilities for identifying the [=content language=],
189
+
information about the [=writing system=] may only be carried implicitly.
190
+
That is typically the case with the [[BCP47]] language tag used in [[HTML]],
191
+
although it can optionally indicate the [=writing system=] explicitly
192
+
using a script subtag.
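
As a rough illustration of the note above, the following sketch checks whether a [[BCP47]] language tag carries an explicit script subtag or leaves the writing system implicit. The function name `script_subtag` is hypothetical, and the parsing is deliberately simplified (it assumes a well-formed tag and does not validate against the subtag registry); it is not part of the specification.

```python
from typing import Optional


def _is_ascii_alpha(subtag: str) -> bool:
    # BCP 47 subtags are restricted to ASCII letters and digits.
    return subtag.isascii() and subtag.isalpha()


def script_subtag(tag: str) -> Optional[str]:
    """Return the explicit script subtag of a BCP 47 tag, or None.

    Simplified sketch: assumes a well-formed tag and skips the primary
    language subtag plus up to three 3-letter extended language subtags.
    """
    subtags = tag.split("-")
    if subtags[0].lower() == "x":
        # Private-use tags carry no interpretable script subtag.
        return None
    i = 1
    # Extended language subtags are exactly three letters, at most three of them.
    while i < len(subtags) and i <= 3 and len(subtags[i]) == 3 and _is_ascii_alpha(subtags[i]):
        i += 1
    # A script subtag is exactly four letters and directly follows the language.
    if i < len(subtags) and len(subtags[i]) == 4 and _is_ascii_alpha(subtags[i]):
        return subtags[i].title()  # conventional casing, e.g. "Hant"
    return None
```

With this sketch, `script_subtag("zh-Hant-TW")` yields `"Hant"` (explicit), while `script_subtag("ja")` yields `None`, meaning the writing system is only implied by the language.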
193
+
184
194
Language and writing system conventions can affect
185
195
line breaking, hyphenation, justification, glyph selection,
186
196
and many other typographic effects.
@@ -397,6 +407,7 @@ Characters and Letters</h3>
397
407
replaced with a different set of mappings to their respective
398
408
undotted/dotted counterparts, which do not exist in English. This
399
409
mapping must only take effect if the <a>content language</a> is Turkish
410
+
written in its modern Latin-based <a>writing system</a>
400
411
(or another Turkic language that uses Turkish casing rules);
401
412
in other languages, the usual mapping of “I”
402
413
and “i” is required. This rule is thus conditionally
@@ -742,8 +753,8 @@ Characters and Letters</h3>
742
753
<code>W</code>, or <code>H</code> (not <code>A</code>),
743
754
and neither side is Hangul or Emoji (Unicode property <code>Emoji</code>),
744
755
then the segment break is removed.
745
-
<li>Otherwise, if the <a>content language</a> of the <a>segment break</a>
746
-
is Chinese, Japanese, or Yi,
756
+
<li>Otherwise, if the <a>writing system</a> of the <a>segment break</a>
757
+
is <a for=writing-system>Chinese</a>, <a for=writing-system>Japanese</a>, or Yi,
747
758
and the character before or after the segment break
748
759
is punctuation or a symbol (Unicode <a>general category</a> P* or S*)
749
760
and has an <a>East Asian Width property</a> of <code>A</code>
@@ -1182,7 +1193,7 @@ Line Breaking Details</h3>
1182
1193
</ul>
1183
1194
<li>
1184
1195
The following breaks are allowed for ''line-break/normal'' and ''loose'' line breaking
1185
-
if the <a>content language</a> is Chinese or Japanese,
1196
+
if the <a>writing system</a> is <a for=writing-system>Chinese</a> or <a for=writing-system>Japanese</a>,
1186
1197
and are otherwise forbidden:
1187
1198
<ul>
1188
1199
<li>breaks before hyphens:<br>
@@ -1203,7 +1214,7 @@ Line Breaking Details</h3>
1203
1214
</ul>
1204
1215
<li>
1205
1216
The following breaks are allowed for ''loose''
1206
-
if the <a>content language</a> is Chinese or Japanese
1217
+
if the <a>writing system</a> is <a for=writing-system>Chinese</a> or <a for=writing-system>Japanese</a>
1207
1218
and are otherwise forbidden:
1208
1219
<ul>
1209
1220
<li>breaks before certain centered punctuation marks:<br>
@@ -1224,7 +1235,7 @@ Line Breaking Details</h3>
1224
1235
<p class="note">Note: In the requirements listed above,
1225
1236
no distinction is made among the levels of strictness in non-CJK text:
1226
1237
only CJK codepoints are affected,
1227
-
unless the text is marked as Chinese or Japanese,
1238
+
unless the text is marked as <a for=writing-system>Chinese</a> or <a for=writing-system>Japanese</a>,
1228
1239
in which case some additional common codepoints are affected.
1229
1240
1230
1241
<div class="example">
@@ -2566,7 +2577,7 @@ Appendix D: Scripts and Spacing</h2>
2566
2577
The following <a>Unicode scripts</a> are included:
2567
2578
Bopomofo, Han, Hangul, Hiragana, Katakana, and Yi.
2568
2579
Characters of the <a>East Asian Width property</a> <code>W</code> and <code>F</code> are also included,
2569
-
but <code>A</code> characters are included only if the <a>content language</a> is Chinese, Korean, or Japanese.
2580
+
but <code>A</code> characters are included only if the <a>writing system</a> is <a for=writing-system>Chinese</a>, <a for=writing-system>Korean</a>, or <a for=writing-system>Japanese</a>.
2570
2581
<dt><dfn>clustered scripts</dfn></dt>
2571
2582
<dd>Clustered scripts have discrete units
2572
2583
and break only at word boundaries,
@@ -2639,6 +2650,8 @@ Characters and Properties</h2>
2639
2650
<h2 id="script-tagging" class="no-num">Appendix F.
2640
2651
Tagging Content by Writing System</h2>
2641
2652
2653
+
<p><em>This appendix is normative.</em></p>
2654
+
2642
2655
While most languages have a preferred writing system,
2643
2656
many can also be transcribed into a different writing system.
2644
2657
As a common example, most languages have at least one Latin transcription,
@@ -2651,7 +2664,8 @@ Tagging Content by Writing System</h2>
2651
2664
does not use word spaces,
2652
2665
and should therefore be typeset as for Chinese.
2653
2666
2654
-
Authors can indicate the use of an atypical writing system
2667
+
In [[HTML]] or any other <a>document language</a> using [[BCP47]] to identify the [=content language=],
2668
+
authors can indicate the use of an atypical writing system
2655
2669
with script subtags.
2656
2670
For example, to indicate use of the Latin writing system
2657
2671
for languages which don't natively use it,
@@ -2689,6 +2703,26 @@ Tagging Content by Writing System</h2>
2689
2703
not the conventions of that language in a different writing system,
2690
2704
which would be inappropriate to the writing system used in this case.
2691
2705
2706
+
The full correspondence between languages and their most common writing system
2707
+
is out of scope for this document.
2708
+
However, User Agents must assume at least the following:
2709
+
2710
+
* If the [=content language=] is Chinese and the [=writing system=] is unspecified,
2711
+
or for any [=content language=] if the [=writing system=] is specified to be one of the ''Hant'', ''Hans'', ''Hani'', ''Hanb'', or ''Bopo'' [[ISO15924]] codes,
2712
+
then the [=writing system=] is <dfn no-export for=writing-system>Chinese</dfn>.
2713
+
* If the [=content language=] is Japanese and the [=writing system=] is unspecified,
2714
+
or for any [=content language=] if the [=writing system=] is specified to be one of the ''Jpan'', ''Hrkt'', ''Hira'', or ''Kana'' [[ISO15924]] codes,
2715
+
then the [=writing system=] is <dfn no-export for=writing-system>Japanese</dfn>.
2716
+
* If the [=content language=] is Korean and the [=writing system=] is unspecified,
2717
+
or for any [=content language=] if the [=writing system=] is specified to be one of the ''Kore'', ''Hang'', or ''Jamo'' [[ISO15924]] codes,
2718
+
then the [=writing system=] is <dfn no-export for=writing-system>Korean</dfn>.
2719
+
* The [=writing system=] is only considered to be <dfn for=writing-system lt='known | unknown'>unknown</dfn>
2720
+
if the [=content language=] itself is unknown,
2721
+
or if it explicitly indicates an unknown writing system.
2722
+
2723
+
Note: Mere omission of the [=writing system=] information when the [=content language=] is specified
2724
+
means that the [=writing system=] is implied, not unknown.
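
The minimum assumptions listed above could be sketched roughly as follows. This is an illustration only, not the specification's algorithm: the function name `resolve_writing_system` and the `"other"` return value are hypothetical, and the sketch assumes that the languages are identified by the primary subtags `zh`, `ja`, and `ko`, that a missing, empty, or `und` language means the content language is unknown, and that the ISO 15924 code `Zyyy` (undetermined script) is how a tag explicitly indicates an unknown writing system. None of those choices is stated in the text above.

```python
from typing import Optional

# Script codes named in the rules above.
CHINESE_SCRIPTS = {"Hant", "Hans", "Hani", "Hanb", "Bopo"}
JAPANESE_SCRIPTS = {"Jpan", "Hrkt", "Hira", "Kana"}
KOREAN_SCRIPTS = {"Kore", "Hang", "Jamo"}

# Assumption: languages are matched by primary subtag only.
DEFAULT_BY_LANGUAGE = {"zh": "Chinese", "ja": "Japanese", "ko": "Korean"}


def resolve_writing_system(language: Optional[str], script: Optional[str]) -> str:
    """Return "Chinese", "Japanese", "Korean", "unknown", or "other".

    "other" is a hypothetical placeholder for any known writing system
    that the rules above do not name.
    """
    if script:
        code = script.title()  # normalize casing, e.g. "hira" -> "Hira"
        if code in CHINESE_SCRIPTS:
            return "Chinese"
        if code in JAPANESE_SCRIPTS:
            return "Japanese"
        if code in KOREAN_SCRIPTS:
            return "Korean"
        if code == "Zyyy":
            # Assumption: treated as an explicit "unknown writing system" marker.
            return "unknown"
        return "other"
    # No explicit script: unknown only if the language itself is unknown.
    if language is None or language.lower() in ("", "und"):
        return "unknown"
    # Otherwise the writing system is implied by the language, not unknown.
    return DEFAULT_BY_LANGUAGE.get(language.lower(), "other")
```

For instance, `resolve_writing_system("zh", None)` gives `"Chinese"`, `resolve_writing_system("en", "Hani")` also gives `"Chinese"` because an explicit script applies to any content language, and `resolve_writing_system("zh", "Latn")` gives `"other"`.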
2725
+
2692
2726
More advice on language tagging can be found in
2693
2727
the <a href="https://www.w3.org/International/core/">Internationalization Working Group</a>’s
2694
2728
<a href="https://www.w3.org/International/articles/language-tags/">“Language tags in HTML and XML”</a>