Commit 1d4b0a1

[css-text-3] Disentangle content language and writing system (#3202)
Closes #2015
1 parent 137cedd commit 1d4b0a1

2 files changed

+44
-8
lines changed

css-fonts-4/Overview.bs

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3447,8 +3447,10 @@ traditions found in Spanish, Italian and French orthography:
34473447
If the content language of the element is known according to the
34483448
rules of the <a href="https://www.w3.org/TR/CSS21/conform.html#doclanguage">document language</a>,
34493449
user agents are required to infer the OpenType language system from
3450-
the content language and use that when selecting and positioning
3450+
the [=content language=] and use that when selecting and positioning
34513451
glyphs using an OpenType font.
3452+
If a [=writing system=] has been explicitly specified,
3453+
it must take precedence over the customary one implied by the [=content language=].
34523454

34533455
<!-- previously in level 3, now moved to Level 4 -->
34543456
For OpenType fonts, in some cases it may be necessary to explicitly

css-text-3/Overview.bs

Lines changed: 41 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -181,6 +181,16 @@ Languages and Typesetting</h3>
181181
the universal <code>xml:lang</code> attribute in XML,
182182
and the HTTP <code>Content-Language</code> header for content served over HTTP.
183183

184+
The [=content language=] an element is declared to be in
185+
also identifies the specific written form of that language used in that element,
186+
known as the <dfn export>writing system</dfn>.
187+
188+
Note: Depending on the [=document language=]'s facilities for identifying the [=content language=],
189+
information about the [=writing system=] may only be carried implicitly.
190+
That is typically the case with the [[BCP47]] language tag used in [[HTML]],
191+
although it can optionally indicate the [=writing system=] explicitly
192+
using a script subtag.
193+
184194
Language and writing system conventions can affect
185195
line breaking, hyphenation, justification, glyph selection,
186196
and many other typographic effects.
@@ -397,6 +407,7 @@ Characters and Letters</h3>
397407
replaced with a different set of mappings to their respective
398408
undotted/dotted counterparts, which do not exist in English. This
399409
mapping must only take effect if the <a>content language</a> is Turkish
410+
written in its modern Latin-based <a>writing system</a>
400411
(or another Turkic language that uses Turkish casing rules);
401412
in other languages, the usual mapping of &ldquo;I&rdquo;
402413
and &ldquo;i&rdquo; is required. This rule is thus conditionally
@@ -742,8 +753,8 @@ Characters and Letters</h3>
742753
<code>W</code>, or <code>H</code> (not <code>A</code>),
743754
and neither side is Hangul or Emoji (Unicode property <code>Emoji</code>),
744755
then the segment break is removed.
745-
<li>Otherwise, if the <a>content language</a> of the <a>segment break</a>
746-
is Chinese, Japanese, or Yi,
756+
<li>Otherwise, if the <a>writing system</a> of the <a>segment break</a>
757+
is <a for=writing-system>Chinese</a>, <a for=writing-system>Japanese</a>, or Yi,
747758
and the character before or after the segment break
748759
is punctuation or a symbol (Unicode <a>general category</a> P* or S*)
749760
and has an <a>East Asian Width property</a> of <code>A</code>
@@ -1182,7 +1193,7 @@ Line Breaking Details</h3>
11821193
</ul>
11831194
<li>
11841195
The following breaks are allowed for ''line-break/normal'' and ''loose'' line breaking
1185-
if the <a>content language</a> is Chinese or Japanese,
1196+
if the <a>writing system</a> is <a for=writing-system>Chinese</a> or <a for=writing-system>Japanese</a>,
11861197
and are otherwise forbidden:
11871198
<ul>
11881199
<li>breaks before hyphens:<br>
@@ -1203,7 +1214,7 @@ Line Breaking Details</h3>
12031214
</ul>
12041215
<li>
12051216
The following breaks are allowed for ''loose''
1206-
if the <a>content language</a> is Chinese or Japanese
1217+
if the <a>writing system</a> is <a for=writing-system>Chinese</a> or <a for=writing-system>Japanese</a>
12071218
and are otherwise forbidden:
12081219
<ul>
12091220
<li>breaks before certain centered punctuation marks:<br>
@@ -1224,7 +1235,7 @@ Line Breaking Details</h3>
12241235
<p class="note">Note: In the requirements listed above,
12251236
no distinction is made among the levels of strictness in non-CJK text:
12261237
only CJK codepoints are affected,
1227-
unless the text is marked as Chinese or Japanese,
1238+
unless the text is marked as <a for=writing-system>Chinese</a> or <a for=writing-system>Japanese</a>,
12281239
in which case some additional common codepoints are affected.
12291240

12301241
<div class="example">
@@ -2566,7 +2577,7 @@ Appendix D: Scripts and Spacing</h2>
25662577
The following <a>Unicode scripts</a> are included:
25672578
Bopomofo, Han, Hangul, Hiragana, Katakana, and Yi.
25682579
Characters of the <a>East Asian Width property</a> <code>W</code> and <code>F</code> are also included,
2569-
but <code>A</code> characters are included only if the <a>content language</a> is Chinese, Korean, or Japanese.
2580+
but <code>A</code> characters are included only if the <a>writing system</a> is <a for=writing-system>Chinese</a>, <a for=writing-system>Korean</a>, or <a for=writing-system>Japanese</a>.
25702581
<dt><dfn>clustered scripts</dfn></dt>
25712582
<dd>Clustered scripts have discrete units
25722583
and break only at word boundaries,
@@ -2639,6 +2650,8 @@ Characters and Properties</h2>
26392650
<h2 id="script-tagging" class="no-num">Appendix F.
26402651
Tagging Content by Writing System</h2>
26412652

2653+
<p><em>This appendix is normative.</em></p>
2654+
26422655
While most languages have a preferred writing system,
26432656
many can also be transcribed into a different writing system.
26442657
As a common example, most languages have at least one Latin transcription,
@@ -2651,7 +2664,8 @@ Tagging Content by Writing System</h2>
26512664
does not use word spaces,
26522665
and should therefore be typeset as for Chinese.
26532666

2654-
Authors can indicate the use of an atypical writing system
2667+
In [[HTML]] or any other <a>document language</a> using [[BCP47]] to identify the [=content language=],
2668+
authors can indicate the use of an atypical writing system
26552669
with script subtags.
26562670
For example, to indicate use of the Latin writing system
26572671
for languages which don't natively use it,
@@ -2689,6 +2703,26 @@ Tagging Content by Writing System</h2>
26892703
not the conventions of that language in a different writing system,
26902704
which would be inappropriate to the writing system used in this case.
26912705

2706+
The full correspondence between languages and their most common writing system
2707+
is out of scope for this document.
2708+
However, User Agents must assume at least the following:
2709+
2710+
* If the [=content language=] is Chinese and the [=writing system=] is unspecified,
2711+
or for any [=content language=] if the [=writing system=] is specified to be one of the ''Hant'', ''Hans'', ''Hani'', ''Hanb'', or ''Bopo'' [[ISO15924]] codes,
2712+
then the [=writing system=] is <dfn no-export for=writing-system>Chinese</dfn>.
2713+
* If the [=content language=] is Japanese and the [=writing system=] is unspecified,
2714+
or for any [=content language=] if the [=writing system=] is specified to be one of the ''Jpan'', ''Hrkt'', ''Hira'' or ''Kana'' [[ISO15924]] codes,
2715+
then the [=writing system=] is <dfn no-export for=writing-system>Japanese</dfn>.
2716+
* If the [=content language=] is Korean and the [=writing system=] is unspecified,
2717+
or for any [=content language=] if the [=writing system=] is specified to be one of the ''Kore'', ''Hang'', or ''Jamo'' [[ISO15924]] codes,
2718+
then the [=writing system=] is <dfn no-export for=writing-system>Korean</dfn>.
2719+
* The [=writing system=] is only considered to be <dfn for=writing-system lt='known | unknown'>unknown</dfn>
2720+
if the [=content language=] itself is unknown,
2721+
or if it explicitly indicates an unknown writing system.
2722+
2723+
Note: Mere omission of the [=writing system=] information when the [=content language=] is specified
2724+
means that the [=writing system=] is implied, not unknown.
2725+
26922726
More advice on language tagging can be found in
26932727
the <a href="https://www.w3.org/International/core/">Internationalization Working Group</a>’s
26942728
<a href="https://www.w3.org/International/articles/language-tags/">“Language tags in HTML and XML”</a>
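
Reviewer sketch (not part of the commit): the minimum assumptions added to Appendix F can be read as a small classification function over a [[BCP47]] language tag. The Python below is only an illustration under stated assumptions. The function names `split_tag` and `classify`, the `"other"` result, and the simplified subtag parsing are all hypothetical. Treating the `und` language subtag and the `Zzzz` script code as "unknown" is a guess at what "explicitly indicates an unknown writing system" means, and checking for unknown before the script rules is likewise a choice the spec text does not settle.

```python
from typing import Optional, Tuple

# Script codes listed in the new Appendix F text, lowercased for comparison.
CHINESE_SCRIPTS = {"hant", "hans", "hani", "hanb", "bopo"}
JAPANESE_SCRIPTS = {"jpan", "hrkt", "hira", "kana"}
KOREAN_SCRIPTS = {"kore", "hang", "jamo"}

# Assumption: "Zzzz" stands in for an explicitly unknown writing system.
UNKNOWN_SCRIPTS = {"zzzz"}


def split_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Return (primary language subtag, script subtag or None).

    Simplified parsing: skips 3-letter extlang subtags and treats the next
    4-letter alphabetic subtag as the script. Not a full BCP 47 parser.
    """
    subtags = tag.strip().lower().split("-")
    language = subtags[0]
    script = None
    for sub in subtags[1:]:
        if len(sub) == 3 and sub.isalpha():
            continue  # extended language subtag, e.g. zh-cmn-Hans
        if len(sub) == 4 and sub.isalpha():
            script = sub
        break  # anything else (region, variant, ...) ends the search
    return language, script


def classify(tag: str) -> str:
    """Map a language tag to 'Chinese', 'Japanese', 'Korean', 'unknown' or 'other'."""
    language, script = split_tag(tag)
    # Unknown only if the language is unknown or the script is explicitly unknown.
    if language in ("", "und") or script in UNKNOWN_SCRIPTS:
        return "unknown"
    # An explicit script wins for any language; otherwise the language implies it.
    if script in CHINESE_SCRIPTS or (script is None and language == "zh"):
        return "Chinese"
    if script in JAPANESE_SCRIPTS or (script is None and language == "ja"):
        return "Japanese"
    if script in KOREAN_SCRIPTS or (script is None and language == "ko"):
        return "Korean"
    # Writing system is known (explicit or implied) but not one of the three.
    return "other"
```

Under these assumptions, `classify("ja")` yields `"Japanese"` because the script is implied, while `classify("ja-Latn")` yields `"other"` because the explicit Latin script subtag takes precedence over the customary one.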
