
Commit c2bdb4f

[css-text-3] Disentangle content language and writing system
Closes #2015
1 parent 9c553e6 commit c2bdb4f


2 files changed (+47 -11 lines changed)


css-fonts-4/Overview.bs

Lines changed: 3 additions & 1 deletion
@@ -3448,8 +3448,10 @@ traditions found in Spanish, Italian and French orthography:
 If the content language of the element is known according to the
 rules of the <a href="https://www.w3.org/TR/CSS21/conform.html#doclanguage">document language</a>,
 user agents are required to infer the OpenType language system from
-the content language and use that when selecting and positioning
+the [=content language=] and use that when selecting and positioning
 glyphs using an OpenType font.
+If a [=writing system=] has been explicitly specified,
+it must take precedence over the customary one implied by the [=content language=].
 
 <!-- previously in level 3, now moved to Level 4 -->
 For OpenType fonts, in some cases it may be necessary to explicitly

css-text-3/Overview.bs

Lines changed: 44 additions & 10 deletions
@@ -180,6 +180,16 @@ Languages and Typesetting</h3>
 the universal <code>xml:lang</code> attribute in XML,
 and the HTTP <code>Content-Language</code> header for content served over HTTP.
 
+The [=content language=] an element is declared to be in
+also identifies the specific written form of that language used in that element,
+known as the <dfn export>writing system</dfn>.
+
+Note: Depending on the [=document language=]'s facilities for identifying the [=content language=],
+information about the [=writing system=] may only be carried implicitly.
+That is typically the case with the [[BCP47]] language tag used in [[HTML]],
+although it can optionally indicate the [=writing system=] explicitly
+using a script subtag.
+
 Language and writing system conventions can affect
 line breaking, hyphenation, justification, glyph selection,
 and many other typographic effects.
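To make the implicit-versus-explicit distinction concrete, here is a minimal Python sketch of the precedence rule this commit adds to css-fonts-4: an explicit script subtag wins, and otherwise the language's customary script is assumed. It assumes well-formed BCP 47 tags; `explicit_script`, `effective_script`, and the deliberately tiny `CUSTOMARY_SCRIPT` table are hypothetical illustrations, not part of either spec (a real user agent would draw on fuller data such as CLDR likely subtags).

```python
from typing import Optional

# Hypothetical, deliberately tiny table of customary scripts (ISO 15924 codes).
# A real user agent would use fuller data, such as CLDR likely subtags.
CUSTOMARY_SCRIPT = {"ja": "Jpan", "ko": "Kore", "ru": "Cyrl", "sr": "Cyrl", "tr": "Latn"}


def explicit_script(tag: str) -> Optional[str]:
    """Return the script subtag of a well-formed BCP 47 tag, or None if omitted."""
    for subtag in tag.split("-")[1:]:
        if len(subtag) == 1:
            # A singleton starts an extension or private-use section; stop here.
            break
        if len(subtag) == 4 and subtag.isalpha():
            # Before any singleton, only the script subtag is four letters.
            return subtag.title()
    return None


def effective_script(tag: str) -> Optional[str]:
    """An explicit script takes precedence over the one implied by the language."""
    language = tag.split("-")[0].lower()
    return explicit_script(tag) or CUSTOMARY_SCRIPT.get(language)


print(effective_script("sr"))       # Cyrl: implied by the language
print(effective_script("sr-Latn"))  # Latn: the explicit subtag wins
```

Stopping at the first singleton matters: a tag such as `th-u-nu-thai` carries a four-letter extension value that must not be mistaken for a script subtag.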
@@ -391,6 +401,7 @@ Characters and Letters</h3>
 replaced with a different set of mappings to their respective
 undotted/dotted counterparts, which do not exist in English. This
 mapping must only take effect if the <a>content language</a> is Turkish
+written in its customary Latin-based <a>writing system</a>
 (or another Turkic language that uses Turkish casing rules);
 in other languages, the usual mapping of &ldquo;I&rdquo;
 and &ldquo;i&rdquo; is required. This rule is thus conditionally
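A hedged Python sketch of the amended casing condition: the Turkish dotted/dotless mappings apply only when the language uses Turkish casing and its writing system is Latin, whether stated explicitly or implied by an omitted script subtag. The helper names are hypothetical, and treating `tr` and `az` (the languages Unicode's SpecialCasing.txt tailors) as the set of "Turkic languages using Turkish casing rules" is an assumption of this sketch.

```python
from typing import Optional

# Case pairs that differ from the default Unicode mappings in Turkish casing.
TURKIC_UPPER = {"i": "\u0130", "\u0131": "I"}  # i -> dotted capital I, dotless i -> I
TURKIC_LOWER = {"I": "\u0131", "\u0130": "i"}  # I -> dotless i, dotted capital I -> i


def explicit_script(tag: str) -> Optional[str]:
    """Script subtag of a well-formed BCP 47 tag, or None when it is omitted."""
    for subtag in tag.split("-")[1:]:
        if len(subtag) == 1:  # singleton: extensions or private use follow
            break
        if len(subtag) == 4 and subtag.isalpha():
            return subtag.title()
    return None


def uses_turkic_casing(tag: str) -> bool:
    # Assumption: 'tr' and 'az' stand in for "Turkish or another Turkic
    # language that uses Turkish casing rules".
    if tag.split("-")[0].lower() not in {"tr", "az"}:
        return False
    script = explicit_script(tag)
    # An omitted script means the customary Latin writing system is implied.
    return script is None or script == "Latn"


def to_upper(text: str, tag: str) -> str:
    if uses_turkic_casing(tag):
        text = "".join(TURKIC_UPPER.get(ch, ch) for ch in text)
    return text.upper()


def to_lower(text: str, tag: str) -> str:
    if uses_turkic_casing(tag):
        text = "".join(TURKIC_LOWER.get(ch, ch) for ch in text)
    return text.lower()
```

The special pairs are substituted before calling `str.upper()` or `str.lower()`, so the default mappings still handle every other character; text tagged `tr-Cyrl` falls back to the usual mapping of "I" and "i".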
@@ -726,8 +737,8 @@ Characters and Letters</h3>
 <code>W</code>, or <code>H</code> (not <code>A</code>),
 and neither side is Hangul or Emoji (Unicode property <code>Emoji</code>),
 then the segment break is removed.
-<li>Otherwise, if the <a>content language</a> of the <a>segment break</a>
-is Chinese, Japanese, or Yi,
+<li>Otherwise, if the <a>writing system</a> of the <a>segment break</a>
+is <a for=writing-system>Chinese</a>, <a for=writing-system>Japanese</a>, or Yi,
 and the character before or after the segment break
 is punctuation or a symbol (Unicode <a>general category</a> P* or S*)
 and has an <a>East Asian Width property</a> of <code>A</code>
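The hunk above shows only the opening conditions of this segment-break rule; the rest lies outside the diff context. This Python sketch therefore models just the visible part: the writing-system check and the "P* or S* with East Asian Width `A`" character test, using the standard library's `unicodedata.category` and `unicodedata.east_asian_width`. The function names and the plain-string writing-system labels are hypothetical.

```python
import unicodedata

# Writing systems named in the visible part of the rule ("Yi" is plain text in the spec).
RULE_WRITING_SYSTEMS = {"Chinese", "Japanese", "Yi"}


def is_ambiguous_punct_or_symbol(ch: str) -> bool:
    """True for general category P* or S* with East Asian Width 'A'."""
    return (unicodedata.category(ch)[0] in ("P", "S")
            and unicodedata.east_asian_width(ch) == "A")


def visible_conditions_hold(writing_system: str, before: str, after: str) -> bool:
    """Only the conditions shown in this hunk; the remainder of the rule is not modeled."""
    if writing_system not in RULE_WRITING_SYSTEMS:
        return False
    return is_ambiguous_punct_or_symbol(before) or is_ambiguous_punct_or_symbol(after)
```

For instance, U+201C LEFT DOUBLE QUOTATION MARK (category `Pi`, width `A`) passes the character test, while the ideographic comma U+3001 does not, because its width is `W`.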
@@ -1166,7 +1177,7 @@ Line Breaking Details</h3>
 </ul>
 <li>
 The following breaks are allowed for ''line-break/normal'' and ''loose'' line breaking
-if the <a>content language</a> is Chinese or Japanese,
+if the <a>writing system</a> is <a for=writing-system>Chinese</a> or <a for=writing-system>Japanese</a>,
 and are otherwise forbidden:
 <ul>
 <li>breaks before hyphens:<br>
@@ -1187,7 +1198,7 @@ Line Breaking Details</h3>
 </ul>
 <li>
 The following breaks are allowed for ''loose''
-if the <a>content language</a> is Chinese or Japanese
+if the <a>writing system</a> is <a for=writing-system>Chinese</a> or <a for=writing-system>Japanese</a>
 and are otherwise forbidden:
 <ul>
 <li>breaks before certain centered punctuation marks:<br>
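The two hunks above gate conditional CJK breaks on two things at once: the `line-break` value and the writing system. The following hypothetical Python helper captures only that gating. The specific codepoint lists sit outside these hunks and are not modeled, and the `loose_only` flag is an illustrative way to distinguish the two lists.

```python
# Writing systems for which the conditional breaks are permitted.
CONDITIONAL_BREAK_WRITING_SYSTEMS = {"Chinese", "Japanese"}


def conditional_break_allowed(line_break: str, writing_system: str, loose_only: bool) -> bool:
    """Decide whether one of the conditionally allowed breaks may be taken.

    loose_only=False models the first list (allowed for 'normal' and 'loose');
    loose_only=True models the second list (allowed for 'loose' only).
    """
    if writing_system not in CONDITIONAL_BREAK_WRITING_SYSTEMS:
        return False  # "and are otherwise forbidden"
    permitted = {"loose"} if loose_only else {"normal", "loose"}
    return line_break in permitted
```

After this commit, the check keys on the writing system rather than the content language, so Japanese text tagged with a Latin script subtag no longer receives these breaks.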
@@ -1208,7 +1219,7 @@ Line Breaking Details</h3>
 <p class="note">In the requirements listed above,
 no distinction is made among the levels of strictness in non-CJK text:
 only CJK codepoints are affected,
-unless the text is marked as Chinese or Japanese,
+unless the text is marked as <a for=writing-system>Chinese</a> or <a for=writing-system>Japanese</a>,
 in which case some additional common codepoints are affected.
 
 <div class="example">
@@ -1562,7 +1573,7 @@ Shaping Across Intra-word Breaks</h3>
 <dt><dfn>auto</dfn></dt>
 <dd>The UA determines the justification algorithm to follow, based
 on a balance between performance and adequate presentation quality.
-Since justification rules vary by writing system and language,
+Since justification rules vary by [=writing system=],
 UAs should, where possible, use a justification algorithm appropriate to the text.
 
 <p class="example">
@@ -1571,11 +1582,11 @@ Shaping Across Intra-word Breaks</h3>
 primarily expanding <a>word separators</a>
 and between CJK <a>typographic letter units</a>
 along with secondarily expanding between Southeast Asian <a>typographic letter units</a>.
-Then, in cases where the <a>content language</a> of the paragraph is known,
+Then, in cases where the <a>writing system</a> of the paragraph is <a for=writing-system>known</a>,
 it could choose a more language-tailored justification behavior
 e.g. following [[JLREQ]] for Japanese,
 using cursive elongation for Arabic,
-using ''inter-word'' for German,
+using ''inter-word'' for Latin,
 etc.
 
 <div class="figure" id="fig-text-justify-cursive">
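The following Python sketch shows the selection step the example describes: a generic strategy applies when the writing system is unknown, and a tailored one applies once it is known. The ISO 15924 dictionary keys and the strategy labels are illustrative assumptions that paraphrase the example. They are not values defined by the spec.

```python
from typing import Optional

# Illustrative mapping paraphrasing the example; keys are ISO 15924 script codes.
TAILORED_JUSTIFICATION = {
    "Jpan": "follow JLREQ",
    "Arab": "cursive elongation",
    "Latn": "inter-word",
}
GENERIC_JUSTIFICATION = (
    "expand word separators and CJK letter units, "
    "then Southeast Asian letter units"
)


def choose_justification(script: Optional[str]) -> str:
    """Pick a strategy; script is None when the writing system is unknown."""
    if script is None:
        return GENERIC_JUSTIFICATION
    # Known but untabulated writing systems also fall back to the generic strategy.
    return TAILORED_JUSTIFICATION.get(script, GENERIC_JUSTIFICATION)
```

Because the table is keyed on the writing system rather than the language, romanized Japanese (`Latn`) would get inter-word justification instead of JLREQ behavior.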
@@ -2506,7 +2517,7 @@ Appendix D: Scripts and Spacing</h2>
 The following <a>Unicode scripts</a> are included:
 Bopomofo, Han, Hangul, Hiragana, Katakana, and Yi.
 Characters of the <a>East Asian Width property</a> <code>W</code> and <code>F</code> are also included,
-but <code>A</code> characters are included only if the <a>content language</a> is Chinese, Korean, or Japanese.
+but <code>A</code> characters are included only if the <a>writing system</a> is <a for=writing-system>Chinese</a>, <a for=writing-system>Korean</a>, or <a for=writing-system>Japanese</a>.
 <dt><dfn>clustered scripts</dfn></dt>
 <dd>Clustered scripts have discrete units
 and break only at word boundaries,
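The following Python sketch approximates the CJK membership test from the hunk above. Python's standard `unicodedata` module exposes the East Asian Width property but not the Unicode Script property. The sketch therefore takes script membership as a caller-supplied flag, `in_listed_script`, which is a hypothetical parameter along with the function name.

```python
import unicodedata

CJK_WRITING_SYSTEMS = {"Chinese", "Korean", "Japanese"}


def is_cjk_for_spacing(ch: str, writing_system: str, in_listed_script: bool = False) -> bool:
    """Approximate membership test for the CJK category described above."""
    if in_listed_script:  # Bopomofo, Han, Hangul, Hiragana, Katakana, or Yi
        return True
    width = unicodedata.east_asian_width(ch)
    if width in ("W", "F"):
        return True
    # Ambiguous-width characters count only in Chinese, Korean, or Japanese text.
    return width == "A" and writing_system in CJK_WRITING_SYSTEMS
```

For example, the section sign U+00A7 (width `A`) counts as CJK in Japanese text but not in Latin text. The fullwidth letter U+FF21 (width `F`) counts regardless of writing system.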
@@ -2579,6 +2590,8 @@ Characters and Properties</h2>
 <h2 id="script-tagging" class="no-num">Appendix F.
 Tagging Content by Writing System</h2>
 
+<p><em>This appendix is normative.</em></p>
+
 While most languages have a preferred writing system,
 many can also be transcribed into a different writing system.
 As a common example, most languages have at least one Latin transcription,
@@ -2591,7 +2604,8 @@ Tagging Content by Writing System</h2>
 does not use word spaces,
 and should therefore be typeset as for Chinese.
 
-Authors can indicate the use of an atypical writing system
+In [[HTML]] or any other <a>document language</a> using [[BCP47]] to identify the [=content language=],
+authors can indicate the use of an atypical writing system
 with script subtags.
 For example, to indicate use of the Latin writing system
 for languages which don't natively use it,
@@ -2629,6 +2643,26 @@ Tagging Content by Writing System</h2>
 not the conventions of that language in a different writing system,
 which would be inappropriate to the writing system used in this case.
 
+The full correspondence between languages and their most common writing system
+is out of scope for this document.
+However, User Agents must assume at least the following:
+
+* If the [=content language=] is Chinese and the [=writing system=] is unspecified,
+or for any [=content language=] if the [=writing system=] is specified to be one of the ''Hant'', ''Hans'', ''Hani'', ''Hanb'', or ''Bopo'' [[ISO15924]] codes,
+then the [=writing system=] is <dfn no-export for=writing-system>Chinese</dfn>.
+* If the [=content language=] is Japanese and the [=writing system=] is unspecified,
+or for any [=content language=] if the [=writing system=] is specified to be one of the ''Jpan'', ''Hrkt'', ''Hira'', or ''Kana'' [[ISO15924]] codes,
+then the [=writing system=] is <dfn no-export for=writing-system>Japanese</dfn>.
+* If the [=content language=] is Korean and the [=writing system=] is unspecified,
+or for any [=content language=] if the [=writing system=] is specified to be one of the ''Kore'', ''Hang'', or ''Jamo'' [[ISO15924]] codes,
+then the [=writing system=] is <dfn no-export for=writing-system>Korean</dfn>.
+* The [=writing system=] is only considered to be <dfn for=writing-system lt='known | unknown'>unknown</dfn>
+if the [=content language=] itself is unknown,
+or if it explicitly indicates an unknown writing system.
+
+Note: Mere omission of the [=writing system=] information when the [=content language=] is specified
+means that the [=writing system=] is implied, not unknown.
+
 More advice on language tagging can be found in
 the <a href="https://www.w3.org/International/core/">Internationalization Working Group</a>’s
 <a href="https://www.w3.org/International/articles/language-tags/">“Language tags in HTML and XML”</a>
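The minimum rules added to Appendix F amount to a small classification procedure over BCP 47 tags, sketched below in Python under stated assumptions. The function names and the `"other"` result (a known writing system that is none of the three) are hypothetical. Treating a missing tag, an empty tag, or `und` as an unknown content language is an assumption. So is treating the ISO 15924 code `Zzzz` ("uncoded script") as an explicit indication of an unknown writing system. The sketch only checks the `zh`, `ja`, and `ko` language subtags, and it assumes tags are well-formed.

```python
from typing import Optional, Tuple

CHINESE_SCRIPTS = {"Hant", "Hans", "Hani", "Hanb", "Bopo"}
JAPANESE_SCRIPTS = {"Jpan", "Hrkt", "Hira", "Kana"}
KOREAN_SCRIPTS = {"Kore", "Hang", "Jamo"}
# Writing system implied when the script subtag is omitted.
IMPLIED_FOR_LANGUAGE = {"zh": "Chinese", "ja": "Japanese", "ko": "Korean"}


def split_tag(tag: str) -> Tuple[str, Optional[str]]:
    """Split a well-formed BCP 47 tag into (language, script subtag or None)."""
    subtags = tag.split("-")
    script = None
    for subtag in subtags[1:]:
        if len(subtag) == 1:  # singleton: extensions or private use follow
            break
        if len(subtag) == 4 and subtag.isalpha():
            script = subtag.title()
            break
    return subtags[0].lower(), script


def writing_system(tag: Optional[str]) -> str:
    """Classify per the minimum rules; 'other' means known but none of the three."""
    if not tag or tag.lower() == "und":  # assumption: content language unknown
        return "unknown"
    language, script = split_tag(tag)
    if script is None:
        # Omitted script: the writing system is implied, not unknown.
        return IMPLIED_FOR_LANGUAGE.get(language, "other")
    if script == "Zzzz":  # assumption: explicitly unknown writing system
        return "unknown"
    if script in CHINESE_SCRIPTS:
        return "Chinese"
    if script in JAPANESE_SCRIPTS:
        return "Japanese"
    if script in KOREAN_SCRIPTS:
        return "Korean"
    return "other"
```

Under these rules, `ja-Latn` is not classified as Japanese, because the explicit script subtag overrides the default implied by the language. Conversely, `en-Hans` is classified as Chinese.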
