Title: CSS Text Module Level 3
Shortname: css-text
Level: 3
- Status: WD
+ Status: ED
Work Status: Refining
Group: csswg
ED: https://drafts.csswg.org/css-text-3/
@@ -138,6 +138,10 @@ Module Interactions</h3>
<p>This module, together with [[CSS3-TEXT-DECOR]],
replaces and extends the text-level features defined in [[!CSS21]] chapter 16.

+ <p>In addition to the terms defined below,
+ other terminology and concepts used in this specification are defined
+ in [[!CSS21]] and [[!CSS3-WRITING-MODES]].
+
<h3 id="values">
Values</h3>

@@ -150,15 +154,38 @@ Values</h3>
also accept the <a>CSS-wide keywords</a> keywords as their property value.
For readability they have not been repeated explicitly.

- <h3 id="terms">
- Terminology</h3>
+ <h3 id="languages">
+ Languages and Typesetting</h3>

- <p>In addition to the terms defined below,
- other terminology and concepts used in this specification are defined
- in [[!CSS21]] and [[!CSS3-WRITING-MODES]].
+ <p><strong class="advisement">
+ Authors should language-tag their content accurately for the best typographic behavior.
+ </strong>
+
+ <p>The <dfn export>content language</dfn> of an element is the (human) language
+ the element is declared to be in, according to the rules of the
+ <a href="https://www.w3.org/TR/CSS21/conform.html#doclanguage">document language</a>.
+ For example, the rules for determining the <a>content language</a> of an HTML
+ element use the <code>lang</code> attribute and are defined in [[HTML5]],
+ and the rules for determining the <a>content language</a> of an XML element use
+ the <code>xml:lang</code> attribute and are
+ <a href="https://www.w3.org/TR/REC-xml/#sec-lang-tag">defined</a> in [[XML10]].
+ Note that it is possible for the <a>content language</a> of an element
+ to be unknown.

- <h4 id="characters">
- Characters and Letters</h4>
+ Language and writing system conventions can affect
+ line breaking, hyphenation, justification, glyph selection,
+ and many other typographic effects.
+ <strong>In CSS, language-specific typographic tailorings
+ are only applied when the <a>content language</a> is known (declared).</strong>
+ Therefore,
+ higher quality typography requires authors to communicate to the UA
+ the correct linguistic context of the text in the document.
+
+ More information about language tags and their interpretation
+ can be found in [[#script-tagging]].
+
+ <h3 id="characters">
+ Characters and Letters</h3>

<p>The basic unit of typesetting is the <dfn export>character</dfn>.
However, because writing systems are not always as simple as the basic English alphabet,
@@ -274,29 +301,6 @@ Characters and Letters</h4>
Authors are forewarned that dividing <a>grapheme clusters</a>
by element boundaries may give inconsistent or undesired results.

- <h4 id="languages">
- Languages and Typesetting</h4>
-
- <p class="note">
- Many typographic effects vary by linguistic context.
- In CSS, language-specific typographic tailorings
- are only applied when the content language is known (declared).
-
- <p><strong class="advisement">
- Authors should language-tag their content accurately for the best typographic behavior.
- </strong>
-
- <p>The <dfn export>content language</dfn> of an element is the (human) language
- the element is declared to be in, according to the rules of the
- <a href="https://www.w3.org/TR/CSS21/conform.html#doclanguage">document language</a>.
- For example, the rules for determining the <a>content language</a> of an HTML
- element use the <code>lang</code> attribute and are defined in [[HTML5]],
- and the rules for determining the <a>content language</a> of an XML element use
- the <code>xml:lang</code> attribute and are
- <a href="https://www.w3.org/TR/REC-xml/#sec-lang-tag">defined</a> in [[XML10]].
- Note that it is possible for the <a>content language</a> of an element
- to be unknown.
-
<h2 id="transforming">
Transforming Text</h2>

@@ -2381,6 +2385,64 @@ Characters and Properties</h2>
but take their other properties from the first combining character in the sequence.
</ul>

+ <h2 id="script-tagging" class="no-num">Appendix E.
+ Tagging Content by Writing System</h2>
+
+ While most languages have a preferred writing system,
+ many can also be transcribed into a different writing system.
+ As a common example, most languages have at least one Latin transcription,
+ and can thus be written in the Latin writing system.
+ In these cases the document typically adopts the typographic conventions of the Latin writing system:
+ for example Japanese “romaji” and Chinese Pinyin use Latin letters and word spaces,
+ and follow Latin line-breaking and justification practices accordingly.
+ As another example, historical ideographic Korean
+ (<code>ko-Hani</code>)
+ does not use word spaces,
+ and should therefore be typeset as for Chinese.
+
+ Authors can indicate the use of an atypical writing system
+ with script subtags.
+ For example, to indicate use of the Latin writing system
+ for languages which don't natively use it,
+ the <code>-Latn</code> script subtag can be added,
+ e.g. <code>ja-Latn</code> for Japanese romaji.
+ Other subtags exist for other writing systems:
+ see [[BCP47]], [[ISO15924]], and the <a href="http://unicode.org/iso15924/iso15924-codes.html">ISO15924 script tag registry</a>.
+ Some common/historical examples follow:
+
+ <div class="example">
+ <dl>
+ <dt><code>zh-Latn</code>
+ <dd>Chinese, written in Latin transcription.
+ <dt><code>ko-Hani</code>
+ <dd>Korean, written in Hanja (Chinese ideographic characters).
+ <dt><code>tr-Arab</code>
+ <dd>Turkish, written in Arabic script.
+ <dt><code>mn-Cyrl</code>
+ <dd>Mongolian, written in Cyrillic.
+ <dt><code>mn-Mong</code>
+ <dd>Mongolian, written in traditional Mongolian script.
+ </dl>
+ </div>
+
+ UAs should assume the most common writing system
+ of the specified <a>content language</a>
+ when choosing typographic behaviors
+ such as line-breaking or justification strategies,
+ but must not assume that writing system
+ if the author has explicitly indicated a different one.
+ If the UA has no language-specific knowledge
+ of a particular language and writing system combination,
+ it must use the typographic conventions of the specified writing system
+ (assuming the conventions of a different language if necessary),
+ not the conventions of that language in a different writing system,
+ which would be inappropriate to the writing system used in this case.
+
+ More advice on language tagging can be found in
+ the <a href="https://www.w3.org/International/core/">Internationalization Working Group</a>’s
+ <a href="https://www.w3.org/International/articles/language-tags/">“Language tags in HTML and XML”</a>
+ and <a href="https://www.w3.org/International/questions/qa-choosing-language-tags">“Choosing a Language Tag”</a>.
+
<h2 id="priv-sec" class="no-num">
Privacy and Security Considerations</h2>

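For readers of this diff, the tagging that the new Appendix E describes could look roughly like the following in an HTML document. This is a minimal sketch, not part of the commit: the sample text and the choice of `<p>` elements are assumptions made for illustration, and only the `lang` attribute and the script subtags (`ja-Latn`, `zh-Latn`, `mn-Cyrl`) come from the spec text above.

```html
<!-- Illustrative only: sample text and element choices are assumptions. -->

<!-- Content language declared with the HTML lang attribute;
     the UA may assume the language's most common writing system. -->
<p lang="ja">日本語</p>

<!-- Japanese romaji: the -Latn script subtag tells the UA that the
     text uses the Latin writing system and its conventions. -->
<p lang="ja-Latn">Nihongo</p>

<!-- Chinese in Latin transcription (Pinyin). -->
<p lang="zh-Latn">Zhōngwén</p>

<!-- Mongolian written in Cyrillic, stated explicitly. -->
<p lang="mn-Cyrl">Монгол хэл</p>
```

With explicit script subtags like these, a UA following the appendix would not fall back to the default writing system of the language when choosing line-breaking or justification behavior.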