22Title : CSS Text Module Level 3
33Shortname : css-text
44Level : 3
5- Status : WD
5+ Status : ED
66Work Status : Refining
77Group : csswg
88ED : https://drafts.csswg.org/css-text-3/
@@ -138,6 +138,10 @@ Module Interactions</h3>
138138 <p> This module, together with [[CSS3-TEXT-DECOR]],
139139 replaces and extends the text-level features defined in [[!CSS21]] chapter 16.
140140
141+ <p> In addition to the terms defined below,
142+ other terminology and concepts used in this specification are defined
143+ in [[!CSS21]] and [[!CSS3-WRITING-MODES]].
144+
141145<h3 id="values">
142146Values</h3>
143147
@@ -150,15 +154,38 @@ Values</h3>
150154 also accept the <a>CSS-wide keywords</a> as their property value.
151155 For readability they have not been repeated explicitly.
152156
153- <h3 id="terms">
154- Terminology</h3>
157+ <h3 id="languages">
158+ Languages and Typesetting</h3>
155159
156- <p> In addition to the terms defined below,
157- other terminology and concepts used in this specification are defined
158- in [[!CSS21]] and [[!CSS3-WRITING-MODES]].
160+ <p><strong class="advisement">
161+ Authors should language-tag their content accurately for the best typographic behavior.
162+ </strong>
163+
164+ <p> The <dfn export>content language</dfn> of an element is the (human) language
165+ the element is declared to be in, according to the rules of the
166+ <a href="https://www.w3.org/TR/CSS21/conform.html#doclanguage">document language</a>.
167+ For example, the rules for determining the <a>content language</a> of an HTML
168+ element use the <code>lang</code> attribute and are defined in [[HTML5]],
169+ and the rules for determining the <a>content language</a> of an XML element use
170+ the <code>xml:lang</code> attribute and are
171+ <a href="https://www.w3.org/TR/REC-xml/#sec-lang-tag">defined</a> in [[XML10]].
172+ Note that it is possible for the <a>content language</a> of an element
173+ to be unknown.
159174
160- <h4 id="characters">
161- Characters and Letters</h4>
175+ Language and writing system conventions can affect
176+ line breaking, hyphenation, justification, glyph selection,
177+ and many other typographic effects.
178+ <strong> In CSS, language-specific typographic tailorings
179+ are only applied when the <a>content language</a> is known (declared).</strong>
180+ Therefore,
181+ higher quality typography requires authors to communicate to the UA
182+ the correct linguistic context of the text in the document.
183+
184+ More information about language tags and their interpretation
185+ can be found in [[#script-tagging]].
186+
187+ <h3 id="characters">
188+ Characters and Letters</h3>
162189
163190 <p> The basic unit of typesetting is the <dfn export>character</dfn>.
164191 However, because writing systems are not always as simple as the basic English alphabet,
@@ -274,29 +301,6 @@ Characters and Letters</h4>
274301 Authors are forewarned that dividing <a>grapheme clusters</a>
275302 by element boundaries may give inconsistent or undesired results.
276303
277- <h4 id="languages">
278- Languages and Typesetting</h4>
279-
280- <p class="note">
281- Many typographic effects vary by linguistic context.
282- In CSS, language-specific typographic tailorings
283- are only applied when the content language is known (declared).
284-
285- <p><strong class="advisement">
286- Authors should language-tag their content accurately for the best typographic behavior.
287- </strong>
288-
289- <p> The <dfn export>content language</dfn> of an element is the (human) language
290- the element is declared to be in, according to the rules of the
291- <a href="https://www.w3.org/TR/CSS21/conform.html#doclanguage">document language</a>.
292- For example, the rules for determining the <a>content language</a> of an HTML
293- element use the <code>lang</code> attribute and are defined in [[HTML5]],
294- and the rules for determining the <a>content language</a> of an XML element use
295- the <code>xml:lang</code> attribute and are
296- <a href="https://www.w3.org/TR/REC-xml/#sec-lang-tag">defined</a> in [[XML10]].
297- Note that it is possible for the <a>content language</a> of an element
298- to be unknown.
299-
300304<h2 id="transforming">
301305 Transforming Text</h2>
302306
@@ -2381,6 +2385,64 @@ Characters and Properties</h2>
23812385 but take their other properties from the first combining character in the sequence.
23822386 </ul>
23832387
2388+ <h2 id="script-tagging" class="no-num">Appendix E.
2389+ Tagging Content by Writing System</h2>
2390+
2391+ While most languages have a preferred writing system,
2392+ many can also be transcribed into a different writing system.
2393+ As a common example, most languages have at least one Latin transcription,
2394+ and can thus be written in the Latin writing system.
2395+ In these cases the document typically adopts the typographic conventions of the Latin writing system:
2396+ for example Japanese “romaji” and Chinese Pinyin use Latin letters and word spaces,
2397+ and follow Latin line-breaking and justification practices accordingly.
2398+ As another example, historical ideographic Korean
2399+ (<code>ko-Hani</code>)
2400+ does not use word spaces,
2401+ and should therefore be typeset as for Chinese.
2402+
2403+ Authors can indicate the use of an atypical writing system
2404+ with script subtags.
2405+ For example, to indicate use of the Latin writing system
2406+ for languages which don't natively use it,
2407+ the <code>-Latn</code> script subtag can be added,
2408+ e.g. <code>ja-Latn</code> for Japanese romaji.
2409+ Other subtags exist for other writing systems:
2410+ see [[BCP47]], [[ISO15924]], and the <a href="http://unicode.org/iso15924/iso15924-codes.html">ISO15924 script tag registry</a>.
2411+ Some common/historical examples follow:
2412+
2413+ <div class="example">
2414+ <dl>
2415+ <dt><code>zh-Latn</code>
2416+ <dd> Chinese, written in Latin transcription.
2417+ <dt><code>ko-Hani</code>
2418+ <dd> Korean, written in Hanja (Chinese ideographic characters).
2419+ <dt><code>tr-Arab</code>
2420+ <dd> Turkish, written in Arabic script.
2421+ <dt><code>mn-Cyrl</code>
2422+ <dd> Mongolian, written in Cyrillic.
2423+ <dt><code>mn-Mong</code>
2424+ <dd> Mongolian, written in traditional Mongolian script.
2425+ </dl>
2426+ </div>
2427+
2428+ UAs should assume the most common writing system
2429+ of the specified <a>content language</a>
2430+ when choosing typographic behaviors
2431+ such as line-breaking or justification strategies,
2432+ but must not assume that writing system
2433+ if the author has explicitly indicated a different one.
2434+ If the UA has no language-specific knowledge
2435+ of a particular language and writing system combination,
2436+ it must use the typographic conventions of the specified writing system
2437+ (assuming the conventions of a different language if necessary),
2438+ not the conventions of that language in a different writing system,
2439+ which would be inappropriate to the writing system used in this case.
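
As an editorial illustration only (not part of the commit or of the specification), the selection rule in the paragraph above might be sketched as follows. The function names, the `DEFAULT_SCRIPT` and `TAILORED` tables, and the returned tuple shape are all hypothetical, and the tag parsing is a deliberate simplification of [[BCP47]] that only looks for a four-letter script subtag.

```python
# Hypothetical, deliberately tiny tables; a real UA would use far richer data.
# Assumed "most common" writing system per language (ISO 15924 codes).
DEFAULT_SCRIPT = {"ja": "Jpan", "zh": "Hans", "ko": "Kore",
                  "en": "Latn", "tr": "Latn", "mn": "Cyrl"}
# (language, script) pairs for which this sketch pretends to have tailored rules.
TAILORED = {("ja", "Jpan"), ("zh", "Hans"), ("ko", "Kore"), ("en", "Latn")}


def split_tag(tag):
    """Return (language, script or None) for a language tag.

    Simplified: takes the first subtag as the language and the first
    four-letter alphabetic subtag before any singleton as the script.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    for sub in parts[1:]:
        if len(sub) == 1:
            # A singleton starts an extension or private-use sequence.
            break
        if len(sub) == 4 and sub.isascii() and sub.isalpha():
            return language, sub.title()
    return language, None


def choose_conventions(tag):
    """Pick the basis for typographic behavior.

    Returns None when the content language is unknown,
    (language, script) when tailored rules exist for that combination,
    or (None, script) to fall back to the writing system's own conventions.
    """
    if not tag:
        return None
    language, explicit_script = split_tag(tag)
    # An explicitly indicated writing system always overrides the default.
    script = explicit_script or DEFAULT_SCRIPT.get(language)
    if script is None:
        return None
    if (language, script) in TAILORED:
        return (language, script)
    # No knowledge of this combination: follow the script, not the language.
    return (None, script)
```

With these assumed tables, `choose_conventions("ja")` selects the Japanese tailoring, while `choose_conventions("ja-Latn")` falls back to generic Latin conventions rather than applying Japanese line-breaking rules to romaji.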
2440+
2441+ More advice on language tagging can be found in
2442+ the <a href="https://www.w3.org/International/core/">Internationalization Working Group</a>’s
2443+ <a href="https://www.w3.org/International/articles/language-tags/">“Language tags in HTML and XML”</a>
2444+ and <a href="https://www.w3.org/International/questions/qa-choosing-language-tags">“Choosing a Language Tag”</a>.
2445+
23842446<h2 id="priv-sec" class="no-num">
23852447Privacy and Security Considerations</h2>
23862448