Commit 9d198fe

[css-text][css-writing-modes] Consolidate definition of character.
1 parent 5e280a3 commit 9d198fe

4 files changed

Lines changed: 184 additions & 193 deletions

css-text/Overview.bs

Lines changed: 46 additions & 29 deletions
@@ -152,7 +152,7 @@ Terminology</h3>
 <h4 id="characters">
 Characters and Letters</h4>
 
-<p>The basic unit of typesetting is the <dfn>character</dfn>.
+<p>The basic unit of typesetting is the <dfn export>character</dfn>.
 However, because writing systems are not always as simple as the basic English alphabet,
 what a <a>character</a> actually is depends on the context in which the term is used.
 For example, in Hangul (the Korean writing system),
@@ -178,6 +178,8 @@ Characters and Letters</h4>
 <p>In turn, a single encoding <a>character</a> can be represented in the data stream as one or more bytes;
 and in programming environments one byte is sometimes also called a <a>character</a>.
 
+<p>Therefore the term <a>character</a> is fairly ambiguous where technical precision is required.
+
 <p>For text layout, we will refer to the <dfn export lt="typographic character unit|typographic character">typographic character unit</dfn>
 as the basic unit of text.
 Even within the realm of text layout,
@@ -202,41 +204,53 @@ Characters and Letters</h4>
 as the basis for its <a>typographic character unit</a>.
 However, the UA should tailor the definitions
 as required by typographic tradition
-since the default rules are not always appropriate or ideal,
+since the default rules are not always appropriate or ideal--
 and is expected to tailor them differently
 depending on the operation as needed.
 
-<!--
 <p class="note">
-The rules for such tailorings are out of scope for CSS,
+The rules for such tailorings are out of scope for CSS.
+<!--
 however W3C currently maintains a wiki page
 where some known tailorings are collected.
 -->
 
 <div class="example">
-<p>For example,
-in some scripts such as Myanmar or Devanagari,
-the <a>typographic character unit</a> for both justification and line-breaking
-is an entire syllable,
-which can include more than one [[!UAX29]] <a>grapheme cluster</a>.
-
-<p>In other scripts such as Thai or Lao,
-even though for line-breaking the <a>typographic character</a>
-matches Unicode’s default <a>grapheme clusters</a>,
-for letter-spacing the relevant unit
-is <em>less</em> than a [[!UAX29]] <a>grapheme cluster</a>,
-and may require decomposition or other substitutions
-before spacing can be inserted.
-
-<p>For instance,
-to properly letter-space the Thai word คำ (U+0E04 + U+0E33),
-the U+0E33 needs to be decomposed into U+0E4D + U+0E32,
-and then the extra letter-space inserted before the U+0E32: คํ า.
-
-<p>A slightly more complex example is น้ำ (U+0E19 + U+0E49 + U+0E33).
-In this case, normal Thai shaping will first decompose the U+0E33 into U+0E4D + U+0E32
-and then swap the U+0E4D with the U+0E49, giving U+0E19 + U+0E4D + U+0E49 + U+0E32.
-As before the extra letter-space is then inserted before the U+0E32: นํ้ า.
+The following are some examples of <a>typographic character unit</a> tailorings
+required by standard typesetting practice:
+
+<ul>
+<li>
+<p>In some scripts such as Myanmar or Devanagari,
+the <a>typographic character unit</a> for both justification and line-breaking
+is an entire syllable,
+which can include more than one [[!UAX29]] <a>grapheme cluster</a>.
+
+<li>
+<p>In other scripts such as Thai or Lao,
+even though for line-breaking the <a>typographic character</a>
+matches Unicode’s default <a>grapheme clusters</a>,
+for letter-spacing the relevant unit
+is <em>less</em> than a [[!UAX29]] <a>grapheme cluster</a>,
+and may require decomposition or other substitutions
+before spacing can be inserted.
+
+<p>For instance,
+to properly letter-space the Thai word คำ (U+0E04 + U+0E33),
+the U+0E33 needs to be decomposed into U+0E4D + U+0E32,
+and then the extra letter-space inserted before the U+0E32: คํ า.
+
+<p>A slightly more complex example is น้ำ (U+0E19 + U+0E49 + U+0E33).
+In this case, normal Thai shaping will first decompose the U+0E33 into U+0E4D + U+0E32
+and then swap the U+0E4D with the U+0E49, giving U+0E19 + U+0E4D + U+0E49 + U+0E32.
+As before the extra letter-space is then inserted before the U+0E32: นํ้ า.
+
+<li>
+<p>Vertical typesetting [[!CSS3-WRITING-MODES]] can also require tailoring.
+For example, when typesetting ''text-orientation/upright'' text,
+Tibetan tsek and shad marks are kept with the preceding grapheme cluster,
+rather than treated as an independent <a>typographic character unit</a>.
+</ul>
 </div>
 
 <p>A <dfn export>typographic letter unit</dfn> or <dfn>letter</dfn> for the purpose of this specification
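Reviewer note on the hunk above: the Thai letter-spacing example can be sketched in a few lines of Python. This is only an illustration of the two worked cases in the spec text, not how any UA or shaping engine implements it. The function names `decompose_sara_am` and `letter_space` are hypothetical, and the sketch assumes that the only marks that can sit between the base consonant and U+0E33 are the four Thai tone marks U+0E48 to U+0E4B.

```python
# Hypothetical helper names; a sketch of the two worked examples only.
SARA_AM = "\u0e33"    # THAI CHARACTER SARA AM
NIKHAHIT = "\u0e4d"   # THAI CHARACTER NIKHAHIT
SARA_AA = "\u0e32"    # THAI CHARACTER SARA AA
# Assumption: only these tone marks may precede SARA AM (MAI EK .. MAI CHATTAWA).
TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}


def decompose_sara_am(text: str) -> str:
    """Replace U+0E33 with U+0E4D + U+0E32, moving U+0E4D before a preceding tone mark."""
    out: list[str] = []
    for ch in text:
        if ch != SARA_AM:
            out.append(ch)
            continue
        if out and out[-1] in TONE_MARKS:
            # Swap step: NIKHAHIT goes before the tone mark, as in the second example.
            tone = out.pop()
            out.extend([NIKHAHIT, tone, SARA_AA])
        else:
            out.extend([NIKHAHIT, SARA_AA])
    return "".join(out)


def letter_space(text: str, spacer: str = " ") -> str:
    """Insert `spacer` before each U+0E32 after decomposition.

    Only the split point discussed in the example is handled; real
    letter-spacing would also separate the other typographic units.
    """
    return decompose_sara_am(text).replace(SARA_AA, spacer + SARA_AA)
```

For the two inputs in the spec text, `letter_space("\u0e04\u0e33")` yields U+0E04 U+0E4D, a space, then U+0E32, and `decompose_sara_am("\u0e19\u0e49\u0e33")` yields U+0E19 U+0E4D U+0E49 U+0E32, matching the sequences given in the example.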
@@ -2671,8 +2685,8 @@ Appendix C: Scripts and Spacing</h2>
 <h2 id="character-properties" class="no-num">Appendix D.
 Characters and Properties</h2>
 
-<p>Unicode defines three codepoint-level properties that are referenced
-in CSS Text:
+<p>Unicode defines four codepoint-level properties that are referenced
+in CSS typesetting:
 <dl export>
 <dt><dfn lt="Unicode East Asian Width|East Asian Width property"><a href="http://www.unicode.org/reports/tr11/#Definitions">East Asian width property</a></dfn>
 <dd>Defined in [[!UAX11]] and given as the <code>East_Asian_Width</code> property
@@ -2684,6 +2698,9 @@ Characters and Properties</h2>
 <dd>Defined in [[!UAX24]] and given as the <code>Script</code> property
 in the Unicode Character Database [[!UAX44]].
 (UAs must include any ScriptExtensions.txt assignments in this mapping.)
+<dt><a href="http://www.unicode.org/reports/tr50/">Vertical Orientation</a>
+<dd>Defined in [[!UTR50]] as the Vertical_Orientation property
+and given in the UTR50 data file.
 </dl>
 
 <p>Unicode defines properties for individual codepoints, but sometimes
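Reviewer note on Appendix D: for readers who want to inspect one of these codepoint-level properties, the following is a minimal sketch using Python's standard `unicodedata` module, which exposes `East_Asian_Width`. The helper name `east_asian_widths` is hypothetical. To my knowledge the standard library does not expose the `Script` or `Vertical_Orientation` properties, so those would need the Unicode data files directly or a third-party package.

```python
import unicodedata


def east_asian_widths(text: str) -> list[tuple[str, str]]:
    """Return (codepoint label, East_Asian_Width value) for each codepoint in `text`."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.east_asian_width(ch))
        for ch in text
    ]


if __name__ == "__main__":
    # Latin capital A, Hiragana A, Fullwidth Latin capital A
    for label, width in east_asian_widths("A\u3042\uff21"):
        print(label, width)
```

The values returned are the short property aliases from UAX #11 (for example `Na`, `W`, `F`), and they apply per codepoint, not per typographic character unit.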
