<h1>Text Justification</h1>
<pre class='metadata'>
Shortname: text-justify
Level: 1
Status: UD
Work Status: Exploring
Group: csswg
ED: https://drafts.csswg.org/css-text-3/text-justify-i18n
TR: https://www.w3.org/TR/text-justify-i18n/
Editor: Elika J. Etemad / fantasai, Invited Expert, http://fantasai.inkedblade.net/contact
Abstract: This Note serves as a clearinghouse to further information on worldwide conventions for text justification: the process of stretching text to fill a line.
</pre>
<h2 id="intro">
Introduction</h2>
Since the amount of content on a line tends to vary,
even if minutely, from line to line within a paragraph,
typographers have come up with various methods
for effective <dfn lt="full justification | full text justification">full justification</dfn>
(causing the text to completely fill each line)
in order to create visual alignment on both edges of a paragraph.
Typographic conventions for full text justification depend on the writing system,
the content language,
and the calligraphic style of the text.
Results also tend to vary based on the capabilities of the layout engine
and a given typographer’s preferences for weighing
its various detrimental effects on typographic color and readability.
This document collects together references for further information
on the typographic conventions for full justification
as they apply to the various writing systems around the world,
together with some guidance for implementers handling unpredictable Web content.
(General information and technical requirements for CSS
are described under the <a href="https://dev.w3.org/csswg/css-text-3/#justification">Justification</a> section of [[!CSS3TEXT]].)
Advisement: Additional information and references are hereby solicited;
please send any suggestions for additions, clarifications, corrections, and other improvements
to the <a href="https://www.w3.org/International/">W3C Internationalization Working Group</a> at
<a href="mailto:www-international@w3.org">www-international@w3.org</a>.
Note: Information on which languages use which writing systems
is maintained in the Unicode CLDR.
<h2>References</h2>
<h3 id=zh>
Chinese Writing System (Han Ideographs)</h3>
Historically, Chinese was written as Han ideographs, with no punctuation.
Under this system, justification was automatic,
as the characters fit perfectly into a square grid.
However, the introduction of punctuation in recent centuries,
plus the increase in mixed-script text
(such as the inclusion of European numbers and/or words, phrases, names, and trademarks)
has created a need for adjustments within a line.
Chinese notably does not use word spaces,
so these do not provide a justification opportunity within the lines;
thus justification techniques focus on adjustments to spacing around punctuation,
script-change boundaries,
and inter-character spacing.
<ul>
<li><a href="https://www.w3.org/TR/clreq/#line_composition_rules_for_punctuation_marks">Chinese Layout Requirements: Line Composition Rules for Punctuation Marks</a>
<li><a href="https://www.w3.org/TR/clreq/#chinese_and_western_mixed_text_composition">Chinese Layout Requirements: Composition of Chinese and Western Mixed Texts</a>
</ul>
<h3 id=ja>
Japanese Writing System</h3>
Like Chinese, Japanese was historically written in Han ideographs;
however, it has since developed its own phonetic scripts,
Hiragana and Katakana (collectively, kana).
While pure kana texts do exist,
particularly in children's literature,
Han ideographs (Kanji, in Japanese) continue to be an integral part of normal Japanese text,
and are interspersed with kana within a sentence.
Also like Chinese, it has
embraced European-inspired punctuation, numerals, and other foreign snippets
that don't conform to the standard full-width character grid.
The Japanese writing system also does not use word spaces,
and similarly focuses on adjustments to spacing around punctuation,
script-change boundaries,
and inter-character spacing,
with a notable preference for compressing intra-glyph spacing
(such as the blank space built into fullwidth punctuation)
over expanding the space between glyphs.
<ul>
<li><a href="https://www.w3.org/TR/jlreq/#line_adjustment">Japanese Layout Requirements: Line Adjustment</a>
<li><a href="https://www.w3.org/TR/jlreq/#opportunities_for_intercharacter_space_reduction_during_line_adjustment">Japanese Layout Requirements: Opportunities for Inter-character Space Reduction during Line Adjustment</a>
<li><a href="https://www.w3.org/TR/jlreq/#opportunities_for_intercharacter_space_expansion_during_line_adjustment">Japanese Layout Requirements: Opportunities for Inter-character Space Expansion during Line Adjustment</a>
</ul>
<h3 id=ko>
Korean Writing System</h3>
Like Japanese, Korean was historically written in pure Han ideographs,
and has since developed its own phonetic script, Hangul.
Also like Japanese, it has adopted punctuation and numerals.
However, unlike Japanese, Korean has also adopted word spaces,
and tends towards narrow (Western-style, rather than full-width) punctuation.
This allows it to use inter-word justification:
as in English publications, this method stretches the spaces between words
in order to fill the line.
While Han ideographs (Hanja, in Korean) were kept as part of the writing system,
they have become increasingly scarce over time
such that many documents are written in pure Hangul,
and some only use Hanja as inline annotations for disambiguation among homophones
rather than as part of the main text.
However, Hanja and Hangul together remain important components of Korean writing.
<ul>
<li><a href="https://www.w3.org/TR/klreq/#paraadjust">Hangul Layout Requirements: Paragraph Adjustment</a>
<li><a href="https://www.w3.org/TR/klreq/#line-adjust">Hangul Layout Requirements: Line Adjustment Process</a>
</ul>
<h3 id=latn>
Latin (Roman) Writing System</h3>
Quite possibly the writing system familiar to more people than any other,
the Latin writing system derives from the Roman alphabet,
including a few additional characters and diacritic marks
to accommodate languages such as Icelandic and modern Vietnamese.
Thanks to the Europeans in the Age of Exploration,
their missionaries,
and the Western-dominated global scholastic culture of the modern age,
most languages in the world have one or more Latin transcriptions,
even those that do not use it as their primary writing system.
The Latin alphabet is a phonetic system with disjoint letterforms,
and typically uses spaces between words.
This allows it to use inter-word justification,
although it can and sometimes does increase the spacing between individual letters as well.
Since it is frequently adopted into other writing systems,
it can sometimes adopt characteristics of that system;
for example, some styles of Japanese typesetting
treat Latin letters the same as Japanese characters
for the purpose of line-breaking and justification.
<ul>
<li><a href="https://www.w3.org/TR/dpub-latinreq/#justification">Latin Layout Requirements: Justification</a>
</ul>
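Inter-word justification reduces to sharing the leftover width of a line among its word spaces. The Python sketch below assumes that widths are measured in whole device pixels and that the leftover width (the line width minus the natural width of its content) has already been computed; both are assumptions made for illustration. Any remainder that does not divide evenly is handed out one pixel at a time to the first spaces on the line.

```python
def space_widths(gap_count: int, extra_px: int, base_space_px: int) -> list[int]:
    """Width of each word space after spreading extra_px pixels over gap_count spaces."""
    if gap_count <= 0:
        # No word spaces: inter-word justification cannot fill this line,
        # so another method (such as letter-spacing) would be needed.
        return []
    share, remainder = divmod(extra_px, gap_count)
    # The first `remainder` spaces take one extra pixel so the total is exact.
    return [base_space_px + share + (1 if i < remainder else 0)
            for i in range(gap_count)]


print(space_widths(gap_count=3, extra_px=10, base_space_px=4))  # [8, 7, 7]
```

Which spaces receive the leftover pixels is a free design choice; spreading them across the line rather than grouping them at the start would be equally valid.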
<h3 id=ethiopic>
Ethiopic Writing System</h3>
Like Latin, the Ethiopic writing system uses an alphabet of disjoint letters
and uses punctuation to indicate the break between words.
Unlike Latin, Ethiopic traditionally uses a visible word separator--
the Ethiopic Word Space U+1361 “፡”--
although modern documents sometimes use a regular space U+0020 “ ” instead.
Justification strategies are as for Latin:
increasing the space at the word separator,
and/or distributing space between letters.
<ul>
<li><a href="https://w3c.github.io/elreq/#ethiopic_justification">Ethiopic Layout Requirements: Justification</a>
<li><a href="https://www.unicode.org/L2/L2015/15148-ethiopic-wordspace.pdf">Proposal to Reclassify Ethiopic Wordspace as a Space Separator (Zs) Symbol</a>
</ul>
<h3 id=arabic>
Arabic Writing System (and Other Cursive Systems)</h3>
Arabic is a cursive script,
meaning its letters are typically joined together within a word.
This creates additional challenges,
as the usual method for stretching out text
(inserting spaces between glyphs)
does not work.
Since Arabic uses spaces between words,
one method for justification is inter-word justification:
stretching out the spaces within the line to fill it.
However, most styles of Arabic writing prefer calligraphic elongation or compression,
distorting the shapes and connections between letters
in order to fill the line while preserving its typographic color.
This is often called “kashida”, meaning “stretched”.
A simplistic variant of this technique inserts elongation marks
(sometimes represented with U+0640 “ـ” TATWEEL)
at appropriate points in the text.
<ul>
<li><a href="https://www.tug.org/tugboat/tb27-2/tb87benatia.pdf">Arabic Text Justification</a>
<li><a href="https://quod.lib.umich.edu/j/jep/3336451.0013.105/--justify-just-or-just-justify?rgn=main;view=fulltext">Justify Just or Just Justify (Arabic text justification)</a>
<li><a href="https://rishida.net/blog/?p=1059">Typography questions for HTML & CSS: Arabic justification</a>
<li><a href="https://www.cle.org.pk/Publication/papers/2004/rule-based-expert-system.pdf">Rule-based expert system for Urdu nastaleeq justification</a>
</ul>
Syriac and Mongolian
have properties similar to Arabic,
and in the absence of additional information should be given
similar treatment for justification.
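The simplistic tatweel-insertion variant described above for Arabic can be sketched in Python as follows. The table of letters that do not join to the following letter is a deliberately incomplete assumption (a real implementation would read the Joining_Type property from the Unicode Character Database), and the round-robin placement ignores the calligraphic priorities for choosing elongation points that the references describe. The sketch does avoid placing a tatweel between lam and alef, where it would break the mandatory lam-alef ligature.

```python
TATWEEL = "\u0640"

# Assumption: an incomplete set of Arabic letters that never join to the letter
# after them (hamza, alef forms, dal, thal, reh, zain, waw forms, teh marbuta).
NON_LEFT_JOINING = set("\u0621\u0622\u0623\u0625\u0627\u062F\u0630\u0631\u0632\u0624\u0648\u0629")
LAM = "\u0644"
ALEF_FORMS = set("\u0622\u0623\u0625\u0627")


def is_arabic_letter(ch: str) -> bool:
    """Rough test: the basic Arabic letter range U+0621..U+064A."""
    return "\u0621" <= ch <= "\u064A"


def kashida_points(word: str) -> list[int]:
    """Indices i such that a tatweel may go between word[i] and word[i + 1]."""
    points = []
    for i in range(len(word) - 1):
        cur, nxt = word[i], word[i + 1]
        if not (is_arabic_letter(cur) and is_arabic_letter(nxt)):
            continue
        if cur in NON_LEFT_JOINING:
            continue  # no connection here to stretch
        if cur == LAM and nxt in ALEF_FORMS:
            continue  # would break the lam-alef ligature
        points.append(i)
    return points


def insert_tatweels(word: str, count: int) -> str:
    """Insert `count` tatweels, spread round-robin over the available points."""
    points = kashida_points(word)
    if not points:
        return word
    extra = {p: 0 for p in points}
    for k in range(count):
        extra[points[k % len(points)]] += 1
    return "".join(ch + TATWEEL * extra.get(i, 0) for i, ch in enumerate(word))


print(kashida_points("كتاب"))  # [0, 1]
print(insert_tatweels("كتاب", 2))
```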
<h3 id=tibetan>
Tibetan Writing System</h3>
Tibetan is a Brahmic writing system related to Indic scripts like Devanagari and Gujarati;
however, unlike these systems, it does not use Western-style punctuation
nor spaces between words,
and instead uses the Tibetan Tsheg Mark U+0F0B “་”
between syllables
and its own punctuation marks such as the Tibetan Shad U+0F0D “།” and Tibetan Nyis Shad U+0F0E “༎”,
which indicate the end of longer segments.
Justification techniques used in Tibetan include stretching the space after a shad, minutely increasing the spaces after tsheg marks,
and simply filling the remaining space on a line with tsheg marks.
<ul>
<li><a href="https://r12a.github.io/scripts/tibetan#justification">Tibetan Script Notes</a>
<li><a href="https://www.chinaw3c.org/layout-workshop-report.html#talk5">Tibetan Script Requirements (.ppt)</a>
</ul>
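The following minimal Python sketch combines two of the Tibetan techniques described above. It first fills the remaining space with whole tsheg marks, then spreads whatever width is left over as minute extra spacing after every tsheg mark on the line. The tsheg advance width is an assumed input (in practice it comes from the font), and combining the two techniques this way is an illustrative choice rather than a documented convention.

```python
TSHEG = "\u0F0B"  # TIBETAN MARK INTERSYLLABIC TSHEG


def justify_tibetan(line: str, remaining: float, tsheg_advance: float) -> tuple[str, float]:
    """Return the filled line and the extra width to add after each tsheg mark.

    remaining: width still empty at the end of the line.
    tsheg_advance: advance width of one tsheg mark (assumed, from the font).
    """
    if tsheg_advance <= 0:
        raise ValueError("tsheg_advance must be positive")
    remaining = max(remaining, 0.0)
    # 1. Fill the end of the line with as many whole tsheg marks as fit.
    count = int(remaining // tsheg_advance)
    filled = line + TSHEG * count
    # 2. Spread the leftover sliver of width after every tsheg on the line.
    leftover = remaining - count * tsheg_advance
    marks = filled.count(TSHEG)
    per_mark = leftover / marks if marks else 0.0
    return filled, per_mark


filled, per_mark = justify_tibetan("ཀ་ཁ་ག", remaining=10.0, tsheg_advance=3.0)
print(per_mark)  # 0.2
```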
<h3 id=southeast>
Southeast Asian Writing Systems</h3>
In Southeast Asian systems such as Thai and Lao,
letters are merged together into “clusters”.
There are no spaces between words
(so line-break opportunities must be found by dictionary lookup),
but spaces serve to separate larger units of text.
Techniques for justification include stretching spaces on the line
(if it happens to have any)
and interspersing extra space between clusters.
Scripts in this category include
Khmer, Myanmar, Lao, and Thai.
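Spreading space between clusters requires first finding the clusters. The Python sketch below approximates a cluster as a base character followed by any combining marks (General_Category M*), which handles simple Thai cases such as vowel and tone marks written above or below a consonant. This is an assumption: a production engine would use Unicode grapheme cluster segmentation (UAX #29) or script-specific rules, particularly for Khmer and Myanmar. The sketch covers only the inter-cluster step, not the stretching of any spaces on the line.

```python
import unicodedata


def approximate_clusters(text: str) -> list[str]:
    """Group each combining mark with the character before it."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def space_per_cluster_gap(text: str, extra: float) -> float:
    """Extra width to insert between each pair of adjacent clusters."""
    gaps = len(approximate_clusters(text)) - 1
    return extra / gaps if gaps > 0 else 0.0


print(approximate_clusters("กินที่"))
print(space_per_cluster_gap("กินที่", 1.0))  # 0.5
```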
<h3 id="other">
Other Writing Systems</h3>
Most (but not all) writing systems not mentioned here
have discrete letters, like Latin,
and in the absence of more specific information
may be assumed to justify in a similar manner.
Note: Readers who wish to provide such “more specific information”
are invited (and strongly encouraged)
to contact the <a href="mailto:www-international@w3.org">W3C Internationalization Working Group</a>
so that this document may be updated.
<h2 id=guide>
Guidance for Authors and Implementers</h2>
<h3 id=lang>
Tagging Content By Writing System</h3>
While most languages have a preferred writing system,
many can be transcribed into a different system.
As a common example, most languages have a Latin transcription,
and can thus be written in the Latin writing system.
In these cases, the document typically adopts the typographic conventions of the Latin writing system;
for example, Japanese “romaji” and Chinese Pinyin use word spaces and justify accordingly.
As another example, historical ideographic Korean
(<code>ko-Hant</code>)
does not use word spaces,
and should therefore be justified as for Chinese.
Authors can indicate the use of the Latin writing system
with the <code>-Latn</code> language subtag,
e.g. <code>ja-Latn</code> for Japanese romaji.
Other subtags exist for other writing systems;
see ????.
Some common/historical examples follow:
<div class="example">
<dl>
<dt><code>zh-Latn</code>
<dd>Chinese, written in Latin transcription
<dt><code>ko-Hant</code>
<dd>Korean, written in Hanja (Chinese ideographic characters)
<dt><code>??-Arab</code>
<dd>Turkish, written in Arabic script.
<dt><code>??-???</code>
<dd>Mongolian, written in Cyrillic
<dt><code>??-???</code>
<dd>Mongolian, written in traditional Mongolian script.
</dl>
</div>
UAs should assume the most common writing system for a given language
when choosing a justification strategy,
but must not assume that writing system
if the author has explicitly indicated a different one.
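The following Python sketch shows one way a UA might apply this rule: an explicit script subtag in the language tag wins, and otherwise the most common script for the language is assumed. The default-script table and the mapping from script to strategy are hypothetical and cover only the examples in this section (a real UA could draw defaults from the CLDR likely-subtags data). Unlisted scripts fall back to Latin-style inter-word justification, following the guidance given above for other writing systems.

```python
from typing import Optional

# Hypothetical tables covering only a few languages and scripts.
DEFAULT_SCRIPT = {"zh": "Hans", "ja": "Jpan", "ko": "Kore", "en": "Latn", "ar": "Arab"}
STRATEGY = {
    "Latn": "inter-word",
    "Kore": "inter-word",  # modern Korean uses word spaces
    "Hans": "inter-character",
    "Hant": "inter-character",
    "Jpan": "inter-character, compression preferred",
    "Arab": "kashida",
}


def script_of(tag: str) -> Optional[str]:
    """Explicit script subtag if present, else the assumed default for the language."""
    subtags = tag.replace("_", "-").split("-")
    for sub in subtags[1:]:
        if len(sub) == 1:
            break  # singleton: extension or private-use subtags follow
        if len(sub) == 4 and sub.isalpha():
            return sub.title()  # normalize case, e.g. "latn" -> "Latn"
    return DEFAULT_SCRIPT.get(subtags[0].lower())


def justification_strategy(tag: str) -> str:
    # Unknown scripts are treated like Latin (discrete letters, word spaces).
    return STRATEGY.get(script_of(tag) or "", "inter-word")


for tag in ["ja", "ja-Latn", "ko", "ko-Hant", "zh-Latn"]:
    print(tag, "->", justification_strategy(tag))
```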
<h3 id=unknown>
Justifying Untagged Content</h3>
Web browsers frequently have to deal with untagged, potentially mixed-script content.
The following are some guidelines for designing a strategy to deal with such content.
<ul>
<li>Since Chinese and Japanese do not use spaces to provide justification opportunities,
CJK content (Han, Hiragana, Katakana, and Hangul)
should be allowed to accept inter-character spacing.
<li>Since Japanese content prefers compression,
CJK fullwidth punctuation characters, if present on a line,
should be compressed at a higher priority (if possible) than expanding spaces
or letter-spacing.
<li>Since Korean prefers expanding spaces to expanding between characters,
spaces should be expanded at a higher priority (if possible)
than letter-spacing.
</ul>
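These priorities can be expressed as a small decision procedure. In the Python sketch below, a hypothetical line breaker supplies two candidate lines: a longer one that overflows the measure and a shorter one that falls short of it. Compression is tried first on the longer line, assuming for illustration that each fullwidth punctuation mark can give up to half an em of its built-in blank space. Otherwise the shorter line is expanded, using word spaces up to an assumed cap before falling back to inter-character spacing. All widths are in ems.

```python
from dataclasses import dataclass

# Assumption: each fullwidth punctuation mark can shed up to half an em.
MAX_PUNCT_COMPRESSION = 0.5


@dataclass
class Candidates:
    """Hypothetical summary of two candidate lines from the line breaker (widths in em)."""
    overflow: float   # how far the longer candidate exceeds the measure
    shortfall: float  # how far the shorter candidate falls short of it
    punct_count: int  # fullwidth punctuation marks on the longer candidate
    space_count: int  # word spaces on the shorter candidate
    gap_count: int    # inter-character gaps on the shorter candidate


def plan_adjustment(c: Candidates, max_space_stretch: float) -> dict[str, float]:
    # 1. Prefer compression: keep the longer line if its punctuation can absorb the overflow.
    if c.punct_count > 0 and c.overflow <= c.punct_count * MAX_PUNCT_COMPRESSION:
        return {"compress_per_punct": c.overflow / c.punct_count}
    # 2. Otherwise take the shorter line and expand word spaces, up to the cap.
    per_space = min(c.shortfall / c.space_count, max_space_stretch) if c.space_count else 0.0
    leftover = c.shortfall - per_space * c.space_count
    # 3. Put whatever remains into inter-character spacing (letter-spacing).
    #    With no gaps at all, the line is simply left short.
    per_gap = leftover / c.gap_count if c.gap_count else 0.0
    return {"expand_per_space": per_space, "expand_per_gap": per_gap}


print(plan_adjustment(Candidates(0.8, 2.0, 2, 0, 10), max_space_stretch=0.5))
# {'compress_per_punct': 0.4}
```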
Advisement: Authors should use (correct) language tags
in order to get the best possible typographic behavior.
For example, if Japanese text is tagged as Japanese,
the UA knows to prefer compressing the spacing in punctuation over expanding the spacing between characters.
<h2 id=acknowledgements>
Acknowledgements</h2>
This document was compiled with guidance from:
the W3C <a href="http://www.w3.org/International/">Internationalization</a>
and <a href="http://www.w3.org/Style/CSS/">CSS</a> Working Groups,
and the W3C
<a href="https://www.w3.org/2007/02/japanese-layout/">Japanese</a>,
<a href="https://www.w3.org/International/groups/chinese-layout/">Chinese</a>,
<a href="http://w3c.github.io/klreq/">Korean</a>,
and <a href="https://www.w3.org/International/groups/ethiopic-layout/">Ethiopic</a> Language Task Forces.