ideas/at-text-transform.txt
Lines changed: 30 additions & 27 deletions
@@ -4,7 +4,7 @@ This is an early draft for a possible generic mechanism to allow authors to defi
 
 The general form of an @text-transform at-rule is:
 
-<code>
+<code css>
 @text-transform <transform-name>
 { [ descriptor: value; ]+ }
 </code>
@@ -15,7 +15,7 @@ A text transform created using this at-rule may be used simply by using <transfo
 
 ===== The transformation descriptor =====
 
-<code>
+<code bnf>
 Name: transformation
 Value: <conversion>#
 default: N/A
@@ -26,7 +26,6 @@ default: N/A
 <enumeration> = <string>
 </code>
 
-
 This descriptor defines which character will be replaced by which, by listing a series of conversions, to be applied in the same order as they appear in the descriptor.
 
 Conversions may refer to existing text transforms, either predefined by CSS or defined by the author. While an transformation using only a single such conversion is not very useful, combining it with other conversions allows authors to extend or define variants of existing transforms. Referring to the text-transform currently being define is not allowed, and makes the whole descriptor invalid.
@@ -52,19 +51,22 @@ In a <conversion>, If the source <char-list> is longer than the target <char-list
 <note warning>ISSUE 4: It has been suggested that it should be possible to write text-transforms that behave differently on different languages. This can probably be achieved by adding some optional part at the beginning of each <conversion>, although I am not sure what
 the syntax should be.</note>
 
-
 Examples:
 
-<code>
-@text-transform latin-only-uppercase { transformation: "a-z" to "A-Z"; }
+<code css>
+@text-transform latin-only-uppercase
+{
+transformation: "a-z" to "A-Z";
+}
 </code>
 
-
 The following two transforms are identical.
 
-<code>
-@text-tranform abcdef1 { transformation: "abc" to "def"; }
-
+<code css>
+@text-tranform abcdef1
+{
+transformation: "abc" to "def";
+}
 @text-tranform abcdef2
 {
 transformation: "a" to "d",
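The transformation descriptor applies its conversions in the order listed, and the abcdef1/abcdef2 example claims that one conversion and its split into single characters are equivalent. Both points can be modelled in a few lines of Python. This is a sketch under assumptions that these hunks do not spell out. First, `x-y` inside a char-list is read as an inclusive code-point range with no escaping. Second, each conversion is a separate pass over the output of the previous one. Third, source and target must have the same length, so the draft's rule for unequal lengths is not modelled. The helper names `expand_char_list` and `apply_conversions` are hypothetical.

```python
# Hypothetical model of the `transformation` descriptor. Assumptions:
# "x-y" in a char-list is an inclusive code-point range (no escaping),
# each conversion is a separate pass over the previous pass's output,
# and source/target lists must be the same length.


def expand_char_list(spec: str) -> list[str]:
    """Expand a char-list such as "a-z" or "abc" into single characters."""
    chars: list[str] = []
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            # "x-y": every code point from x to y inclusive.
            first, last = ord(spec[i]), ord(spec[i + 2])
            if first > last:
                raise ValueError(f"reversed range in {spec!r}")
            chars.extend(chr(cp) for cp in range(first, last + 1))
            i += 3
        else:
            chars.append(spec[i])
            i += 1
    return chars


def apply_conversions(text: str, conversions: list[tuple[str, str]]) -> str:
    """Apply (source, target) char-list pairs in the order given."""
    for source, target in conversions:
        src, dst = expand_char_list(source), expand_char_list(target)
        if len(src) != len(dst):
            # Simplification: the draft's rule for unequal lengths is not modelled.
            raise ValueError(f"{source!r} and {target!r} differ in length")
        text = text.translate(str.maketrans("".join(src), "".join(dst)))
    return text


print(apply_conversions("déjà vu", [("a-z", "A-Z")]))  # DéJà VU
print(apply_conversions("cabbage", [("abc", "def")]))  # fdeedge
print(apply_conversions("cabbage", [("a", "d"), ("b", "e"), ("c", "f")]))  # fdeedge
print(apply_conversions("ab", [("a", "b"), ("b", "c")]))  # cc
```

Under this reading, order only matters when a target character is also the source of a later conversion, as the last call shows. Without such an overlap, a single conversion and its per-character split give the same result.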
@@ -76,7 +78,7 @@ The following two transforms are identical.
 
 ===== The character-type descriptor =====
 
-<code>Name: character-type
+<code bnf>Name: character-type
 Value: extended | legacy | single
 Default: extended
 </code>
@@ -98,7 +100,7 @@ This definition affects character processing in two different contexts:
 
 
 ===== The scope descriptor =====
-<code>Name: scope
+<code bnf>Name: scope
 Value: all | [initial || medial || final]
 Default: all
 </code>
@@ -112,7 +114,6 @@ This descriptor makes it possible to restrict which characters in the source tex
 
 <note warning>ISSUE 7: More fancy values could be added here in the future to support things like title case, or to match only the base character, or only the diacritics.</note>
 
-
 The definition of "word" is UA-dependent; [[http://www.unicode.org/reports/tr29/tr29-17.html|UAX29]] is suggested (but not required) for determining such word boundaries.
 
 The transformation descriptor may be used to refer to existing text-transforms in the definition of a new one. If the text-transforms
@@ -121,9 +122,11 @@ two scopes.
 
 Example:
 
-<code>
-@text-transform latin-only-uppercase { transformation: "a-z" to "A-Z"; }
-
+<code css>
+@text-transform latin-only-uppercase
+{
+transformation: "a-z" to "A-Z";
+}
 @text-transform latin-only-capitalize
 {
 transformation: latin-only-uppercase;
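The scope descriptor restricts a conversion to the initial, medial or final characters of a word. That restriction is what lets a capitalize-style transform be derived from an uppercase one. The Python sketch below rests on three assumptions. A word is a maximal run of `\w` characters, which is a crude stand-in for the UAX29 boundaries the draft suggests. A one-letter word counts as both initial and final. Only the `initial || medial || final` form is modelled, not `all`. The names `word_positions`, `apply_scoped`, `LATIN_UPPER` and `LONG_S` are hypothetical.

```python
import re
import string

# Hypothetical helpers modelling the draft's `scope` descriptor.
# Assumptions: a word is a maximal run of \w characters (not UAX #29),
# and a one-character word is both "initial" and "final".


def word_positions(length: int, index: int) -> set[str]:
    """Scope keywords that apply to character `index` of a word."""
    positions = set()
    if index == 0:
        positions.add("initial")
    if index == length - 1:
        positions.add("final")
    if not positions:
        positions.add("medial")
    return positions


def apply_scoped(text: str, table: dict[int, int], scope: set[str]) -> str:
    """Translate only the word characters whose position is in `scope`."""

    def convert_word(match: re.Match) -> str:
        word = match.group(0)
        return "".join(
            ch.translate(table) if word_positions(len(word), i) & scope else ch
            for i, ch in enumerate(word)
        )

    # Characters outside words (spaces, punctuation) are left untouched.
    return re.sub(r"\w+", convert_word, text)


# An uppercase table limited to a-z, and a single-letter long-s table.
LATIN_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)
LONG_S = str.maketrans("s", "ſ")

print(apply_scoped("hello wide world", LATIN_UPPER, {"initial"}))  # Hello Wide World
print(apply_scoped("mississippi sass", LONG_S, {"initial", "medial"}))  # miſſiſſippi ſaſs
```

The second call applies the same mechanism to a simplified long-s rule: long s everywhere except at the end of a word. Historical usage had more exceptions than this.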
@@ -141,7 +144,7 @@ The following use cases only apply to a single language. Defining all the possib
 ==== Full-size kana ====
 In Japanese, small kanas appearing within ruby are sometimes replaced by the equivalent full-size kana. The following transform defines this conversion
 
-<code>
+<code css>
 @text-transform full-size-kana
 {
 transformation: "ぁぃぅぇぉゕゖっゃゅょゎ" to "あいうえおかけつやゆよわ",
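In the conversion shown here, each small kana pairs with exactly one full-size kana, so Python's built-in `str.maketrans` and `str.translate` are enough to model it. The sketch covers only the hiragana conversion visible in this hunk. The rest of the draft's list is not reproduced, and `full_size_kana` is a hypothetical name.

```python
# The first conversion of the draft's full-size-kana transform (hiragana
# only). The remaining conversions of that rule are not reproduced here.
SMALL_HIRAGANA = "ぁぃぅぇぉゕゖっゃゅょゎ"
FULL_HIRAGANA = "あいうえおかけつやゆよわ"

# str.maketrans pairs characters by position and rejects strings of
# different lengths, mirroring the one-to-one char-list pairing.
FULL_SIZE_KANA = str.maketrans(SMALL_HIRAGANA, FULL_HIRAGANA)


def full_size_kana(ruby_text: str) -> str:
    """Replace small hiragana with full-size ones; other text is unchanged."""
    return ruby_text.translate(FULL_SIZE_KANA)


print(full_size_kana("きょう"))  # きよう
print(full_size_kana("がっこう"))  # がつこう
```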
@@ -189,8 +192,7 @@ The uppercasing and lowercasing algorithm defined for the text-transform propert
 
 Someone, for example in a user style sheet, may want to apply an uppercase or lowercase transform to a document where language is insufficiently marked up, but known to the author of the style sheet to be Turkish. In this case, the generic uppercase and lowercase transforms would fail, but the following would work.
 The Georgian language has used three different unicameral alphabets through history: Asomtavruli, Nuskhuri, and Mkhedruli. Recently, some authors have been using Asomtavruli letters in an otherwise Mkhedruli text, in a way that resembles a bicameral alphabet. One may assume that they would find the following transform useful.
 
-<code>
+<code css>
 @text-transform Mkhedruli-to-Asomtavruli
 {
 transformation: "ა-ჵ" to "Ⴀ-Ⴥ";
 }
-</code>
-<code>
+
 @text-transform Asomtavruli-to-Mkhedruli
 {
 transformation: "Ⴀ-Ⴥ" to "ა-ჵ";
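The two Georgian transforms only work if their ranges have equal length and are exact inverses of each other. A quick Python check can confirm this, assuming `x-y` denotes an inclusive code-point range. ა-ჵ spans U+10D0 to U+10F5 and Ⴀ-Ⴥ spans U+10A0 to U+10C5, which is 38 letters each. Converting one way and back therefore restores any text that contained no Asomtavruli to begin with. The names `char_range`, `TO_ASOMTAVRULI` and `TO_MKHEDRULI` are hypothetical.

```python
# Hypothetical helpers; "x-y" is assumed to be an inclusive code-point range.


def char_range(first: str, last: str) -> str:
    """All characters from `first` to `last`, inclusive."""
    return "".join(chr(cp) for cp in range(ord(first), ord(last) + 1))


MKHEDRULI = char_range("\u10d0", "\u10f5")    # ა-ჵ
ASOMTAVRULI = char_range("\u10a0", "\u10c5")  # Ⴀ-Ⴥ

# The two transforms as translation tables, one the inverse of the other.
TO_ASOMTAVRULI = str.maketrans(MKHEDRULI, ASOMTAVRULI)
TO_MKHEDRULI = str.maketrans(ASOMTAVRULI, MKHEDRULI)

sample = "საქართველო"
converted = sample.translate(TO_ASOMTAVRULI)
print(len(MKHEDRULI), len(ASOMTAVRULI))  # 38 38
print(converted.translate(TO_MKHEDRULI) == sample)  # True
```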
@@ -235,7 +236,7 @@ In old (18th century and earlier) European texts, the letter s, when at the midd
 
 Modern readers are often unfamiliar with this letter form, and for readability reasons, one may want to convert from one to the other. The follow transform would accomplish this.
 
-<code>
+<code css>
 @text-transform modernize-s
 {
 transformation: "ſ" to "s";
@@ -244,7 +245,7 @@ Modern readers are often unfamiliar with this letter form, and for readability r
 
 This does the opposite transform:
 
-<code>
+<code css>
 @text-transform long-s
 {
 transformation: "s" to "ſ" ;
@@ -260,7 +261,7 @@ Here are some more example of how the generic mechanism may be used
 
 Most writing systems of the world have at least one common transliteration scheme into the roman script.
 
-<code>
+<code css romanization.css>
 @text-transform romanization
 {/* ISO 9 (Cyrillic) */
 transformation: "А а Ӑ ӑ Ӓ ӓ Ә ә Б б В в Г г Ґ ґ Ҕ ҕ Ғ ғ Д д Ђ ђ Ѓ ѓ Е е Ё ё Ӗ ӗ Є є Ҽ ҽ Ҿ ҿ
@@ -280,12 +281,14 @@ Most writing systems of the world have at least one common transliteration schem
 N n X x O o Ó ó P p R r S s s T t Y y Ý ý Ÿ ÿ F f Ch ch Ps ps Ō ō Ṓ ṓ";
 }
 </code>
+<note>The Greek example above only works if ISSUE 2 is resolved, because Theta, Chi and Psi are transliterated into digraphs that don’t have a single code point in Unicode.</note>
+
 ==== Comic book vikings ====
 In the "Asterix and the Great Crossing" comic book, the Viking characters are supposed to speak a foreign language unintelligible to the main characters, but still understandable to the readers. This is represented by writing down their speech normally, except that some letters are replaced by similarly looking letters found in Scandinavian languages.
 
 This effect could be obtained by the following transform:
 
-<code>
+<code css>
 @text-transform fake-norse
 {
 transformation: "aoAO" to "åøÅØ";
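The note added in this hunk ties the Greek romanization to ISSUE 2, because Theta, Chi and Psi romanize to digraphs; the target list above contains `Ch ch Ps ps`. The Python sketch below illustrates the distinction. A position-by-position pairing, like `str.maketrans` called with two strings, rejects a target longer than its source. A table whose values may be multi-character strings handles such targets without difficulty. The six-letter table is illustrative only and is not the draft's romanization. `PAIRWISE_OK` and `DIGRAPHS` are hypothetical names.

```python
# Illustrative six-letter table, not the draft's full romanization.
# Greek letters are written as escapes to avoid Latin look-alikes.

try:
    # Position-by-position pairing, as with two char-lists: "ΘΧΨ" -> "ThChPs".
    str.maketrans("\u0398\u03a7\u03a8", "ThChPs")
    PAIRWISE_OK = True
except ValueError:
    # 3 source characters cannot be paired with 6 target characters.
    PAIRWISE_OK = False

# A table whose values may be multi-character strings has no such limit.
DIGRAPHS = str.maketrans({
    "\u0398": "Th", "\u03b8": "th",  # Θ θ
    "\u03a7": "Ch", "\u03c7": "ch",  # Χ χ
    "\u03a8": "Ps", "\u03c8": "ps",  # Ψ ψ
})

print(PAIRWISE_OK)  # False
print("\u0398 \u03a7 \u03a8".translate(DIGRAPHS))  # Th Ch Ps
```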
@@ -295,7 +298,7 @@ This effect could be obtained by the following transform:
 ==== Leet speak ====
 In Internet, hacker and gamer culture, a phenomenon is quite common, where characters are replaced by other characters or character sequences which have a somewhat similar glyphic appearance. Although no single consensual convention exists and sometimes mappings are neither injective nor surjective, one could simulate this playful style with a transform like the following: