Commit 9267905

[css2] Added that \0 is undefined.
--HG-- extra : convert_revision : svn%3A73dc7c4b-06e6-40f3-b4f7-9ed1dbc14bfc/trunk%402247
1 parent e1ce41a commit 9267905

1 file changed

Lines changed: 76 additions & 21 deletions

css2/syndata.src

@@ -1,7 +1,7 @@
11
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
22
"http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd">
33
<html lang="en">
4-
<!-- $Id: syndata.src,v 2.103 2004-03-08 18:40:14 bbos Exp $ -->
4+
<!-- $Id: syndata.src,v 2.104 2004-03-25 18:09:51 bbos Exp $ -->
55
<head>
66
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
77
<title>Syntax and basic data types</title>
@@ -264,6 +264,8 @@ href="#parsing-errors">rules for handling parsing errors</a>. However, because t
264264
is followed by at most six hexadecimal digits (0..9A..F), which
265265
stand for the ISO 10646 ([[ISO10646]])
266266
character with that number, which must not be zero.
267+
(It is undefined in CSS&nbsp;2.1 what happens if a style sheet
268+
<em>does</em> contain a zero.)
267269
If a character in the range [0-9a-fA-F] follows the hexadecimal number,
268270
the end of the number needs to be made clear. There are two ways
269271
to do that:
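Reviewer note on the hunk above: the rule it touches (a backslash followed by at most six hexadecimal digits, naming a character whose number must not be zero) can be sketched in Python as below. This is only an illustration, not part of the commit. The name `decode_hex_escape` is hypothetical, and raising an error for zero is this sketch's own choice, since the added text leaves that case undefined. The sketch also does not handle how the end of the number is marked.

```python
import re

# A backslash followed by one to six hexadecimal digits.
_HEX_ESCAPE = re.compile(r"\\([0-9a-fA-F]{1,6})")


def decode_hex_escape(text: str, pos: int = 0):
    """Return (character, next_pos) for a hex escape starting at text[pos]."""
    m = _HEX_ESCAPE.match(text, pos)
    if m is None:
        raise ValueError("no hexadecimal escape at this position")
    code = int(m.group(1), 16)
    if code == 0:
        # CSS 2.1 leaves this case undefined; rejecting it is an
        # assumption made by this sketch, not a rule from the spec.
        raise ValueError("escape denotes character zero (undefined in CSS 2.1)")
    # chr() itself raises ValueError for numbers above 0x10FFFF.
    return chr(code), m.end()
```

For instance, `decode_hex_escape(r"\26 B")` yields `("&", 3)`, while `decode_hex_escape(r"\0")` raises `ValueError`.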
@@ -1203,10 +1205,10 @@ encoding</span> (from highest priority to lowest):
12031205
<li>Assume UTF-8</li>
12041206
</ol>
12051207

1206-
<p>At most one @charset rule may appear in an external style sheet and
1207-
it must appear at the very start of the style sheet or immediately
1208-
after a Byte Order Mark (BOM, U+FEFF) that is at the very start of the
1209-
style sheet. Any other @charset rules must be ignored by the UA.
1208+
<p>Authors using an @charset rule must place the rule at the very
1209+
beginning of the style sheet, preceded by no characters. (If a byte
1210+
order mark is appropriate for the encoding used, it may precede the
1211+
@charset rule.)
12101212
</p>
12111213

12121214
<p>After "@charset", authors specify the name of a character encoding
@@ -1229,25 +1231,78 @@ registry.
12291231
<p>This specification does not mandate which character encodings a
12301232
user agent must support.
12311233

1232-
<p>This specification does not specify what algorithm a UA must
1233-
apply to derive the encoding from the BOM and the @charset. In
1234-
particular, it does not specify the encoding to use if the BOM and the
1235-
@charset conflict. This is expected to be defined in CSS3.
1234+
<p>User agents must ignore any @charset rule not at the beginning of the
1235+
style sheet. When user agents detect the character encoding using the
1236+
BOM and/or the @charset rule, they should follow the following rules:
12361237
</p>
12371238

1238-
<p class="note">Note that reliance on the @charset construct
1239-
theoretically poses a
1240-
problem since there is no <em>a priori</em> information on how it is
1241-
encoded. In practice, however, the encodings in wide use on the
1242-
Internet are either based on ASCII, UTF-16, UCS-4, or (rarely) on
1243-
EBCDIC. This means that in general, the initial byte values of a
1244-
style sheet enable a user agent to detect the encoding family reliably,
1245-
which provides enough information to decode the @charset rule, which
1246-
in turn determines the exact character encoding.
1247-
</p>
1248-
<!-- More examples of good encodings to use? -IJ -->
1239+
<ul>
1240+
1241+
<li>Except as specified in these rules, all @charset rules are ignored.</li>
1242+
1243+
<li>The encoding is detected based on the stream of bytes that begins
1244+
the stylesheet. The following table gives a set of possibilities for
1245+
initial byte sequences (written in hexadecimal). The first row that
1246+
matches the beginning of the stylesheet gives the result of encoding
1247+
detection based on the BOM and/or @charset rule. If no rows match, the
1248+
encoding cannot be detected based on the BOM and/or @charset rule. The
1249+
notation (...)* refers to repetition for which the best match is the one
1250+
that repeats as few times as possible. The bytes marked "XX" are those
1251+
used to determine the name of the encoding, by treating them, in the
1252+
order given, as a sequence of ASCII characters. Bytes marked "YY" are
1253+
similar, but need to be transcoded into ASCII as noted. User agents may
1254+
ignore entries in the table if they do not support any encodings
1255+
relevant to the entry.
1256+
1257+
<table border="1"
1258+
summary="Relationship between initial bytes of sheet and chosen encoding">
1259+
<tr><th scope="col">Initial Bytes</th><th scope="col">Result</th></tr>
1260+
<tr><td>EF BB BF 40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B</td><td>as specified</td></tr>
1261+
<tr><td>EF BB BF</td><td>UTF-8</td></tr>
1262+
<tr><td>40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B</td><td>as specified</td></tr>
1263+
<tr><td>FE FF 00 40 00 63 00 68 00 61 00 72 00 73 00 65 00 74 00 20 00 22 (00 XX)* 00 22 00 3B</td><td>as specified (with BE endianness if not specified)</td></tr>
1264+
<tr><td>00 40 00 63 00 68 00 61 00 72 00 73 00 65 00 74 00 20 00 22 (00 XX)* 00 22 00 3B</td><td>as specified (with BE endianness if not specified)</td></tr>
1265+
<tr><td>FF FE 40 00 63 00 68 00 61 00 72 00 73 00 65 00 74 00 20 00 22 00 (XX 00)* 22 00 3B 00</td><td>as specified (with LE endianness if not specified)</td></tr>
1266+
<tr><td>40 00 63 00 68 00 61 00 72 00 73 00 65 00 74 00 20 00 22 00 (XX 00)* 22 00 3B 00</td><td>as specified (with LE endianness if not specified)</td></tr>
1267+
<tr><td>00 00 FE FF 00 00 00 40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 (00 00 00 XX)* 00 00 00 22 00 00 00 3B</td><td>as specified (with BE endianness if not specified)</td></tr>
1268+
<tr><td>00 00 00 40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 (00 00 00 XX)* 00 00 00 22 00 00 00 3B</td><td>as specified (with BE endianness if not specified)</td></tr>
1269+
<tr><td>00 00 FF FE 00 00 40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 00 (00 00 XX 00)* 00 00 22 00 00 00 3B 00</td><td>as specified (with 2143 endianness if not specified)</td></tr>
1270+
<tr><td>00 00 40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 00 (00 00 XX 00)* 00 00 22 00 00 00 3B 00</td><td>as specified (with 2143 endianness if not specified)</td></tr>
1271+
<tr><td>FE FF 00 00 00 40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 00 00 (00 XX 00 00)* 00 22 00 00 00 3B 00 00</td><td>as specified (with 3412 endianness if not specified)</td></tr>
1272+
<tr><td>00 40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 00 00 (00 XX 00 00)* 00 22 00 00 00 3B 00 00</td><td>as specified (with 3412 endianness if not specified)</td></tr>
1273+
<tr><td>FF FE 00 00 40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 00 00 00 (XX 00 00 00)* 22 00 00 00 3B 00 00 00</td><td>as specified (with LE endianness if not specified)</td></tr>
1274+
<tr><td>40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 00 00 00 (XX 00 00 00)* 22 00 00 00 3B 00 00 00</td><td>as specified (with LE endianness if not specified)</td></tr>
1275+
<tr><td>00 00 FE FF</td><td>UTF-32-BE</td></tr>
1276+
<tr><td>FF FE 00 00</td><td>UTF-32-LE</td></tr>
1277+
<tr><td>00 00 FF FE</td><td>UTF-32-2143</td></tr>
1278+
<tr><td>FE FF 00 00</td><td>UTF-32-3412</td></tr>
1279+
<tr><td>FE FF</td><td>UTF-16-BE</td></tr>
1280+
<tr><td>FF FE</td><td>UTF-16-LE</td></tr>
1281+
<tr><td>7C 83 88 81 99 A2 85 A3 40 7F (YY)* 7F 5E</td><td>as specified, transcoded from EBCDIC to ASCII</td></tr>
1282+
<tr><td>AE 83 88 81 99 A2 85 A3 40 FC (YY)* FC 5E</td><td>as specified, transcoded from IBM1026 to ASCII</td></tr>
1283+
<tr><td>00 63 68 61 72 73 65 74 20 22 (YY)* 22 3B</td><td>as specified, transcoded from GSM 03.38 to ASCII</td></tr>
1284+
<tr><td>analogous patterns</td><td>User agents may
1285+
support additional, analogous, patterns if they support encodings
1286+
that are not handled by the patterns here</td></tr>
1287+
</table>
1288+
1289+
</li>
12491290

1250-
<!-- Encodings not to use? (cf. HTML 4.0) -IJ -->
1291+
<li>If the encoding is detected based on one of the entries in the table
1292+
above marked "as specified", the user agent ignores the stylesheet if it
1293+
does not parse an appropriate @charset rule at the beginning of the
1294+
stream of characters resulting from decoding in the chosen @charset.
1295+
This ensures that:
1296+
<ul>
1297+
<li>@charset rules should only function if they are in the
1298+
encoding of the stylesheet,</li>
1299+
<li>byte order marks are ignored only
1300+
in encodings that support a byte order mark, and</li>
1301+
<li>encoding names cannot contain newlines.</li>
1302+
</ul>
1303+
</li>
1304+
1305+
</ul>
12511306

12521307
<h3>Referring to characters not represented in a character encoding</h3>
12531308

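Reviewer note on the detection table added in this commit: the "first row that matches" procedure can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the specification's algorithm. The names `detect_encoding`, `_spec_row`, `_bom_row` and `ROWS` are hypothetical. The EBCDIC, IBM1026 and GSM 03.38 rows are omitted, which the added text permits for encodings a user agent does not support. The return shape `(name, endianness_hint)` is also this sketch's own choice. Rows are listed in the same order as the table, and `(XX)*` is modelled with a non-greedy repetition.

```python
import re


def _spec_row(bom, lead, trail, hint):
    """Row of the form: BOM, '@charset "', (XX)*, '";' in one code-unit layout.

    lead/trail are the counts of zero bytes before/after each ASCII byte.
    """
    def pad(text):
        return b"".join(b"\x00" * lead + bytes([c]) + b"\x00" * trail for c in text)

    unit = re.escape(b"\x00" * lead) + b"." + re.escape(b"\x00" * trail)
    regex = (re.escape(bom + pad(b'@charset "'))
             + b"((?:" + unit + b")*?)"      # fewest repetitions wins
             + re.escape(pad(b'";')))
    return re.compile(regex, re.DOTALL), lead, lead + 1 + trail, hint, None


def _bom_row(bom, name):
    """Row that matches a bare byte order mark and yields a fixed encoding."""
    return re.compile(re.escape(bom)), 0, 1, None, name


# Same order as the table in the diff (EBCDIC/IBM1026/GSM rows omitted).
ROWS = [
    _spec_row(b"\xEF\xBB\xBF", 0, 0, None),
    _bom_row(b"\xEF\xBB\xBF", "UTF-8"),
    _spec_row(b"", 0, 0, None),
    _spec_row(b"\xFE\xFF", 1, 0, "BE"),
    _spec_row(b"", 1, 0, "BE"),
    _spec_row(b"\xFF\xFE", 0, 1, "LE"),
    _spec_row(b"", 0, 1, "LE"),
    _spec_row(b"\x00\x00\xFE\xFF", 3, 0, "BE"),
    _spec_row(b"", 3, 0, "BE"),
    _spec_row(b"\x00\x00\xFF\xFE", 2, 1, "2143"),
    _spec_row(b"", 2, 1, "2143"),
    _spec_row(b"\xFE\xFF\x00\x00", 1, 2, "3412"),
    _spec_row(b"", 1, 2, "3412"),
    _spec_row(b"\xFF\xFE\x00\x00", 0, 3, "LE"),
    _spec_row(b"", 0, 3, "LE"),
    _bom_row(b"\x00\x00\xFE\xFF", "UTF-32-BE"),
    _bom_row(b"\xFF\xFE\x00\x00", "UTF-32-LE"),
    _bom_row(b"\x00\x00\xFF\xFE", "UTF-32-2143"),
    _bom_row(b"\xFE\xFF\x00\x00", "UTF-32-3412"),
    _bom_row(b"\xFE\xFF", "UTF-16-BE"),
    _bom_row(b"\xFF\xFE", "UTF-16-LE"),
]


def detect_encoding(data: bytes):
    """Return (encoding_name, endianness_hint), or None if no row matches."""
    for pattern, lead, step, hint, fixed in ROWS:
        m = pattern.match(data)
        if m is None:
            continue
        if fixed is not None:
            return fixed, None
        # Pick out the XX bytes and read them as ASCII characters.
        name = m.group(1)[lead::step].decode("ascii", errors="replace")
        return name, hint
    return None
```

For instance, `detect_encoding(b'@charset "ISO-8859-1";')` gives `("ISO-8859-1", None)`, and a sheet that starts with the bytes `FF FE 00 00` and no @charset rule gives `("UTF-32-LE", None)`. A caller would still have to decode the sheet with the detected encoding and confirm that an @charset rule parses at its start, as the last list item in the diff requires for "as specified" rows.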