11<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
22 "http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd">
33<html lang="en">
4- <!-- $Id: syndata.src,v 2.103 2004-03-08 18:40:14 bbos Exp $ -->
4+ <!-- $Id: syndata.src,v 2.104 2004-03-25 18:09:51 bbos Exp $ -->
55<head>
66<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
77<title>Syntax and basic data types</title>
@@ -264,6 +264,8 @@ href="#parsing-errors">rules for handling parsing errors</a>. However, because t
264264 is followed by at most six hexadecimal digits (0..9A..F), which
265265 stand for the ISO 10646 ([[ISO10646]])
266266 character with that number, which must not be zero.
267+ (It is undefined in CSS 2.1 what happens if a style sheet
268+ <em>does</em> contain a zero.)
267269 If a character in the range [0-9a-fA-F] follows the hexadecimal number,
268270 the end of the number needs to be made clear. There are two ways
269271 to do that:
@@ -1203,10 +1205,10 @@ encoding</span> (from highest priority to lowest):
12031205<li>Assume UTF-8</li>
12041206</ol>
12051207
1206- <p>At most one @charset rule may appear in an external style sheet and
1207- it must appear at the very start of the style sheet or immediately
1208- after a Byte Order Mark (BOM, U+FEFF) that is at the very start of the
1209- style sheet. Any other @charset rules must be ignored by the UA.
1208+ <p>Authors using an @charset rule must place the rule at the very
1209+ beginning of the style sheet, preceded by no characters. (If a byte
1210+ order mark is appropriate for the encoding used, it may precede the
1211+ @charset rule.)
12101212</p>
12111213
12121214<p>After "@charset", authors specify the name of a character encoding
@@ -1229,25 +1231,78 @@ registry.
12291231<p>This specification does not mandate which character encodings a
12301232user agent must support.
12311233
1232- <p>This specification does not specify what algorithm a UA must
1233- apply to derive the encoding from the BOM and the @charset. In
1234- particular, it does not specify the encoding to use if the BOM and the
1235- @charset conflict. This is expected to be defined in CSS3.
1234+ <p>User agents must ignore any @charset rule not at the beginning of the
1235+ style sheet. When user agents detect the character encoding using the
1236+ BOM and/or the @charset rule, they should apply the following rules:
12361237</p>
12371238
1238- <p class="note">Note that reliance on the @charset construct
1239- theoretically poses a
1240- problem since there is no <em>a priori</em> information on how it is
1241- encoded. In practice, however, the encodings in wide use on the
1242- Internet are either based on ASCII, UTF-16, UCS-4, or (rarely) on
1243- EBCDIC. This means that in general, the initial byte values of a
1244- style sheet enable a user agent to detect the encoding family reliably,
1245- which provides enough information to decode the @charset rule, which
1246- in turn determines the exact character encoding.
1247- </p>
1248- <!-- More examples of good encodings to use? -IJ -->
1239+ <ul>
1240+
1241+ <li>Except as specified in these rules, all @charset rules are ignored.</li>
1242+
1243+ <li>The encoding is detected based on the stream of bytes that begins
1244+ the stylesheet. The following table gives a set of possibilities for
1245+ initial byte sequences (written in hexadecimal). The first row that
1246+ matches the beginning of the stylesheet gives the result of encoding
1247+ detection based on the BOM and/or @charset rule. If no rows match, the
1248+ encoding cannot be detected based on the BOM and/or @charset rule. The
1249+ notation (...)* refers to repetition for which the best match is the one
1250+ that repeats as few times as possible. The bytes marked "XX" are those
1251+ used to determine the name of the encoding, by treating them, in the
1252+ order given, as a sequence of ASCII characters. Bytes marked "YY" are
1253+ similar, but need to be transcoded into ASCII as noted. User agents may
1254+ ignore entries in the table if they do not support any encodings
1255+ relevant to the entry.
1256+
1257+ <table border="1"
1258+ summary="Relationship between initial bytes of sheet and chosen encoding">
1259+ <tr><th scope="col">Initial Bytes</th><th scope="col">Result</th></tr>
1260+ <tr><td>EF BB BF 40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B</td><td>as specified</td></tr>
1261+ <tr><td>EF BB BF</td><td>UTF-8</td></tr>
1262+ <tr><td>40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B</td><td>as specified</td></tr>
1263+ <tr><td>FE FF 00 40 00 63 00 68 00 61 00 72 00 73 00 65 00 74 00 20 00 22 (00 XX)* 00 22 00 3B</td><td>as specified (with BE endianness if not specified)</td></tr>
1264+ <tr><td>00 40 00 63 00 68 00 61 00 72 00 73 00 65 00 74 00 20 00 22 (00 XX)* 00 22 00 3B</td><td>as specified (with BE endianness if not specified)</td></tr>
1265+ <tr><td>FF FE 40 00 63 00 68 00 61 00 72 00 73 00 65 00 74 00 20 00 22 00 (XX 00)* 22 00 3B 00</td><td>as specified (with LE endianness if not specified)</td></tr>
1266+ <tr><td>40 00 63 00 68 00 61 00 72 00 73 00 65 00 74 00 20 00 22 00 (XX 00)* 22 00 3B 00</td><td>as specified (with LE endianness if not specified)</td></tr>
1267+ <tr><td>00 00 FE FF 00 00 00 40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 (00 00 00 XX)* 00 00 00 22 00 00 00 3B</td><td>as specified (with BE endianness if not specified)</td></tr>
1268+ <tr><td>00 00 00 40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 (00 00 00 XX)* 00 00 00 22 00 00 00 3B</td><td>as specified (with BE endianness if not specified)</td></tr>
1269+ <tr><td>00 00 FF FE 00 00 40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 00 (00 00 XX 00)* 00 00 22 00 00 00 3B 00</td><td>as specified (with 2143 endianness if not specified)</td></tr>
1270+ <tr><td>00 00 40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 00 (00 00 XX 00)* 00 00 22 00 00 00 3B 00</td><td>as specified (with 2143 endianness if not specified)</td></tr>
1271+ <tr><td>FE FF 00 00 00 40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 00 00 (00 XX 00 00)* 00 22 00 00 00 3B 00 00</td><td>as specified (with 3412 endianness if not specified)</td></tr>
1272+ <tr><td>00 40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 00 00 (00 XX 00 00)* 00 22 00 00 00 3B 00 00</td><td>as specified (with 3412 endianness if not specified)</td></tr>
1273+ <tr><td>FF FE 00 00 40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 00 00 00 (XX 00 00 00)* 22 00 00 00 3B 00 00 00</td><td>as specified (with LE endianness if not specified)</td></tr>
1274+ <tr><td>40 00 00 00 63 00 00 00 68 00 00 00 61 00 00 00 72 00 00 00 73 00 00 00 65 00 00 00 74 00 00 00 20 00 00 00 22 00 00 00 (XX 00 00 00)* 22 00 00 00 3B 00 00 00</td><td>as specified (with LE endianness if not specified)</td></tr>
1275+ <tr><td>00 00 FE FF</td><td>UTF-32-BE</td></tr>
1276+ <tr><td>FF FE 00 00</td><td>UTF-32-LE</td></tr>
1277+ <tr><td>00 00 FF FE</td><td>UTF-32-2143</td></tr>
1278+ <tr><td>FE FF 00 00</td><td>UTF-32-3412</td></tr>
1279+ <tr><td>FE FF</td><td>UTF-16-BE</td></tr>
1280+ <tr><td>FF FE</td><td>UTF-16-LE</td></tr>
1281+ <tr><td>7C 83 88 81 99 A2 85 A3 40 7F (YY)* 7F 5E</td><td>as specified, transcoded from EBCDIC to ASCII</td></tr>
1282+ <tr><td>AE 83 88 81 99 A2 85 A3 40 FC (YY)* FC 5E</td><td>as specified, transcoded from IBM1026 to ASCII</td></tr>
1283+ <tr><td>00 63 68 61 72 73 65 74 20 22 (YY)* 22 3B</td><td>as specified, transcoded from GSM 03.38 to ASCII</td></tr>
1284+ <tr><td>analogous patterns</td><td>User agents may
1285+ support additional, analogous, patterns if they support encodings
1286+ that are not handled by the patterns here</td></tr>
1287+ </table>
1288+
1289+ </li>
12491290
1250- <!-- Encodings not to use? (cf. HTML 4.0) -IJ -->
1291+ <li>If the encoding is detected based on one of the entries in the table
1292+ above marked "as specified", the user agent ignores the stylesheet if it
1293+ does not parse an appropriate @charset rule at the beginning of the
1294+ stream of characters resulting from decoding in the chosen @charset.
1295+ This ensures that:
1296+ <ul>
1297+ <li>@charset rules function only if they are in the
1298+ encoding of the stylesheet,</li>
1299+ <li>byte order marks are ignored only
1300+ in encodings that support a byte order mark, and</li>
1301+ <li>encoding names cannot contain newlines.</li>
1302+ </ul>
1303+ </li>
1304+
1305+ </ul>
12511306
12521307<h3>Referring to characters not represented in a character encoding</h3>
12531308
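The detection rules added in this change can be illustrated with a rough Python sketch. It is not part of the specification text and covers only a subset of the table rows (the ASCII-compatible and UTF-16 "as specified" rows plus the BOM-only rows). The names `TABLE`, `compile_row`, `detect_encoding` and `charset_rule_confirmed` are hypothetical, and using Python codec names as stand-ins for the encoding names in an @charset rule is an assumption made only for illustration.

```python
import re

# Subset of the table rows, kept in table order (the first matching row wins).
# A result of None stands for "as specified".
TABLE = [
    ("EF BB BF 40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B", None),
    ("EF BB BF", "UTF-8"),
    ("40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B", None),
    ("FE FF 00 40 00 63 00 68 00 61 00 72 00 73 00 65 00 74 00 20 00 22 "
     "(00 XX)* 00 22 00 3B", None),
    ("FF FE 40 00 63 00 68 00 61 00 72 00 73 00 65 00 74 00 20 00 22 00 "
     "(XX 00)* 22 00 3B 00", None),
    ("00 00 FE FF", "UTF-32-BE"),
    ("FF FE 00 00", "UTF-32-LE"),
    ("FE FF", "UTF-16-BE"),
    ("FF FE", "UTF-16-LE"),
]

_REPEAT = re.compile(r"\(([0-9A-FX ]+)\)\*")


def _hex_to_regex(tokens):
    # "EF BB" becomes the regex text \xEF\xBB; "XX" becomes "any byte".
    return "".join("." if t == "XX" else "\\x" + t for t in tokens.split())


def compile_row(row):
    """Return (regex, unit_size, xx_offset) for one table row."""
    m = _REPEAT.search(row)
    if m is None:
        return re.compile(_hex_to_regex(row).encode("ascii"), re.DOTALL), 0, 0
    unit = m.group(1).split()
    # "*?" is non-greedy: the repetition matches as few times as possible.
    body = "((?:" + _hex_to_regex(m.group(1)) + ")*?)"
    pattern = _hex_to_regex(row[:m.start()]) + body + _hex_to_regex(row[m.end():])
    return re.compile(pattern.encode("ascii"), re.DOTALL), len(unit), unit.index("XX")


def detect_encoding(data):
    """Return the detected encoding name, or None if no row matches."""
    for row, fixed in TABLE:
        regex, unit, offset = compile_row(row)
        m = regex.match(data)
        if m is None:
            continue
        if fixed is not None:
            return fixed
        # Keep only the XX bytes and read them as ASCII characters.
        # (Default endianness for "as specified" rows is not modelled here.)
        return m.group(1)[offset::unit].decode("ascii", errors="replace")
    return None


def charset_rule_confirmed(data, name):
    """Check that decoding with `name` yields an @charset rule at the start."""
    try:
        text = data.decode(name)  # assumes `name` is also a Python codec name
    except (LookupError, ValueError):
        return False
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.startswith('@charset "%s";' % name)
```

For example, a sheet that begins with a UTF-8 byte order mark followed by `@charset "ISO-8859-1";` matches the first row, so `detect_encoding` reports ISO-8859-1. `charset_rule_confirmed` then fails, because the three BOM bytes decode to ordinary characters in that encoding. This is the case in which the user agent ignores the style sheet.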