|
1 | 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" |
2 | 2 | "http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd"> |
3 | 3 | <html lang="en"> |
4 | | -<!-- $Id: syndata.src,v 2.90 2003-12-01 16:11:33 bbos Exp $ --> |
| 4 | +<!-- $Id: syndata.src,v 2.91 2003-12-05 18:25:10 bbos Exp $ --> |
5 | 5 | <head> |
6 | 6 | <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> |
7 | 7 | <title>Syntax and basic data types</title> |
@@ -1191,29 +1191,70 @@ at-rule.</li> |
1191 | 1191 | <li>Mechanisms of the language of the |
1192 | 1192 | referencing document (e.g., in HTML, the "charset" |
1193 | 1193 | attribute of the LINK element).</li> |
1194 | | -<li>UA-dependent mechanisms</li> |
| 1194 | +<li>UA-dependent mechanisms <ins>(e.g., guessing based on the <a |
| 1195 | +href="#BOM">BOM</a>)</ins></li> |
1195 | 1196 | </ol> |
1196 | 1197 |
|
1197 | | -<p><ins cite="http://www.damowmow.com/temp/csswg/css21/issues" |
1198 | | -title="issue 44">[A BOM character is ignored if it conflicts, otherwise used]</ins> |
| 1198 | +<del> |
1199 | 1199 |
|
1200 | | -<p>At most one @charset rule may appear in an external |
1201 | | -style sheet — it must <em>not</em> appear in an embedded style sheet |
1202 | | -— and it must appear at the very start of the document, not preceded |
1203 | | -by any characters. After "@charset", authors specify the name of a |
1204 | | -character encoding. The name must be a charset name as described in |
1205 | | -the IANA registry (See [[IANA]]. Also, see [[-CHARSETS]] for a complete |
1206 | | -list of charsets). For example: |
1207 | | -</p> |
| 1200 | +<p>At most one @charset rule may appear in an external style sheet |
| 1201 | +— it must <em>not</em> appear in an embedded style sheet — |
| 1202 | +and it must appear at the very start of the document, not preceded by |
| 1203 | +any characters. |
1208 | 1204 |
|
1209 | | -<div class="example"><p> |
1210 | | -@charset "ISO-8859-1"; |
1211 | | -</p></div> |
| 1205 | +</del> |
| 1206 | +<ins> |
| 1207 | + |
| 1208 | +<p>At most one @charset rule may appear in an external style sheet and |
| 1209 | +it must appear at the very start of the document, not preceded by any |
| 1210 | +characters, except possibly a <a href="#BOM">"BOM" (see below)</a>. |
| 1211 | +Any other @charset rules must be ignored by the UA. |
| 1212 | + |
| 1213 | +</ins> |
| 1214 | + |
| 1215 | +<p>After "@charset", authors specify the name of a character encoding. |
| 1216 | +The name must be a charset name as described in the IANA registry (See |
| 1217 | +[[IANA]]. Also, see [[-CHARSETS]] for a complete list of charsets). |
| 1218 | +For example: |
| 1219 | + |
| 1220 | +<pre class="example">@charset "ISO-8859-1";</pre> |
1212 | 1221 |
|
1213 | 1222 | <p>This specification does not mandate which character encodings |
1214 | 1223 | a user agent must support. |
1215 | 1224 | </p> |
1216 | | -<p>Note that reliance on the @charset construct theoretically poses a |
| 1225 | + |
| 1226 | +<ins cite="http://www.damowmow.com/temp/csswg/css21/issues" |
| 1227 | +title="issue 44"> |
| 1228 | + |
| 1229 | +<p id="BOM">If an external style sheet has U+FEFF ("zero width |
| 1230 | +non-breaking space") as the first character (i.e., even before any |
| 1231 | +@charset rule), this character is interpreted as a so-called "Byte |
| 1232 | +Order Mark" (BOM), as follows: |
| 1233 | +<ul> |
| 1234 | +<li>If the style sheet is encoded as "UTF-16" [[RFC2781]] or "UTF-32" |
| 1235 | +[[UNICODE]], the BOM determines the byte order ("big-endian" or |
| 1236 | +"little-endian") as explained in the cited RFC. |
| 1237 | + |
| 1238 | +<li>If the style sheet is encoded as anything else, the U+FEFF |
| 1239 | +character is ignored. |
| 1240 | +</ul> |
| 1241 | + |
| 1242 | +<p>An external style sheet <em>should</em> start with a BOM if it is |
| 1243 | +encoded as "UTF-16" or "UTF-32" and <em>should not</em> have a BOM in |
| 1244 | +any other encodings. |
| 1245 | + |
| 1246 | +<p class="note">Note that the BOM can only be ignored if it agrees |
| 1247 | +with the encoding. E.g., if a style sheet encoded as "UTF-8" starts |
| 1248 | +with 0xEF 0xBB 0xBF those three bytes are ignored, since they |
| 1249 | +correctly encode the character U+FEFF in UTF-8. But if a style sheet |
| 1250 | +encoded as "ISO-8859-1" starts with the two bytes 0xFE 0xFF (the BOM |
| 1251 | +for big-endian UTF-16), the two bytes are simply interpreted as the |
| 1252 | +two characters "þ" and "ÿ". |
| 1253 | + |
| 1254 | +</ins> |
| 1255 | + |
| 1256 | +<p class="note">Note that reliance on the @charset construct |
| 1257 | +theoretically poses a |
1217 | 1258 | problem since there is no <em>a priori</em> information on how it is |
1218 | 1259 | encoded. In practice, however, the encodings in wide use on the |
1219 | 1260 | Internet are either based on ASCII, UTF-16, UCS-4, or (rarely) on |
|
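The BOM rules added in this revision can be sketched roughly as follows. This is only an illustration in Python, not part of the specification: the function name `decode_style_sheet` is hypothetical, the encoding is assumed to have already been determined (for example from HTTP or an @charset rule), and Python's codec registry stands in for whatever encodings a user agent actually supports.

```python
import codecs


def decode_style_sheet(data: bytes, encoding: str) -> str:
    """Decode style sheet bytes, treating a leading U+FEFF as a BOM.

    Hypothetical helper: `encoding` is the encoding already chosen for
    the style sheet; this function does not guess it from the BOM.
    """
    # Normalise the label, e.g. "UTF-16" -> "utf-16".
    name = codecs.lookup(encoding).name
    text = data.decode(name)

    if name in ("utf-16", "utf-32"):
        # Python's generic UTF-16/UTF-32 codecs use a leading BOM to pick
        # the byte order and drop it from the decoded text. Without a BOM
        # they fall back to the platform's native byte order, which is a
        # Python convention and an assumption of this sketch.
        return text

    # Any other encoding: a leading U+FEFF is simply ignored.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text
```

With this sketch, a UTF-8 style sheet that starts with 0xEF 0xBB 0xBF decodes without the leading U+FEFF, while the bytes 0xFE 0xFF in a style sheet declared as ISO-8859-1 come out as the two characters "þ" and "ÿ", matching the note in the diff.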