@@ -1,7 +1,7 @@
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
  "http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd">
 <html lang="en">
-<!-- $Id: syndata.src,v 2.98 2004-02-20 22:46:24 bbos Exp $ -->
+<!-- $Id: syndata.src,v 2.99 2004-02-23 11:14:24 bbos Exp $ -->
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
 <title>Syntax and basic data types</title>
@@ -1177,7 +1177,7 @@ display declaration.
 Character Set (see [[ISO10646]]). For transmission and
 storage, these characters must be <span class="index-def"
 title="character encoding">encoded</span> by a character encoding that
-supports the set of characters available in US-ASCII (e.g., ISO
+supports the set of characters available in US-ASCII (e.g., UTF-8, ISO
 8859-x, SHIFT JIS, etc.). For a good introduction to character sets
 and character encodings, please consult the HTML 4.0
 specification ([[-HTML40]], chapter 5), See also the XML 1.0
@@ -1195,19 +1195,17 @@ encoding::default|default::character encoding">character
 encoding</span> (from highest priority to lowest):
 </p>
 <ol>
-<li>An HTTP "charset" parameter in a "Content-Type" field</li>
-<li><a
-href="#BOM">BOM</a></li>
-<li>The <span class="index-def" title="@charset">@charset</span>
-at-rule</li>
+<li>An HTTP "charset" parameter in a "Content-Type" field
+(or similar parameters in other protocols)</li>
+<li>BOM and/or @charset (see below)</li>
 <li><code><link charset=""></code> or other metadata from the linking mechanism (if any)</li>
-<li>character encoding of referring stylesheet or document (if any)</li>
-<li>UA-dependent mechanisms </li>
+<li>charset of referring stylesheet or document (if any)</li>
+<li>Assume UTF-8 </li>
 </ol>
 
 <p>At most one @charset rule may appear in an external style sheet and
 it must appear at the very start of the style sheet, not preceded by any
-characters, except possibly a <a href="#BOM">"BOM" (see below)</a>.
+characters, except possibly a "BOM" (U+FEFF).
 Any other @charset rules must be ignored by the UA.
 </p>
 
@@ -1219,40 +1217,15 @@ For example:
 
 <pre class="example">@charset "ISO-8859-1";</pre>
 
+<p>This specification does not specify what algorithm a UA must
+apply to derive the encoding from the BOM and the @charset. In
+particular, it does not specify the encoding to use if the BOM and the
+@charset conflict. This is expected to be defined in CSS3.</p>
+
 <p>This specification does not mandate which character encodings
 a user agent must support.
 </p>
 
-<p id="BOM">If an external style sheet has U+FEFF ("zero width
-non-breaking space") as the first character (i.e., even before any
-@charset rule), this character is interpreted as a so-called "Byte
-Order Mark" (BOM), as follows:
-</p>
-
-<ul>
-<li>If the style sheet is encoded as "UTF-16" [[RFC2781]] or "UTF-32"
-[[UNICODE]], the BOM determines the byte order (e.g. "big-endian" or
-"little-endian") as explained in the cited RFC.
-</li>
-<li>If the style sheet is encoded as anything else, the U+FEFF
-character is ignored.
-</li>
-</ul>
-
-<p>An external style sheet <em>should</em> start with a BOM if it is
-encoded as "UTF-16" or "UTF-32" and <em>should not</em> have a BOM in
-any other encodings.
-</p>
-
-<p class="note">Note that the BOM can only be ignored if it agrees
-with the encoding. E.g., if a style sheet encoded as "UTF-8" starts
-with 0xEF 0xBB 0xBF those three bytes are ignored, since they
-correctly encode the character U+FEFF in UTF-8. But if a style sheet
-encoded as "ISO-8859-1" starts with the two bytes 0xFE 0xFF (the BOM
-for big-endian UTF-16), the two bytes are simply interpreted as the
-two characters "þ" and "ÿ".
-</p>
-
 <p class="note">Note that reliance on the @charset construct
 theoretically poses a
 problem since there is no <em>a priori</em> information on how it is
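As a reading aid, the revised priority list in this commit can be sketched in Python. This is a hypothetical illustration, not something the specification defines: `detect_encoding`, `sniff_bom`, `sniff_charset_rule` and their parameters are invented names. The sketch assumes an @charset rule is only recognized when written in an ASCII-compatible encoding at the very start of the data. Because the new text explicitly leaves the BOM/@charset interaction undefined, it also simply assumes that a BOM takes precedence over an @charset rule.

```python
import codecs
import re
from typing import Optional

# BOMs ordered longest-first: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so the 4-byte forms must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# ASSUMPTION: only an ASCII-compatible @charset rule at offset 0 is recognized.
_CHARSET_RULE = re.compile(rb'@charset "([A-Za-z0-9._:-]+)";')


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


def sniff_charset_rule(data: bytes) -> Optional[str]:
    """Return the name from a leading @charset rule, or None (hypothetical helper)."""
    match = _CHARSET_RULE.match(data)
    return match.group(1).decode("ascii") if match else None


def detect_encoding(
    data: bytes,
    http_charset: Optional[str] = None,
    link_charset: Optional[str] = None,
    referrer_charset: Optional[str] = None,
) -> str:
    """Hypothetical helper following the revised priority list, highest first."""
    # 1. HTTP "charset" parameter (or a similar parameter in another protocol).
    if http_charset:
        return http_charset
    # 2. BOM and/or @charset. The spec leaves their interaction undefined;
    #    ASSUMPTION: a BOM wins over an @charset rule.
    bom_encoding = sniff_bom(data)
    if bom_encoding:
        return bom_encoding
    rule_encoding = sniff_charset_rule(data)
    if rule_encoding:
        return rule_encoding
    # 3. <link charset=""> or other metadata from the linking mechanism.
    if link_charset:
        return link_charset
    # 4. Charset of the referring style sheet or document.
    if referrer_charset:
        return referrer_charset
    # 5. Otherwise assume UTF-8.
    return "utf-8"


if __name__ == "__main__":
    print(detect_encoding(b'@charset "ISO-8859-1";\nbody { color: red }'))
    print(detect_encoding(codecs.BOM_UTF8 + b"body {}", link_charset="ISO-8859-1"))
    print(detect_encoding(b"body {}"))
```

In the sample calls, an `@charset "ISO-8859-1";` sheet resolves to ISO-8859-1, a UTF-8 BOM overrides the `<link charset="">` value because it sits higher in the list, and a sheet with no other information falls back to UTF-8. A real UA would still need a policy for conflicts such as a UTF-16 BOM followed by an @charset naming a different encoding, which this commit defers to CSS3.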