Commit 849eb6a

[css2] Shortened text on BOM, removed all hint at priority of BOM and
@charset, removed confusing explanation of BOM and when it may occur, added explicit mention of UTF-8. --HG-- extra : convert_revision : svn%3A73dc7c4b-06e6-40f3-b4f7-9ed1dbc14bfc/trunk%402226
1 parent c63f017 commit 849eb6a

1 file changed

Lines changed: 13 additions & 40 deletions

css2/syndata.src

@@ -1,7 +1,7 @@
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
 "http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd">
 <html lang="en">
-<!-- $Id: syndata.src,v 2.98 2004-02-20 22:46:24 bbos Exp $ -->
+<!-- $Id: syndata.src,v 2.99 2004-02-23 11:14:24 bbos Exp $ -->
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
 <title>Syntax and basic data types</title>
@@ -1177,7 +1177,7 @@ display declaration.
 Character Set (see [[ISO10646]]). For transmission and
 storage, these characters must be <span class="index-def"
 title="character encoding">encoded</span> by a character encoding that
-supports the set of characters available in US-ASCII (e.g., ISO
+supports the set of characters available in US-ASCII (e.g., UTF-8, ISO
 8859-x, SHIFT JIS, etc.). For a good introduction to character sets
 and character encodings, please consult the HTML 4.0
 specification ([[-HTML40]], chapter 5), See also the XML 1.0
@@ -1195,19 +1195,17 @@ encoding::default|default::character encoding">character
 encoding</span> (from highest priority to lowest):
 </p>
 <ol>
-<li>An HTTP "charset" parameter in a "Content-Type" field</li>
-<li><a
-href="#BOM">BOM</a></li>
-<li>The <span class="index-def" title="@charset">@charset</span>
-at-rule</li>
+<li>An HTTP "charset" parameter in a "Content-Type" field
+(or similar parameters in other protocols)</li>
+<li>BOM and/or @charset (see below)</li>
 <li><code>&lt;link charset=""&gt;</code> or other metadata from the linking mechanism (if any)</li>
-<li>character encoding of referring stylesheet or document (if any)</li>
-<li>UA-dependent mechanisms</li>
+<li>charset of referring stylesheet or document (if any)</li>
+<li>Assume UTF-8</li>
 </ol>
 
 <p>At most one @charset rule may appear in an external style sheet and
 it must appear at the very start of the style sheet, not preceded by any
-characters, except possibly a <a href="#BOM">"BOM" (see below)</a>.
+characters, except possibly a "BOM" (U+FEFF).
 Any other @charset rules must be ignored by the UA.
 </p>
 
@@ -1219,40 +1217,15 @@ For example:
 
 <pre class="example">@charset "ISO-8859-1";</pre>
 
+<p>This specification does not specify what algorithm a UA must
+apply to derive the encoding from the BOM and the @charset. In
+particular, it does not specify the encoding to use if the BOM and the
+@charset conflict. This is expected to be defined in CSS3.</p>
+
 <p>This specification does not mandate which character encodings
 a user agent must support.
 </p>
 
-<p id="BOM">If an external style sheet has U+FEFF ("zero width
-non-breaking space") as the first character (i.e., even before any
-@charset rule), this character is interpreted as a so-called "Byte
-Order Mark" (BOM), as follows:
-</p>
-
-<ul>
-<li>If the style sheet is encoded as "UTF-16" [[RFC2781]] or "UTF-32"
-[[UNICODE]], the BOM determines the byte order (e.g. "big-endian" or
-"little-endian") as explained in the cited RFC.
-</li>
-<li>If the style sheet is encoded as anything else, the U+FEFF
-character is ignored.
-</li>
-</ul>
-
-<p>An external style sheet <em>should</em> start with a BOM if it is
-encoded as "UTF-16" or "UTF-32" and <em>should not</em> have a BOM in
-any other encodings.
-</p>
-
-<p class="note">Note that the BOM can only be ignored if it agrees
-with the encoding. E.g., if a style sheet encoded as "UTF-8" starts
-with 0xEF 0xBB 0xBF those three bytes are ignored, since they
-correctly encode the character U+FEFF in UTF-8. But if a style sheet
-encoded as "ISO-8859-1" starts with the two bytes 0xFE 0xFF (the BOM
-for big-endian UTF-16), the two bytes are simply interpreted as the
-two characters "&#254;" and "&#255;".
-</p>
-
 <p class="note">Note that reliance on the @charset construct
 theoretically poses a
 problem since there is no <em>a priori</em> information on how it is
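
Illustrative sketch (not part of the commit): the priority list in the new revision (HTTP charset, then BOM and/or @charset, then link metadata, then the referring document's charset, then UTF-8) could be walked roughly as below. The function and parameter names are hypothetical. Because the new text deliberately leaves a BOM/@charset conflict undefined, the choice here to let the BOM win is purely an assumption of this sketch. The @charset check also assumes an ASCII-compatible encoding, which is the "a priori" limitation the final note in the diff alludes to.

```python
import codecs
import re

# BOM byte sequences. The 4-byte UTF-32 marks are tested before UTF-16
# because the UTF-32 LE mark starts with the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches an @charset rule written in ASCII-compatible bytes only.
_CHARSET_RULE = re.compile(rb'@charset "([^"]*)";')


def sniff_bom(data):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def sniff_charset_rule(data, offset=0):
    """Return the name in an @charset rule located exactly at `offset`, else None."""
    match = _CHARSET_RULE.match(data, offset)
    return match.group(1).decode("ascii", "replace") if match else None


def guess_encoding(data, http_charset=None, link_charset=None,
                   referrer_charset=None):
    """Walk the priority list from highest to lowest (hypothetical helper)."""
    if http_charset:                      # 1. HTTP "charset" parameter
        return http_charset
    bom_encoding, bom_length = sniff_bom(data)
    if bom_encoding:                      # 2a. BOM (assumption: BOM wins)
        return bom_encoding
    rule_encoding = sniff_charset_rule(data, bom_length)
    if rule_encoding:                     # 2b. @charset at the very start
        return rule_encoding
    if link_charset:                      # 3. metadata from the linking mechanism
        return link_charset
    if referrer_charset:                  # 4. charset of the referring document
        return referrer_charset
    return "utf-8"                        # 5. assume UTF-8
```

For instance, under these assumptions `guess_encoding(b'@charset "ISO-8859-1";')` yields the name given in the rule, while a style sheet with no BOM, no @charset, and no outside metadata falls through to UTF-8.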
