Commit 9c5026f

[css2] Proposed text about BOMs
--HG-- extra : convert_revision : svn%3A73dc7c4b-06e6-40f3-b4f7-9ed1dbc14bfc/trunk%402194
1 parent ca206e1 commit 9c5026f

2 files changed

Lines changed: 62 additions & 17 deletions

File tree

css2/refs.src

Lines changed: 5 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
22
<html lang="en">
3-
<!-- $Id: refs.src,v 2.22 2003-09-05 14:25:56 bbos Exp $ -->
3+
<!-- $Id: refs.src,v 2.23 2003-12-05 18:25:10 bbos Exp $ -->
44
<HEAD>
55
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
66
<TITLE>Bibliography</TITLE>
@@ -118,6 +118,10 @@ href="http://www.ietf.org/rfc/rfc2318.txt">http://www.ietf.org/rfc/rfc2318.txt</
118118
<dd>"Uniform Resource Locators", T. Berners-Lee, L. Masinter, and M. McCahill, December 1994.<BR>
119119
Available at <a href="http://www.ietf.org/rfc/rfc1738.txt">http://www.ietf.org/rfc/rfc1738.txt</a>.
120120

121+
<dt><strong><a name="ref-RFC2781" class="normref">[RFC2781]</a></strong>
122+
<dd>"UTF-16, an encoding of ISO 10646", P. Hoffman, F. Yergeau, February 2000<br>
123+
Available at <a href="http://www.ietf.org/rfc/rfc2781.txt">http://www.ietf.org/rfc/rfc2781.txt</a>.
124+
121125
<dt><strong><a name="ref-SRGB" class="normref">[SRGB]</a></strong>
122126
<dd>"Proposal for a Standard Color Space for the Internet - sRGB",
123127
M. Anderson, R. Motta, S. Chandrasekar, M. Stokes.<BR>

css2/syndata.src

Lines changed: 57 additions & 16 deletions
Original file line number | Diff line number | Diff line change
@@ -1,7 +1,7 @@
11
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
22
"http://www.w3.org/TR/1998/REC-html40-19980424/loose.dtd">
33
<html lang="en">
4-
<!-- $Id: syndata.src,v 2.90 2003-12-01 16:11:33 bbos Exp $ -->
4+
<!-- $Id: syndata.src,v 2.91 2003-12-05 18:25:10 bbos Exp $ -->
55
<head>
66
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
77
<title>Syntax and basic data types</title>
@@ -1191,29 +1191,70 @@ at-rule.</li>
11911191
<li>Mechanisms of the language of the
11921192
referencing document (e.g., in HTML, the "charset"
11931193
attribute of the LINK element).</li>
1194-
<li>UA-dependent mechanisms</li>
1194+
<li>UA-dependent mechanisms <ins>(e.g., guessing based on the <a
1195+
href="#BOM">BOM</a>)</ins></li>
11951196
</ol>
11961197

1197-
<p><ins cite="http://www.damowmow.com/temp/csswg/css21/issues"
1198-
title="issue 44">[A BOM character is ignored if it conflicts, otherwise used]</ins>
1198+
<del>
11991199

1200-
<p>At most one @charset rule may appear in an external
1201-
style sheet &mdash; it must <em>not</em> appear in an embedded style sheet
1202-
&mdash; and it must appear at the very start of the document, not preceded
1203-
by any characters. After "@charset", authors specify the name of a
1204-
character encoding. The name must be a charset name as described in
1205-
the IANA registry (See [[IANA]]. Also, see [[-CHARSETS]] for a complete
1206-
list of charsets). For example:
1207-
</p>
1200+
<p>At most one @charset rule may appear in an external style sheet
1201+
&mdash; it must <em>not</em> appear in an embedded style sheet &mdash;
1202+
and it must appear at the very start of the document, not preceded by
1203+
any characters.
12081204

1209-
<div class="example"><p>
1210-
@charset "ISO-8859-1";
1211-
</p></div>
1205+
</del>
1206+
<ins>
1207+
1208+
<p>At most one @charset rule may appear in an external style sheet and
1209+
it must appear at the very start of the document, not preceded by any
1210+
characters, except possibly a <a href="#BOM">"BOM" (see below)</a>.
1211+
Any other @charset rules must be ignored by the UA.
1212+
1213+
</ins>
1214+
1215+
<p>After "@charset", authors specify the name of a character encoding.
1216+
The name must be a charset name as described in the IANA registry (See
1217+
[[IANA]]. Also, see [[-CHARSETS]] for a complete list of charsets).
1218+
For example:
1219+
1220+
<pre class="example">@charset "ISO-8859-1";</pre>
12121221

12131222
<p>This specification does not mandate which character encodings
12141223
a user agent must support.
12151224
</p>
1216-
<p>Note that reliance on the @charset construct theoretically poses a
1225+
1226+
<ins cite="http://www.damowmow.com/temp/csswg/css21/issues"
1227+
title="issue 44">
1228+
1229+
<p id="BOM">If an external style sheet has U+FEFF ("zero width
1230+
non-breaking space") as the first character (i.e., even before any
1231+
@charset rule), this character is interpreted as a so-called "Byte
1232+
Order Mark" (BOM), as follows:
1233+
<ul>
1234+
<li>If the style sheet is encoded as "UTF-16" [[RFC2781]] or "UTF-32"
1235+
[[UNICODE]], the BOM determines the byte order ("big-endian" or
1236+
"little-endian") as explained in the cited RFC.
1237+
1238+
<li>If the style sheet is encoded as anything else, the U+FEFF
1239+
character is ignored.
1240+
</ul>
1241+
1242+
<p>An external style sheet <em>should</em> start with a BOM if it is
1243+
encoded as "UTF-16" or "UTF-32" and <em>should not</em> have a BOM in
1244+
any other encodings.
1245+
1246+
<p class="note">Note that the BOM can only be ignored if it agrees
1247+
with the encoding. E.g., if a style sheet encoded as "UTF-8" starts
1248+
with 0xEF 0xBB 0xBF those three bytes are ignored, since they
1249+
correctly encode the character U+FEFF in UTF-8. But if a style sheet
1250+
encoded as "ISO-8859-1" starts with the two bytes 0xFE 0xFF (the BOM
1251+
for big-endian UTF-16), the two bytes are simply interpreted as the
1252+
two characters "þ" and "ÿ".
1253+
1254+
</ins>
1255+
1256+
<p class="note">Note that reliance on the @charset construct
1257+
theoretically poses a
12171258
problem since there is no <em>a priori</em> information on how it is
12181259
encoded. In practice, however, the encodings in wide use on the
12191260
Internet are either based on ASCII, UTF-16, UCS-4, or (rarely) on
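
The BOM rules proposed in this change can be illustrated with a minimal Python sketch. It is not part of the commit: the function name `decode_style_sheet` is hypothetical, and treating BOM-less UTF-32 as big-endian is an assumption made here for symmetry (for BOM-less UTF-16, RFC 2781 says the text should be interpreted as big-endian). The sketch assumes the encoding has already been determined by the priority list above.

```python
import codecs


def decode_style_sheet(data: bytes, encoding: str) -> str:
    """Decode style sheet bytes, handling a leading U+FEFF per the proposed text.

    Hypothetical helper for illustration only; `encoding` is the charset
    already chosen by HTTP headers, @charset, or the referencing document.
    """
    name = codecs.lookup(encoding).name  # normalized, e.g. "UTF-16" -> "utf-16"

    if name == "utf-16":
        # The BOM determines the byte order and is not part of the content.
        if data.startswith(codecs.BOM_UTF16_BE):
            return data[2:].decode("utf-16-be")
        if data.startswith(codecs.BOM_UTF16_LE):
            return data[2:].decode("utf-16-le")
        # No BOM: RFC 2781 says to interpret the text as big-endian.
        return data.decode("utf-16-be")

    if name == "utf-32":
        if data.startswith(codecs.BOM_UTF32_BE):
            return data[4:].decode("utf-32-be")
        if data.startswith(codecs.BOM_UTF32_LE):
            return data[4:].decode("utf-32-le")
        # Assumption: default to big-endian when no BOM is present.
        return data.decode("utf-32-be")

    # Any other encoding: decode first, then drop U+FEFF only if the bytes
    # really encode that character in this encoding.
    text = data.decode(name)
    return text[1:] if text.startswith("\ufeff") else text
```

With this sketch, the bytes 0xEF 0xBB 0xBF at the start of a UTF-8 style sheet are dropped, whereas 0xFE 0xFF at the start of an ISO-8859-1 style sheet survives as the two characters "þ" and "ÿ", matching the note in the proposed text.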
