Commit 5c4f659

committed
[css2] Corrected the BOM errata. (BOM only overrides if the encoding is
already known to be UTF-based.) And added a note mentioning that a BOM in UTF-16BE, UTF-16LE, UTF-32BE or UTF-32LE is an error in Unicode.

--HG--
extra : convert_revision : svn%3A73dc7c4b-06e6-40f3-b4f7-9ed1dbc14bfc/trunk%403274
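The corrected rule stated in this commit message can be read as a small decision step applied to the rule-1 encoding. The following is a minimal Python sketch, not code from the CSS 2.1 specification or from any user agent. `apply_bom_rule`, `UTF_ENCODINGS` and `ENDIAN_SPECIFIC` are hypothetical names, and the sketch assumes the caller has already extracted the rule-1 encoding (HTTP "charset" or similar) and the encoding implied by a leading BOM. The interplay between a BOM and an `@charset` rule is out of scope here.

```python
from typing import Optional, Tuple

# Rule-1 encodings that, per the corrected errata, a leading BOM may override.
UTF_ENCODINGS = {
    "utf-8", "utf-16", "utf-16be", "utf-16le",
    "utf-32", "utf-32be", "utf-32le",
}

# Labels with an explicit byte order: Unicode forbids a BOM in these.
ENDIAN_SPECIFIC = {"utf-16be", "utf-16le", "utf-32be", "utf-32le"}


def apply_bom_rule(http_charset: Optional[str],
                   bom_encoding: Optional[str]) -> Tuple[Optional[str], bool]:
    """Return (encoding chosen so far, whether the BOM is an error).

    http_charset: encoding from rule 1 (HTTP "charset" or similar), or None.
    bom_encoding: encoding implied by a leading BOM, or None if there is no
                  BOM (hypothetical input from a separate BOM sniffer).
    A result of (None, False) means: fall through to @charset and the
    lower-priority rules.
    """
    if http_charset is None:
        # Rule 1 yields nothing: the BOM, if present, is used at rule 2.
        return bom_encoding, False
    label = http_charset.lower()  # charset labels are case-insensitive
    if label in UTF_ENCODINGS and bom_encoding is not None:
        # A UTF-based rule-1 encoding is overridden by the BOM; with an
        # explicit-endianness label the BOM is nevertheless an error.
        return bom_encoding, label in ENDIAN_SPECIFIC
    # Otherwise rule 1 stands. For a non-UTF encoding, bytes that merely
    # look like a BOM are decoded as ordinary characters.
    return http_charset, False
```

For example, under this reading a file served with `charset=ISO-8859-1` whose first bytes are FF FE keeps ISO-8859-1, which is the case the note removed by this commit had to warn authors about.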
1 parent 8983877 commit 5c4f659

1 file changed

Lines changed: 23 additions & 10 deletions

css2/syndata.src

@@ -1,6 +1,6 @@
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
 <html lang="en">
-<!-- $Id: syndata.src,v 2.193 2013-05-02 13:02:40 bbos Exp $ -->
+<!-- $Id: syndata.src,v 2.194 2013-05-02 14:01:28 bbos Exp $ -->
 <head>
 <title>Syntax and basic data types</title>
 <!--script src="http://www.w3c-test.org/css/harness/annotate.js#CSS21_DEV" type="text/javascript" defer></script-->
@@ -1409,23 +1409,15 @@ encoding::default|default::character encoding">character
 encoding</span> (from highest priority to lowest):
 </p>
 <ol>
-<li><span class="index-inst">BOM</span>
 <li>An HTTP "charset" parameter in a "Content-Type" field
 (or similar parameters in other protocols)</li>
-<li><span
+<li><span class="index-inst">BOM</span> and/or <span
 class="index-inst">@charset</span> (see below)</li>
 <li><code>&lt;link charset=""&gt;</code> or other metadata from the linking mechanism (if any)</li>
 <li>charset of referring style sheet or document (if any)</li>
 <li>Assume UTF-8</li>
 </ol>
 
-<p class=note>Note that it is not possible to use a 1-byte character
-encoding and start the CSS file with the characters 255 and 254 in
-either order, because the two characters will be interpreted as a
-BOM. E.g., "&#255;" and "&#254;" in ISO-8859-1, "&#729;" and "&#355;"
-in ISO-8859-2, etc. Authors should start such files with something
-else, e.g., a space.
-
 <p>Authors using an <span class="index-inst">@charset</span> rule must
 place the rule at the very beginning of the style sheet, preceded by
 no characters. (If a byte order mark is appropriate for the encoding
@@ -1452,6 +1444,27 @@ registry.
 <p>User agents must support at least the <span
 class="index-inst">UTF-8</span> encoding.
 </p>
+
+<p>If rule 1 above (an HTTP "charset" parameter or similar) yields a
+character encoding and it is one of UTF-8, UTF-16, UTF-16BE, UTF-16LE,
+UTF-32, UTF-32BE or UTF-32LE, then a BOM, if any, at the start of the
+file overrides that character encoding, as follows:
+
+<table>
+<thead>
+<th><th>First bytes (hexadecimal) <th>Resulting encoding
+<tbody>
+<tr><td>00 00 FE FF <td>UTF-32, big-endian
+<tr><td>FF FE 00 00 <td>UTF-32, little-endian
+<tr><td>FE FF <td>UTF-16, big-endian
+<tr><td>FF FE <td>UTF-16, little-endian
+<tr><td>EF BB BF <td>UTF-8
+</table>
+
+<p class=note>Note that, if rule 1 yields UTF-16BE, UTF-16LE, UTF-32BE
+or UTF-32LE, a BOM at the start of the file is an error. (Unicode
+forbids a BOM in such files).
+
 <p>User agents must ignore any @charset rule not at the beginning of the
 style sheet. When user agents detect the character encoding using the
 BOM and/or the @charset rule, they should follow the following rules:
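The BOM table added in the last hunk amounts to an ordered prefix match on the first bytes of the file. Below is a minimal Python sketch, assuming the input is the raw bytes of the style sheet. `sniff_bom` and `BOM_TABLE` are hypothetical names, and labels such as `UTF-32LE` are shorthand for the table's "UTF-32, little-endian". Per the added paragraph, the result should only be allowed to override rule 1 when rule 1 yields one of the listed UTF encodings.

```python
from typing import Optional, Tuple

# BOM signatures from the table, in the table's order. FF FE is a prefix of
# FF FE 00 00, so the UTF-32 little-endian entry must be tried before the
# UTF-16 little-endian one.
BOM_TABLE = [
    (b"\x00\x00\xFE\xFF", "UTF-32BE"),
    (b"\xFF\xFE\x00\x00", "UTF-32LE"),
    (b"\xFE\xFF", "UTF-16BE"),
    (b"\xFF\xFE", "UTF-16LE"),
    (b"\xEF\xBB\xBF", "UTF-8"),
]


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding implied by a leading BOM, BOM length in bytes).

    Returns (None, 0) when the data starts with none of the signatures.
    """
    for signature, encoding in BOM_TABLE:
        if data.startswith(signature):
            return encoding, len(signature)
    return None, 0
```

The order of the entries is the design point: trying FF FE first would misreport a UTF-32 little-endian file as UTF-16 little-endian. The table in the diff already lists the UTF-32 rows first, so the sketch keeps that order.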
