Commit 0d23531

[css-syntax] Restrict charset declaration bytes to 23-7F. Editorial rework of some of the charset stuff.
1 parent f7ffce5 commit 0d23531

2 files changed

Lines changed: 128 additions & 54 deletions

css-syntax/Overview.html

Lines changed: 67 additions & 29 deletions
@@ -54,7 +54,7 @@
5454
</p>
5555
<h1 class="p-name no-ref" id=title>CSS Syntax Module Level 3</h1>
5656
<h2 class="no-num no-toc no-ref heading settled heading" id=subtitle><span class=content>Editor’s Draft,
57-
<span class=dt-updated><span class=value-title title=20140116>16 January 2014</span></span></span></h2>
57+
<span class=dt-updated><span class=value-title title=20140124>24 January 2014</span></span></span></h2>
5858
<div data-fill-with=spec-metadata><dl><dt>This version:<dd><a class=u-url href=http://dev.w3.org/csswg/css-syntax/>http://dev.w3.org/csswg/css-syntax/</a><dt>Latest version:<dd><a href=http://www.w3.org/TR/css-syntax-3/>http://www.w3.org/TR/css-syntax-3/</a><dt>Editor’s Draft:<dd><a href=http://dev.w3.org/csswg/css-syntax/>http://dev.w3.org/csswg/css-syntax/</a><dt>Previous Versions:<dd><a href=http://www.w3.org/TR/2013/WD-css-syntax-3-20131105/ rel=previous>http://www.w3.org/TR/2013/WD-css-syntax-3-20131105/</a><dd><a href=http://www.w3.org/TR/2013/WD-css-syntax-3-20130919/ rel=previous>http://www.w3.org/TR/2013/WD-css-syntax-3-20130919/</a>
5959
<dt>Feedback:</dt>
6060
<dd><a href="mailto:www-style@w3.org?subject=%5Bcss-syntax%5D%20feedback">www-style@w3.org</a>
@@ -423,17 +423,16 @@ <h3 class="heading settled heading" data-level=3.2 id=input-byte-stream><span cl
423423

424424
<p> When parsing a stylesheet,
425425
the stream of Unicode <a data-link-type=dfn href=#code-point title="code points">code points</a> that comprises the input to the tokenization stage
426-
may be initially seen by the user agent as a stream of bytes
426+
might be initially seen by the user agent as a stream of bytes
427427
(typically coming over the network or from the local file system).
428-
The bytes encode the <a data-link-type=dfn href=#code-point title="code points">code points</a> according to a particular character encoding,
429-
which the user agent must use to decode the bytes into <a data-link-type=dfn href=#code-point title="code points">code points</a>.
428+
If so, the user agent must decode these bytes into <a data-link-type=dfn href=#code-point title="code points">code points</a> according to a particular character encoding.
430429

431430
<p> To decode the stream of bytes into a stream of <a data-link-type=dfn href=#code-point title="code points">code points</a>,
432-
UAs must use the <a href=http://encoding.spec.whatwg.org/#decode>decode</a> algorithm
431+
UAs must use the <dfn data-dfn-type=dfn data-noexport="" id=decode><a href=http://encoding.spec.whatwg.org/#decode>decode</a><a class=self-link href=#decode></a></dfn> algorithm
433432
defined in <a data-biblio-type=normative data-link-type=biblio href=#encoding title=encoding>[ENCODING]</a>,
434433
with the fallback encoding determined as follows.
435434

436-
<p class=note> Note: The <a href=http://encoding.spec.whatwg.org/#decode>decode</a> algorithm
435+
<p class=note> Note: The <a data-link-type=dfn href=#decode title=decode>decode</a> algorithm
437436
gives precedence to a byte order mark (BOM),
438437
and only uses the fallback when none is found.
439438

@@ -442,39 +441,60 @@ <h3 class="heading settled heading" data-level=3.2 id=input-byte-stream><span cl
442441
<ol>
443442
<li>
444443
If HTTP or equivalent protocol defines an encoding (e.g. via the charset parameter of the Content-Type header),
445-
<a href=http://encoding.spec.whatwg.org/#concept-encoding-get>get an encoding</a> <a data-biblio-type=normative data-link-type=biblio href=#encoding title=encoding>[ENCODING]</a>
444+
<dfn data-dfn-type=dfn data-export="" id=get-an-encoding><a href=http://encoding.spec.whatwg.org/#concept-encoding-get>get an encoding</a><a class=self-link href=#get-an-encoding></a></dfn> <a data-biblio-type=normative data-link-type=biblio href=#encoding title=encoding>[ENCODING]</a>
446445
for the specified value.
447446
If that does not return failure,
448447
use the return value as the fallback encoding.
449448

450449
<li>
451-
Otherwise, check the byte stream. If the first several bytes match the hex sequence
450+
Otherwise, check the byte stream. If the byte stream begins with the hex sequence
452451

453-
<pre>40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B</pre>
454-
<p> then <a href=http://encoding.spec.whatwg.org/#concept-encoding-get>get an encoding</a> <a data-biblio-type=normative data-link-type=biblio href=#encoding title=encoding>[ENCODING]</a>
455-
for the sequence of <code>(not 22)*</code> bytes,
456-
decoded per <code>windows-1252</code>.
452+
<pre>40 63 68 61 72 73 65 74 20 22 XX* 22 3B</pre>
453+
<p> where each <code>XX</code> byte is between 23<sub>16</sub> and 7E<sub>16</sub> inclusive,
454+
then <a data-link-type=dfn href=#get-an-encoding title="get an encoding">get an encoding</a>
455+
for the sequence of <code>XX</code> bytes,
456+
interpreted as <code>ASCII</code>.
457457

458-
<p class=note> Note: Anything ASCII-compatible will do since valid labels are all ASCII,
459-
so using <code>windows-1252</code> is fine.
458+
</p><details class=why>
459+
<summary>What does that byte sequence mean?</summary>
460460

461+
<p> The byte sequence above,
462+
when decoded as ASCII,
463+
is the string "<code>@charset "…";</code>",
464+
where the "…" is the sequence of bytes corresponding to the encoding’s label.
465+
</details>
461466

462-
<p class=note> Note: The byte sequence above,
463-
when decoded as ASCII,
464-
is the string "<code>@charset "…";</code>",
465-
where the "…" is the sequence of bytes corresponding to the encoding’s label.
467+
<p> UAs may impose an arbitrary limit upon the number of <code>XX</code> bytes scanned,
468+
as long as it is large enough to encompass all of the <a href=http://encoding.spec.whatwg.org/#label>labels</a> defined in <a data-biblio-type=normative data-link-type=biblio href=#encoding title=encoding>[ENCODING]</a>;
469+
at the time of writing this specification these are all 19 or fewer bytes long.
466470

467471
<p> If the return value was <code>utf-16be</code> or <code>utf-16le</code>,
468472
use <code>utf-8</code> as the fallback encoding;
469473
if it was anything else except failure,
470474
use the return value as the fallback encoding.
471475

472-
<p class=note> Note: UTF-16BE and UTF-16LE are the only ASCII-incompatible encodings that
473-
<a href=http://encoding.spec.whatwg.org/#concept-encoding-get>get an encoding</a>
474-
can return.
475-
Using one of them to decode the ASCII <code>@charset "…";</code> byte sequence
476-
would result in garbage.
477-
This mimics HTML <code>&lt;meta&gt;</code> behavior.
476+
</p><details class=why>
477+
<summary>Why use utf-8 when the declaration says utf-16?</summary>
478+
479+
<p> The bytes of the encoding declaration spell out “<code>@charset "…";</code>” in ASCII,
480+
but UTF-16 is not ASCII-compatible.
481+
Either you’ve typed in complete gibberish (like <code>䁣桡牳整•utf-16be∻</code>) to get the right bytes in the document,
482+
which we don’t want to encourage,
483+
or your document is actually in an ASCII-compatible encoding
484+
and your encoding declaration is lying.
485+
486+
<p> Either way, defaulting to UTF-8 is a decent answer.
487+
488+
<p> As well, this mimics the behavior of HTML’s <code>&lt;meta charset&gt;</code> attribute.
489+
</details>
490+
491+
<p class=note> Note: The syntax of an encoding declaration <em>looks like</em> the syntax of an <a class=css-code data-link-type=at-rule href=#at-ruledef-charset title=@charset>@charset</a> rule,
492+
but it’s actually much more restrictive.
493+
A number of things you can do in CSS that would produce a valid <a class=css-code data-link-type=at-rule href=#at-ruledef-charset title=@charset>@charset</a> rule,
494+
such as using multiple spaces, comments, or single quotes,
495+
will cause the encoding declaration to not be recognized.
496+
This behavior keeps the encoding declaration as simple as possible,
497+
and thus maximizes the likelihood of it being implemented correctly.
478498

479499
<li>
480500
Otherwise, if an <a data-link-type=dfn href=#environment-encoding0 title="environment encoding">environment encoding</a> is provided by the referring document,
@@ -484,6 +504,22 @@ <h3 class="heading settled heading" data-level=3.2 id=input-byte-stream><span cl
484504
Otherwise, use <code>utf-8</code> as the fallback encoding.
485505
</ol>
486506

507+
<div class=note>
508+
509+
<p> Though UTF-8 is the default encoding for the web,
510+
and many newer web-based file formats assume or require UTF-8 encoding,
511+
CSS was created before it was clear which encoding would win,
512+
and thus can’t automatically assume the stylesheet is UTF-8.
513+
514+
<p> Stylesheet authors <em>should</em> author their stylesheets in UTF-8,
515+
and ensure that either an HTTP header (or equivalent method) declares the encoding of the stylesheet to be UTF-8,
516+
or that the referring document declares its encoding to be UTF-8.
517+
(In HTML, this is done by adding a <code>&lt;meta charset=utf-8&gt;</code> element to the head of the document.)
518+
519+
<p> If neither of these options is available,
520+
authors should begin the stylesheet with a UTF-8 BOM
521+
or the exact characters <code>@charset "utf-8";</code>.
522+
</div>
487523

488524
<h3 class="heading settled heading" data-level=3.3 id=environment-encoding><span class=secno>3.3 </span><span class=content>
489525
Environment encoding</span><a class=self-link href=#environment-encoding></a></h3>
@@ -4484,7 +4520,7 @@ <h3 class="heading settled heading" data-level=8.1 id=style-rules><span class=se
44844520
but qualified rules inside <span class=css data-link-type=maybe title=@keyframes>@keyframes</span> rules are not <a data-biblio-type=informative data-link-type=biblio href=#css3-animations title=css3-animations>[CSS3-ANIMATIONS]</a>.
44854521

44864522
<h3 class="heading settled heading" data-level=8.2 id=charset-rule><span class=secno>8.2 </span><span class=content>
4487-
The <span class=css data-link-type=maybe title=@charset>@charset</span> Rule</span><a class=self-link href=#charset-rule></a></h3>
4523+
The <a class=css data-link-type=maybe href=#at-ruledef-charset title=@charset>@charset</a> Rule</span><a class=self-link href=#charset-rule></a></h3>
44884524

44894525
<p> The @charset rule is an artifact of the algorithm used to <a data-link-type=dfn href=#determine-the-fallback-encoding title="determine the fallback encoding">determine the fallback encoding</a> for the stylesheet.
44904526
That algorithm looks for a specific byte sequence as the very first few bytes in the file,
@@ -4494,7 +4530,7 @@ <h3 class="heading settled heading" data-level=8.2 id=charset-rule><span class=s
44944530

44954531
<p> Therefore, the stylesheet parser recognizes an @-rule with the general syntax
44964532

4497-
<pre class=prod><dfn class=css-code data-dfn-type=type data-export="" id=typedef-at-charset-rule>&lt;at-charset-rule&gt;<a class=self-link href=#typedef-at-charset-rule></a></dfn> = @charset <a class="production css-code" data-link-type=type href=http://dev.w3.org/csswg/css-values-3/#string-value title="<string>">&lt;string&gt;</a> ;</pre>
4533+
<pre class=prod><dfn class=css-code data-dfn-type=at-rule data-export="" id=at-ruledef-charset>@charset<a class=self-link href=#at-ruledef-charset></a></dfn> = @charset <a class="production css-code" data-link-type=type href=http://dev.w3.org/csswg/css-values-3/#string-value title="<string>">&lt;string&gt;</a> ;</pre>
44984534
<p> and, for backward compatibility, includes it in the object model for the stylesheet.
44994535
Modifying, adding, or removing an @charset rule via the object model has no effect
45004536
(in particular it does <strong>not</strong> cause the stylesheet to be rescanned in a different encoding).
@@ -4717,10 +4753,10 @@ <h3 class="heading settled heading" data-level=10.3 id=changes-css21><span class
47174753

47184754
<p> <ul>
47194755
<li>
4720-
Only detect <span class=css data-link-type=maybe title=@charset>@charset</span> rules in ASCII-compatible byte patterns.
4756+
Only detect <a class=css data-link-type=maybe href=#at-ruledef-charset title=@charset>@charset</a> rules in ASCII-compatible byte patterns.
47214757

47224758
<li>
4723-
Ignore <span class=css data-link-type=maybe title=@charset>@charset</span> rules that specify an ASCII-incompatible encoding,
4759+
Ignore <a class=css data-link-type=maybe href=#at-ruledef-charset title=@charset>@charset</a> rules that specify an ASCII-incompatible encoding,
47244760
as that would cause the rule itself to not decode properly.
47254761

47264762
<li>
@@ -4999,14 +5035,14 @@ <h2 class="no-num no-ref heading settled heading" id=index><span class=content>
49995035
<li>&lt;an+b&gt;, <a href=#anb-production title="section 6.2">6.2</a>
50005036
<li>are a valid escape, <a href=#check-if-two-code-points-are-a-valid-escape title="section 4.3.8">4.3.8</a>
50015037
<li>ASCII case-insensitive, <a href=#ascii-case-insensitive title="section 5.2">5.2</a>
5002-
<li>&lt;at-charset-rule&gt;, <a href=#typedef-at-charset-rule title="section 8.2">8.2</a>
50035038
<li>&lt;at-keyword-token&gt;, <a href=#typedef-at-keyword-token title="section 4">4</a>
50045039
<li>at-rule, <a href=#at-rule title="section 5">5</a>
50055040
<li>B, <a href=#b title="section 6">6</a>
50065041
<li>&lt;bad-string-token&gt;, <a href=#typedef-bad-string-token title="section 4">4</a>
50075042
<li>&lt;bad-url-token&gt;, <a href=#typedef-bad-url-token title="section 4">4</a>
50085043
<li>&lt;CDC-token&gt;, <a href=#typedef-cdc-token title="section 4">4</a>
50095044
<li>&lt;CDO-token&gt;, <a href=#typedef-cdo-token title="section 4">4</a>
5045+
<li>@charset, <a href=#at-ruledef-charset title="section 8.2">8.2</a>
50105046
<li>check if three code points would start an identifier, <a href=#check-if-three-code-points-would-start-an-identifier title="section 4.3.9">4.3.9</a>
50115047
<li>check if three code points would start a number, <a href=#check-if-three-code-points-would-start-a-number title="section 4.3.10">4.3.10</a>
50125048
<li>check if two code points are a valid escape, <a href=#check-if-two-code-points-are-a-valid-escape title="section 4.3.8">4.3.8</a>
@@ -5041,6 +5077,7 @@ <h2 class="no-num no-ref heading settled heading" id=index><span class=content>
50415077
<li>&lt;dashndashdigit-ident&gt;, <a href=#typedef-dashndashdigit-ident title="section 6.2">6.2</a>
50425078
<li>declaration, <a href=#declaration title="section 5">5</a>
50435079
<li>&lt;declaration-list&gt;, <a href=#typedef-declaration-list title="section 7.1">7.1</a>
5080+
<li>decode, <a href=#decode title="section 3.2">3.2</a>
50445081
<li>&lt;delim-token&gt;, <a href=#typedef-delim-token title="section 4">4</a>
50455082
<li>determine the fallback encoding, <a href=#determine-the-fallback-encoding title="section 3.2">3.2</a>
50465083
<li>digit, <a href=#digit title="section 4.2">4.2</a>
@@ -5053,6 +5090,7 @@ <h2 class="no-num no-ref heading settled heading" id=index><span class=content>
50535090
<li>escaping, <a href=#escaping0 title="section 2.1">2.1</a>
50545091
<li>function, <a href=#function title="section 5">5</a>
50555092
<li>&lt;function-token&gt;, <a href=#typedef-function-token title="section 4">4</a>
5093+
<li>get an encoding, <a href=#get-an-encoding title="section 3.2">3.2</a>
50565094
<li>&lt;hash-token&gt;, <a href=#typedef-hash-token title="section 4">4</a>
50575095
<li>hex digit, <a href=#hex-digit title="section 4.2">4.2</a>
50585096
<li>identifier, <a href=#identifier title="section 4.2">4.2</a>
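
Reviewer's sketch of the encoding-declaration step added in the section 3.2 hunk above (step 2 of the fallback list only). This is a minimal Python illustration under stated assumptions, not the specification's or any user agent's implementation. The names `sniff_label`, `fallback_from_declaration`, `MAX_LABEL_BYTES`, and `LABELS` are hypothetical, and the small `LABELS` table is only a stand-in for the full "get an encoding" lookup defined in [ENCODING].

```python
from typing import Optional

# 40 63 68 61 72 73 65 74 20 22, i.e. the ASCII bytes of: @charset "
PREFIX = b'@charset "'

# Hypothetical cap on the number of XX bytes scanned. The diff only requires
# the limit to be large enough for every label in [ENCODING].
MAX_LABEL_BYTES = 64

# Hypothetical, heavily abbreviated stand-in for "get an encoding".
# The real label table is defined in [ENCODING].
LABELS = {
    "utf-8": "utf-8",
    "utf8": "utf-8",
    "utf-16be": "utf-16be",
    "utf-16le": "utf-16le",
    "utf-16": "utf-16le",
    "windows-1252": "windows-1252",
    "iso-8859-1": "windows-1252",
}


def sniff_label(data: bytes) -> Optional[str]:
    """Return the XX* bytes as an ASCII string, or None if the byte stream
    does not begin with the exact pattern  @charset "XX*";"""
    if not data.startswith(PREFIX):
        return None
    start = len(PREFIX)
    # Room for the label plus the closing 22 3B pair.
    window = data[start:start + MAX_LABEL_BYTES + 2]
    for i, byte in enumerate(window):
        if byte == 0x22:  # closing quote must be followed directly by 3B
            if window[i + 1:i + 2] == b";":
                return window[:i].decode("ascii")
            return None
        if not 0x23 <= byte <= 0x7E:  # each XX byte must lie in 23..7E
            return None
    return None


def fallback_from_declaration(data: bytes) -> Optional[str]:
    """Return the fallback encoding implied by the declaration,
    or None when this step yields nothing (no match, or lookup failure)."""
    label = sniff_label(data)
    if label is None:
        return None
    encoding = LABELS.get(label.lower())  # None models "failure"
    if encoding in ("utf-16be", "utf-16le"):
        return "utf-8"  # the declaration is ASCII, so it cannot truly be UTF-16
    return encoding
```

For example, `fallback_from_declaration(b'@charset "UTF-16BE";')` yields `"utf-8"`, while a declaration written with single quotes or two spaces is not recognized and yields `None`, matching the note that the declaration syntax is stricter than the @charset rule grammar. A `None` result means the caller would move on to the environment encoding and then to `utf-8`.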
