Title: CSS Syntax Module Level 3
Shortname: css-syntax
Level: 3
Status: ED
Work Status: Testing
Group: csswg
ED: https://drafts.csswg.org/css-syntax/
TR: https://www.w3.org/TR/css-syntax-3/
Previous Version: https://www.w3.org/TR/2014/CR-css-syntax-3-20140220/
Previous Version: https://www.w3.org/TR/2013/WD-css-syntax-3-20131105/
Previous Version: https://www.w3.org/TR/2013/WD-css-syntax-3-20130919/
Editor: Tab Atkins Jr., Google, http://xanthir.com/contact/, w3cid 42199
Editor: Simon Sapin, Mozilla, http://exyr.org/about/, w3cid 58001
Abstract: This module describes, in general terms, the basic structure and syntax of CSS stylesheets. It defines, in detail, the syntax and parsing of CSS - how to turn a stream of bytes into a meaningful stylesheet.
Ignored Terms:, , , , ,
Ignored Vars: +b, -b, foo
spec:css-text-decor-3; type:property; text:text-decoration
spec:css-color-3; type:property; text:color
spec:css-transforms-1; type:function; text:translatex()
spec:infra; type:dfn; text:string
style attribute).
It defines algorithms for converting a stream of Unicode code points
(in other words, text)
into a stream of CSS tokens,
and then further into CSS objects
such as stylesheets, rules, and declarations.
p > a {
	color: blue;
	text-decoration: underline;
}

In the above rule, "p > a" is the selector,
which, if the source document is HTML,
selects any <{a}> elements that are children of a <{p}> element.

"color: blue" is a declaration specifying that,
for the elements that match the selector,
their 'color' property should have the value ''blue''.
Similarly, their 'text-decoration' property should have the value ''underline''.
@import "my-styles.css";

The ''@import'' at-rule is a simple statement. After its name, it takes a single string or ''url()'' function to indicate the stylesheet that it should import.

@page :left {
	margin-left: 4cm;
	margin-right: 3cm;
}

The ''@page'' at-rule consists of an optional page selector (the '':left'' pseudoclass), followed by a block of properties that apply to the page when printed. In this way, it's very similar to a normal style rule, except that its properties don't apply to any "element", but rather the page itself.

@media print {
	body { font-size: 10pt }
}

The ''@media'' at-rule begins with a media type and a list of optional media queries. Its block contains entire rules, which are only applied when the ''@media'' rule's conditions are fulfilled.
An identifier with the value "&B" could be written as ''\26 B'' or ''\000026B''.
A "real" space after the escape sequence must be doubled.
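The two escaped spellings above can be checked with a small decoder. The following is a minimal sketch, not the tokenizer algorithm itself: the helper name `decode_css_escapes` is hypothetical, and the substitution of U+FFFD for zero, surrogate, and out-of-range code points is assumed from the tokenizer's escape handling, which this excerpt does not show.

```python
import re

# A backslash followed by 1-6 hex digits and at most one whitespace
# character, or a backslash followed by any single non-newline character.
_ESCAPE = re.compile(
    r"\\(?:([0-9a-fA-F]{1,6})(?:\r\n|[ \t\n\r\f])?|([^\n\r\f]))"
)


def decode_css_escapes(text: str) -> str:
    """Replace CSS escapes in `text` with the code points they denote."""

    def replace(match: re.Match) -> str:
        hex_digits, literal = match.group(1), match.group(2)
        if hex_digits is None:
            # "\x" where x is not a hex digit stands for x itself.
            return literal
        code_point = int(hex_digits, 16)
        # Assumption: invalid code points become U+FFFD.
        if (
            code_point == 0
            or 0xD800 <= code_point <= 0xDFFF
            or code_point > 0x10FFFF
        ):
            return "\ufffd"
        return chr(code_point)

    return _ESCAPE.sub(replace, text)
```

Because the single whitespace character after the hex digits is consumed as part of the escape, `\26  B` (two spaces) decodes to "& B", which is why a "real" space has to be doubled.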
40 63 68 61 72 73 65 74 20 22 XX* 22 3B

where each XX byte is a value between 0<sub>16</sub> and 21<sub>16</sub> inclusive
or a value between 23<sub>16</sub> and 7F<sub>16</sub> inclusive,
then get an encoding
for the sequence of XX bytes,
interpreted as ASCII.
"@charset "…";",
where the "…" is the sequence of bytes corresponding to the encoding's label.
utf-16be or utf-16le,
use utf-8 as the fallback encoding;
if it was anything else except failure,
use the return value as the fallback encoding.
“@charset "…";” in ASCII,
but UTF-16 is not ASCII-compatible.
Either you've typed in complete gibberish (like 䁣桡牳整•utf-16be∻) to get the right bytes in the document,
which we don't want to encourage,
or your document is actually in an ASCII-compatible encoding
and your encoding declaration is lying.
Either way, defaulting to UTF-8 is a decent answer.
As well, this mimics the behavior of HTML's <meta charset> attribute.
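The byte-sequence check and the UTF-16 special case described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `charset_fallback` is hypothetical, Python's `codecs.lookup` stands in for the Encoding Standard's "get an encoding" (its label table and canonical names differ), and `None` stands in for "failure".

```python
import codecs
import re

# '@charset "' + XX* + '";' where each XX byte is 0x00-0x21 or 0x23-0x7F.
_CHARSET = re.compile(rb'@charset "([\x00-\x21\x23-\x7f]*)";')


def charset_fallback(first_bytes: bytes):
    """Return a fallback encoding name from a leading @charset, or None."""
    # Only the first 1024 bytes of the stylesheet are examined.
    match = _CHARSET.match(first_bytes[:1024])
    if match is None:
        return None
    label = match.group(1).decode("ascii").strip().lower()
    try:
        # Assumption: codecs.lookup approximates "get an encoding".
        name = codecs.lookup(label).name
    except (LookupError, ValueError):
        return None
    # A declaration that is readable as ASCII cannot really be UTF-16.
    if name in ("utf-16-be", "utf-16-le"):
        return "utf-8"
    return name
```

A stylesheet starting with `@charset "utf-16be";` therefore yields a utf-8 fallback here, matching the reasoning in the note.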
utf-8 as the fallback encoding.
<meta charset=utf-8> element to the head of the document.)
If neither of these options is available,
authors should begin the stylesheet with a UTF-8 BOM
or the exact characters @charset "utf-8";
<link rel=stylesheet>.
Note: [[CSSOM]] defines the environment encoding for <?xml-stylesheet?>.
Note: [[CSS3CASCADE]] defines the environment encoding for @import.
: comment
:: T: /* Star: N: anything but * followed by / T: */
: newline
:: Choice: T: \n T: \r\n T: \r T: \f
: whitespace
:: Choice: T: space T: \t N: newline
: hex digit
:: N: 0-9 a-f or A-F
: escape
:: T: \ Choice: N: not newline or hex digit Seq: Plus: N: hex digit C: 1-6 times Opt: skip N: whitespace
: <whitespace-token>
:: Plus: N: whitespace
: ws*
:: Star: N: <whitespace-token>
: <ident-token>
:: Or: 1 T: -- Seq: Opt: skip T: - Or: N: a-z A-Z _ or non-ASCII N: escape Star: Or: N: a-z A-Z 0-9 _ - or non-ASCII N: escape
: <function-token>
:: N: <ident-token> T: (
: <at-keyword-token>
:: T: @ N: <ident-token>
: <hash-token>
:: T: # Plus: Choice: N: a-z A-Z 0-9 _ - or non-ASCII N: escape
: <string-token>
:: Choice: Seq: T: " Star: Choice: N: not " \ or newline N: escape Seq: T: \ N: newline T: " Seq: T: ' Star: Choice: N: not ' \ or newline N: escape Seq: T: \ N: newline T: '
: <url-token>
:: N: <ident-token "url"> T: ( N: ws* Star: Choice: N: not " ' ( ) \ ws or non-printable N: escape N: ws* T: )
: <number-token>
:: Choice: 1 T: + Skip: T: - Choice: Seq: Plus: N: digit T: . Plus: N: digit Plus: N: digit Seq: T: . Plus: N: digit Opt: skip Seq: Choice: T: e T: E Choice: 1 T: + S: T: - Plus: N: digit
: <dimension-token>
:: N: <number-token> N: <ident-token>
: <percentage-token>
:: N: <number-token> T: %
: <CDO-token>
:: T: <!--
: <CDC-token>
:: T: -->
s·(i + f·10<sup>-d</sup>)·10<sup>te</sup>.
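The conversion formula above can be worked through in code. This is a minimal sketch: the function name `convert_number` and the regular expression are hypothetical, and the meanings of the symbols are assumed to be the usual ones (s the sign, i the integer part, f the fractional digits read as an integer, d the number of fractional digits, t the exponent sign, e the exponent), since their definitions do not survive in this excerpt.

```python
import re

# sign, integer digits, optional fraction, optional exponent
_NUMBER = re.compile(r"([+-]?)(\d*)(?:\.(\d+))?(?:[eE]([+-]?)(\d+))?")


def convert_number(text: str):
    """Evaluate s * (i + f * 10**-d) * 10**(t * e) for a CSS number."""
    match = _NUMBER.fullmatch(text)
    if match is None or not (match.group(2) or match.group(3)):
        raise ValueError(f"not a CSS number: {text!r}")
    sign, integer, fraction, exp_sign, exponent = match.groups()
    s = -1 if sign == "-" else 1
    i = int(integer) if integer else 0
    f = int(fraction) if fraction else 0
    d = len(fraction) if fraction else 0
    t = -1 if exp_sign == "-" else 1
    e = int(exponent) if exponent else 0
    return s * (i + f * 10 ** -d) * 10 ** (t * e)
```

For "-4.5" the components are s = -1, i = 4, f = 5, d = 1, t = 1, e = 0, so the formula gives -1·(4 + 5·10<sup>-1</sup>)·10<sup>0</sup>.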
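: Stylesheet
:: Star: Choice: 3 N: <CDO-token> N: <CDC-token> N: <whitespace-token> N: Qualified rule N: At-rule
: Rule list
:: Star: Choice: 1 N: <whitespace-token> N: Qualified rule N: At-rule
: At-rule
:: N: <at-keyword-token> Star: N: Component value Choice: N: {} block T: ;
: Qualified rule
:: Star: N: Component value N: {} block
: Declaration list
:: N: ws* Choice: Seq: Opt: N: Declaration Opt: Seq: T: ; N: Declaration list Seq: N: At-rule N: Declaration list
: Declaration
:: N: <ident-token> N: ws* T: : Star: N: Component value Opt: skip N: !important
: !important
:: T: ! N: ws* N: <ident-token "important"> N: ws*
: Component value
:: Choice: N: Preserved token N: {} block N: () block N: [] block N: Function block
: {} block
:: T: { Star: N: Component value T: }
: () block
:: T: ( Star: N: Component value T: )
: [] block
:: T: [ Star: N: Component value T: ]
: Function block
:: N: <function-token> Star: N: Component value T: )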
CSSStyleSheet#insertRule method,
and similar functions which might exist,
which parse text into a single rule.
style attribute,
which parses text into the contents of a single style rule.
media HTML attribute.
Examples:

2n+0 /* represents all of the even elements in the list */
even /* same */
4n+1 /* represents the 1st, 5th, 9th, 13th, etc. elements in the list */

Example:

-1n+6 /* represents the first 6 elements of the list */
-4n+10 /* represents the 2nd, 6th, and 10th elements of the list */

Examples:

0n+5 /* represents the 5th element in the list */
5 /* same */

1 may be omitted from the rule.

Examples:

The following notations are therefore equivalent:

1n+0 /* represents all elements in the list */
n+0 /* same */
n /* same */

Examples:

2n+0 /* represents every even element in the list */
2n /* same */

Valid example:

3n-6

Invalid example:

3n + -6

Valid Examples with white space:

3n + 1
+3n - 2
-n+ 6
+6

Invalid Examples with white space:

3 n
+ 2n
+ 2
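The index examples above can be reproduced with a few lines of code. This is a minimal sketch, assuming that An+B selects the 1-based positions A·n+B for every non-negative integer n; the function name `anb_indices` is hypothetical.

```python
def anb_indices(a: int, b: int, length: int) -> list[int]:
    """Return the 1-based positions An+B selects in a list of `length` items."""
    selected = []
    for index in range(1, length + 1):
        if a == 0:
            # With A = 0 only the B-th element can match.
            matches = index == b
        else:
            # index = a*n + b must hold for some integer n >= 0.
            offset = index - b
            matches = offset % a == 0 and offset // a >= 0
        if matches:
            selected.append(index)
    return selected
```

For instance, `anb_indices(-4, 10, 12)` walks n = 0, 1, 2 downward from 10 and stops once the position would drop below 1, which is why negative A values select a bounded set.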
The <an+b> type

<an+b> =
  odd | even |
  <integer> |

  <n-dimension> |
  '+'?† n |
  -n |

  <ndashdigit-dimension> |
  '+'?† <ndashdigit-ident> |
  <dashndashdigit-ident> |

  <n-dimension> <signed-integer> |
  '+'?† n <signed-integer> |
  -n <signed-integer> |

  <ndash-dimension> <signless-integer> |
  '+'?† n- <signless-integer> |
  -n- <signless-integer> |

  <n-dimension> ['+' | '-'] <signless-integer> |
  '+'?† n ['+' | '-'] <signless-integer> |
  -n ['+' | '-'] <signless-integer>

where:
<n-dimension> is a <
<ndash-dimension> is a <
<ndashdigit-dimension> is a <
<ndashdigit-ident> is an <
<dashndashdigit-ident> is an <
<integer> is a <
<signed-integer> is a <
<signless-integer> is a <

†: When a plus sign (+) precedes an ident starting with "n", as in the cases marked above, there must be no whitespace between the two tokens, or else the tokens do not match the above grammar. Whitespace is valid (and ignored) between any other two tokens.

The clauses of the production are interpreted as follows:
<integer>
<n-dimension>
'+'? n
-n
<ndashdigit-dimension>
'+'? <ndashdigit-ident>
<dashndashdigit-ident>
<n-dimension> <signed-integer>
'+'? n <signed-integer>
-n <signed-integer>
<ndash-dimension> <signless-integer>
'+'? n- <signless-integer>
-n- <signless-integer>
<n-dimension> ['+' | '-'] <signless-integer>
'+'? n ['+' | '-'] <signless-integer>
-n ['+' | '-'] <signless-integer>
'-' was provided between the two, B is instead the negation of the integer’s value.
a element following a u element
should be colored green.
Whitespace is not normally required between combinators
and the surrounding selectors,
so it should be equivalent to minify it to
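The <urange> production is:

<urange> =
  u '+' <ident-token> '?'* |
  u <dimension-token> '?'* |
  u <number-token> '?'* |
  u <number-token> <dimension-token> |
  u <number-token> <number-token> |
  u '+' '?'+

In this production, no whitespace can occur between any of the tokens.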
<foo> refers to the "foo" grammar term,
assumed to be defined elsewhere.
Substituting the <foo> for its definition results in a semantically identical grammar.
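Several types of tokens are written literally, without quotes:

* <ident-token>s (such as auto, disc, etc), which are simply written as their value.
* <at-keyword-token>s, which are written as an @ character followed by the token's value, like @media.
* <function-token>s, which are written as the function name followed by a ( character, like translate(.
* The <colon-token> (written as :), <comma-token> (written as ,), <semicolon-token> (written as ;), <(-token>, <)-token>, <{-token>, and <}-token>s.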
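For example, qualified rules inside ''@media'' rules [[CSS3-CONDITIONAL]] are style rules,
but qualified rules inside ''@keyframes'' rules are not [[CSS3-ANIMATIONS]].

Although it is possible, with escaping,
to construct an <ident-token> whose value ends with ( or starts with @,
such a token is not a <function-token> or an <at-keyword-token>
and does not match the corresponding grammar definitions.

<delim-token>s are written with their value enclosed in single quotes.
For example, a <delim-token> containing the "+" code point is written as '+'.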
Similarly, the <[-token> and <]-token>s must be written in single quotes,
as they're used by the syntax of the grammar itself to group clauses.
<translateX( <
However, the stylesheet may end with the function unclosed, like:
.foo { transform: translate(50px
The CSS parser parses this as a style rule containing one declaration,
whose value is a function named "translate".
This matches the above grammar,
even though the ending token didn't appear in the token stream,
because by the time the parser is finished,
the presence of the ending token is no longer possible to determine;
all you have is the fact that there's a block and a function.
Defining Block Contents: the <declaration-list>, <rule-list>, and <stylesheet> productions
The CSS parser is agnostic as to the contents of blocks,
such as those that come at the end of some at-rules.
Defining the generic grammar of the blocks in terms of tokens is non-trivial,
but there are dedicated and unambiguous algorithms defined for parsing this.
The <declaration-list> production represents a list of declarations.
It may only be used in grammars as the sole value in a block,
and represents that the contents of the block must be parsed using the consume a list of declarations algorithm.
Similarly, the <rule-list> production represents a list of rules,
and may only be used in grammars as the sole value in a block.
It represents that the contents of the block must be parsed using the consume a list of rules algorithm.
Finally, the <stylesheet> production represents a list of rules.
It is identical to <@font-face { <
This is a complete and sufficient definition of the rule's grammar.
For another example,
''@keyframes'' rules are more complex,
interpreting their prelude as a name and containing keyframes rules in their block.
Their grammar is:
@keyframes <
!important is automatically invalid on any descriptors.
If the rule accepts properties,
the spec for the rule must define whether the properties interact with the cascade,
and with what specificity.
If they don't interact with the cascade,
properties containing !important are automatically invalid;
otherwise using !important is valid and has its usual effect on the cascade origin of the property.
Keyframe rules, then,
must further define that they accept as declarations all animatable CSS properties,
plus the 'animation-timing-function' property,
but that they do not interact with the cascade.
@media <
It additionally defines a restriction that the <
Defining Arbitrary Contents: the <
In some grammars,
it is useful to accept any reasonable input in the grammar,
and do more specific error-handling on the contents manually
(rather than simply invalidating the construct,
as grammar mismatches tend to do).
For example, custom properties allow any reasonable value,
as they can contain arbitrary pieces of other CSS properties,
or be used for things that aren't part of existing CSS at all.
For another example, the <
CSS stylesheets
To parse a CSS stylesheet,
first parse a stylesheet.
Interpret all of the resulting top-level qualified rules as style rules, defined below.
If any style rule is invalid,
or any at-rule is not recognized or is invalid according to its grammar or context,
it's a parse error.
Discard that rule.
Style rules
A style rule is a qualified rule
that associates a selector list [[!SELECT]]
with a list of property declarations.
They are also called
rule sets in [[!CSS21]].
CSS Cascading and Inheritance [[!CSS3CASCADE]] defines how the declarations inside of style rules participate in the cascade.
The prelude of the qualified rule is [=CSS/parsed=]
as a <
The ''@charset'' Rule
The algorithm used to determine the fallback encoding for a stylesheet
looks for a specific byte sequence as the very first few bytes in the file,
which has the syntactic form of an at-rule named "@charset".
However, there is no actual at-rule named @charset.
When a stylesheet is actually parsed,
any occurrences of an ''@charset'' rule must be treated as an unrecognized rule,
and thus dropped as invalid when the stylesheet is grammar-checked.
Note: The algorithm to parse a stylesheet explicitly drops the first ''@charset'' rule from the document,
before the stylesheet is grammar-checked,
so valid rules that must appear first in the stylesheet,
such as ''@import'',
can still be preceded by an (invalid) ''@charset'' rule
without making themselves invalid.
Note: In CSS 2.1, ''@charset'' was a valid rule.
Some legacy specs may still refer to a ''@charset'' rule,
and explicitly talk about its presence in the stylesheet.
Serialization
The tokenizer described in this specification does not produce tokens for comments,
or otherwise preserve them in any way.
Implementations may preserve the contents of comments and their location in the token stream.
If they do, this preserved information must have no effect on the parsing step.
This specification does not define how to serialize CSS in general,
leaving that task to the CSSOM and individual feature specifications.
In particular, the serialization of comments and whitespace is not defined.
The only requirement for serialization is that it must "round-trip" with parsing,
that is, parsing the stylesheet must produce the same data structures as
parsing, serializing, and parsing again,
except for consecutive <
/**/
) must be inserted.
(Preserved comments may be reinserted even if the following tables don't require a comment between two tokens.)
Single characters in the row and column headings represent a <(
",
which represents a (-token.
ident
function
url
bad url
-
number
percentage
dimension
CDC
(
*
ident
✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
at-keyword
✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
hash
✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
dimension
✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
#
✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
-
✗ ✗ ✗ ✗ ✗ ✗ ✗ ✗
number
✗ ✗ ✗ ✗ ✗ ✗ ✗
@
✗ ✗ ✗ ✗ ✗
.
✗ ✗ ✗
+
✗ ✗ ✗
/
✗
Serializing <an+b>
: |A| is 1
:: Append "n" to |result|.
: |A| is -1
:: Append "-n" to |result|.
: |A| is non-zero
:: Serialize |A| and append it to |result|,
	then append "n" to |result|.

4.
: |B| is greater than zero
:: Append "+" to |result|,
then append the serialization of |B| to |result|.
: |B| is less than zero
:: Append the serialization of |B| to |result|.
5. Return |result|.
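The serialization steps can be sketched in a few lines. The function name `serialize_anb` is hypothetical, and the handling of a zero |A| (returning just the serialization of |B|) together with the empty initial |result| are assumptions about the earlier steps, which do not survive in this excerpt.

```python
def serialize_anb(a: int, b: int) -> str:
    """Serialize an An+B value from its integer coefficients."""
    # Assumption: with A equal to zero only B is serialized.
    if a == 0:
        return str(b)
    # Assumption: |result| starts out as an empty string.
    result = ""
    if a == 1:
        result += "n"
    elif a == -1:
        result += "-n"
    else:
        result += f"{a}n"
    if b > 0:
        result += f"+{b}"
    elif b < 0:
        # str() of a negative integer already carries the "-" sign.
        result += str(b)
    return result
```

Under these assumptions a zero |B| adds nothing, so the pair (2, 0) serializes in its shortest form, "2n", rather than as "2n+0".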
Privacy and Security Considerations
This specification introduces no new privacy concerns.
This specification improves security, in that CSS parsing is now unambiguously defined for all inputs.
Insofar as old parsers, such as whitelists/filters, parse differently from this specification,
they are somewhat insecure,
but the previous parsing specification left a lot of ambiguous corner cases which browsers interpreted differently,
so those filters were potentially insecure already,
and this specification does not worsen the situation.
Changes
This section is non-normative.
Changes from the 20 February 2014 Candidate Recommendation
The following substantive changes were made:
* Removed <\
followed by an EOF is now correctly reported as not a valid escape.
* If the very first rule in the sheet is a ''@charset'', it is now dropped during parsing.
The following editorial changes were made:
* The "Consume a string token" algorithm was changed to allow calling it without specifying an explicit ending token,
so that it uses the current input token instead.
The three call-sites of the algorithm were changed to use that form.
* Minor editorial restructuring of algorithms.
* Added the [=CSS/parse=] and [=parse a comma-separated list of component values=] API entry points.
* Added the <
Changes from the 5 November 2013 Last Call Working Draft
@charset byte sequence to 1024 bytes.
This aligns with what HTML does for <meta charset>
and makes sure the size of the sequence is bounded.
This only makes a difference with leading or trailing whitespace
in the encoding label:
@charset " (lots of whitespace) utf-8";
Changes from the 19 September 2013 Working Draft
Changes from CSS 2.1 and Selectors Level 3
Note: The point of this spec is to match reality;
changes from CSS2.1 are nearly always because CSS 2.1 specified something that doesn't match actual browser behavior,
or left something unspecified.
If some detail doesn't match browsers,
please let me know
as it's almost certainly unintentional.
Changes in decoding from a byte stream:
Tokenization changes:
Parsing changes:
An+B changes from Selectors Level 3 [[SELECT]]:
Acknowledgments
Thanks for feedback and contributions from
Anne van Kesteren,
David Baron,
Henri Sivonen,
Johannes Koch,
呂康豪 (Kang-Hao Lu),
Marc O'Morain,
Raffaello Giulietti,
Simon Pieter,
Tyler Karaszewski,
and Zack Weinberg.