CSS is a language for
describing the rendering of structured documents (such as HTML and XML) on
screen, on paper, in speech, etc. This module
describes, in general terms, the basic structure and syntax of CSS
stylesheets. It defines, in detail, the syntax and parsing of CSS: how to
turn a stream of bytes into a meaningful stylesheet.
Status of this document
This is a public copy of the editors' draft. It is provided for
discussion only and may change at any moment. Its publication here does
not imply endorsement of its contents by W3C. Don't cite this document
other than as work in progress.
The (archived) public mailing list www-style@w3.org is preferred for
discussion of this specification. When sending e-mail, please put the text
“css3-syntax” in the subject, preferably like this:
“[css3-syntax] …summary of comment…”
This module defines the abstract syntax and parsing of CSS stylesheets
and other things which use CSS syntax (such as the HTML style
attribute).
It defines algorithms for converting a stream of codepoints (in other
words, text) into a stream of CSS tokens, and then further into CSS
objects such as stylesheets, rules, and declarations.
1.1. Module interactions
This module defines the syntax and parsing of CSS stylesheets. It
supersedes the lexical scanner and grammar defined in CSS 2.1.
2. Description of CSS's Syntax
This section is not normative.
A CSS document is a series of qualified
rules, which are usually style rules that apply CSS properties to
elements, and at-rules, which define special
processing rules or values for the CSS document.
A qualified rule starts with a prelude,
then has a {}-wrapped block containing a sequence of declarations. The
meaning of the prelude varies based on the context that the rule appears
in; for style rules, it's a selector which specifies what elements the
declarations will apply to. Each declaration has a name, followed by a
colon and the declaration value, and finished with a semicolon.
A typical rule might look something like this:
p > a {
color: blue;
text-decoration: underline;
}
In the above rule, "p > a" is the selector, which, if the
source document is HTML, selects any <a> elements that
are children of a <p> element.
"color: blue;" is a declaration specifying that, for the
elements that match the selector, their ‘color’ property should have the value ‘blue’. Similarly, their ‘text-decoration’ property should have the value
‘underline’.
At-rules are all different, but
they have a basic structure in common. They start with an "@" character
followed by their name. Some at-rules are
simple statements, with their name followed by more CSS values to specify
their behavior, and finally ended by a semicolon. Others are blocks; they
can have CSS values following their name, but they end with a {}-wrapped
block, similar to a rule. Even the contents of these blocks are
specific to the given at-rule: sometimes
they contain a sequence of declarations, like a rule; other times,
they may contain additional blocks, or at-rules, or other structures
altogether.
Here are several examples of at-rules
that illustrate the varied syntax they may contain.
@import "my-styles.css";
The ‘@import’ at-rule is a simple statement. After its name,
it takes a single string or ‘url()’ function
to indicate the stylesheet that it should import.
The ‘@page’ at-rule consists of an optional page selector
(the ":left" pseudoclass), followed by a block of properties that apply
to the page when printed. In this way, it's very similar to a normal
style rule, except that its properties don't apply to any
"element", but rather the page itself.
@media print {
body { font-size: 10pt }
}
The ‘@media’ at-rule begins with a media type and a list of
optional media queries. Its block contains entire rules, which are only
applied when the ‘@media’s conditions are
fulfilled.
Property names and at-rule names are
always idents, which have to start with a letter or a hyphen
followed by a letter, and then can contain letters, numbers, hyphens, or
underscores. You can include any character at all, even ones that CSS uses
in its syntax, by escaping it with a backslash (\) or by using a
hexadecimal escape.
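As a rough illustration of the informal rule above, the following Python sketch checks whether a string has the basic ident shape. It follows only this non-normative summary: it ignores escapes and non-ASCII characters, and the function name is hypothetical.

```python
import re

# Informal ident shape from the summary above (ASCII only, no escapes):
# an optional hyphen, a letter, then letters, digits, hyphens, or underscores.
_SIMPLE_IDENT = re.compile(r"-?[A-Za-z][A-Za-z0-9_-]*")


def looks_like_simple_ident(text: str) -> bool:
    """Return True if text matches the simplified ident shape."""
    return _SIMPLE_IDENT.fullmatch(text) is not None
```

For example, "text-decoration" and "-webkit-box" pass, while "2col" does not. The tokenizer's actual rules, not this shortcut, decide what counts as an ident.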
The syntax of selectors is defined in the Selectors spec. Similarly, the
syntax of the wide variety of CSS values is defined in the Values & Units spec. The
special syntaxes of individual at-rules can
be found in the specs that define them.
2.1. Error Handling
This section is not normative.
When errors occur in CSS, the parser attempts to recover gracefully,
throwing away only the minimum amount of content before returning to
parsing as normal. This is because errors aren't always mistakes: new
syntax looks like an error to an old parser, and it's useful to be able to
add new syntax to the language without worrying that stylesheets which include
it will be completely broken in older UAs.
The precise error-recovery behavior is detailed in the parser itself,
but it's simple enough that a short description is fairly accurate:
If an error is encountered while parsing a declaration, the parser throws away the
declaration and skips forward until it next encounters a top-level
semicolon token, or the declaration's enclosing rule is ended.
If an error is encountered while parsing the prelude of a qualified rule, the parser throws away
the rule and skips forward until it next encounters a simple {} block, or
the rule's enclosing rule is ended.
If an error is encountered while parsing the prelude of an at-rule, the parser throws away the rule and
skips forward until it next encounters either a simple {} block, or a
semicolon token.
If the stylesheet ends while any rule, declaration, function, string,
or similar construct is still open, everything is automatically closed. This doesn't
make them invalid, though they may be incomplete and thus thrown away
when they are verified against their grammar.
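The declaration-level recovery described above amounts to a nesting-aware skip. The Python sketch below is illustrative only: it models tokens as plain strings (an assumption, since real tokens carry types), treats only bare "(", "[", and "{" as openers (function tokens are not modeled), and returns the index where normal parsing could resume. The function name is hypothetical.

```python
from __future__ import annotations

# Closing token for each opening bracket token.
_CLOSERS = {"(": ")", "[": "]", "{": "}"}


def skip_bad_declaration(tokens: list[str], pos: int) -> int:
    """Skip from pos to just past the next top-level ';'.

    Stops at (without consuming) an unmatched '}' that closes the
    enclosing rule, or at the end of the token list.
    """
    expected: list[str] = []  # closers for the currently open blocks
    while pos < len(tokens):
        tok = tokens[pos]
        if tok in _CLOSERS:
            expected.append(_CLOSERS[tok])
        elif expected and tok == expected[-1]:
            expected.pop()
        elif not expected and tok == ";":
            return pos + 1
        elif not expected and tok == "}":
            return pos  # the enclosing rule ends; leave '}' for the caller
        pos += 1
    return pos
```

A semicolon inside parentheses does not end the skip; only one at nesting depth zero does.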
3. Tokenizing and Parsing CSS
User agents must use the parsing rules described in this specification
to generate the CSSOM trees from text/css resources. Together, these rules
define what is referred to as the CSS parser.
This specification defines the parsing rules for CSS documents, whether
they are syntactically correct or not. Certain points in the parsing
algorithm are said to be parse
errors. The error handling for parse errors is well-defined: user
agents must either act as described below when encountering such problems,
or must abort processing at the first error that they encounter for which
they do not wish to apply the rules described below.
Conformance checkers must report at least one parse error condition to
the user if one or more parse error conditions exist in the document and
must not report parse error conditions if none exist in the document.
Conformance checkers may report more than one parse error condition if
more than one parse error condition exists in the document. Conformance
checkers are not required to recover from parse errors.
3.1. Overview of the Parsing Model
The input to the CSS parsing process consists of a stream of Unicode
code points, which is passed through a tokenization stage followed by a
tree construction stage. The output is a CSSStyleSheet object.
Implementations that do not support scripting do not have to
actually create a CSSOM CSSStyleSheet object, but the CSSOM tree in such
cases is still used as the model for the rest of the specification.
3.2. The input byte stream
The stream of Unicode code points that comprises the input to the
tokenization stage will be initially seen by the user agent as a stream of
bytes (typically coming over the network or from the local file system).
The bytes encode the actual characters according to a particular character
encoding, which the user agent must use to decode the bytes into
characters.
To decode the stream of bytes into a stream of characters, UAs must
follow these steps.
1. If HTTP or equivalent protocol defines an encoding (e.g. via the
charset parameter of the Content-Type header), get an encoding for the specified
value. If that does not return failure, use the return value as the
fallback encoding.
2. Otherwise, check the byte stream. If the first several bytes match
the hex sequence
40 63 68 61 72 73 65 74 20 22 (not 22)* 22 3B
then get an encoding for the
sequence of (not 22)* bytes, decoded per
windows-1252.
Note: Anything ASCII-compatible will do, so using
windows-1252 is fine.
Note: The byte sequence above, when decoded as ASCII, is
the string "@charset "…";", where the "…" is the
sequence of bytes corresponding to the encoding's name.
If the return value was utf-16 or utf-16be,
use utf-8 as the fallback encoding; if it was anything else
except failure, use the return value as the fallback encoding.
This mimics HTML <meta> behavior.
3. Otherwise, get an encoding for
the value of the charset attribute on the
<link> element or <?xml-stylesheet?>
processing instruction that caused the style sheet to be included, if
any. If that does not return failure, use the return value as the
fallback encoding.
4. Otherwise, if the referring style sheet or document has an encoding,
use that as the fallback encoding.
5. Otherwise, use utf-8 as the fallback encoding.
Then, decode the byte stream using the
fallback encoding.
Note: the decode algorithm lets
the byte order mark (BOM) take precedence, hence the usage of the term
"fallback" above.
Anne says that steps 3/4 should be an input to this
algorithm from the specs that define importing stylesheets, to make the
algorithm as a whole cleaner. Perhaps abstract it into the concept of an
"environment charset" or something?
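The fallback-encoding steps and the final decode can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the label table is a tiny hypothetical stand-in for the Encoding Standard's "get an encoding" lookup, Python codec names are used for the actual decoding, and the environment inputs (HTTP charset, link charset, referrer encoding) are passed in as hypothetical keyword arguments.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical, illustrative subset of encoding labels; the real
# "get an encoding" table is much larger. Values use the names in the steps above.
_LABELS = {
    "utf-8": "utf-8", "utf8": "utf-8",
    "utf-16": "utf-16", "utf-16be": "utf-16be",
    "windows-1252": "windows-1252", "latin1": "windows-1252",
    "iso-8859-1": "windows-1252",
}
# Python codec used to decode each encoding name above.
_PY_CODEC = {"utf-8": "utf-8", "utf-16": "utf-16-le",
             "utf-16be": "utf-16-be", "windows-1252": "cp1252"}
_CHARSET_PREFIX = b'@charset "'  # 40 63 68 61 72 73 65 74 20 22


def get_encoding(label: Optional[str]) -> Optional[str]:
    """Return an encoding name, or None to signal failure."""
    if label is None:
        return None
    return _LABELS.get(label.strip(" \t\n\f\r").lower())


def sniff_charset_rule(data: bytes) -> Optional[str]:
    """Match 40 63 ... 20 22 (not 22)* 22 3B; return the (not 22)* part."""
    if not data.startswith(_CHARSET_PREFIX):
        return None
    close = data.find(b'"', len(_CHARSET_PREFIX))
    if close == -1 or data[close + 1:close + 2] != b";":
        return None
    return data[len(_CHARSET_PREFIX):close].decode("cp1252", errors="replace")


def fallback_encoding(data: bytes, http_charset=None, link_charset=None,
                      referrer_encoding=None) -> str:
    enc = get_encoding(http_charset)                 # step 1
    if enc is not None:
        return enc
    enc = get_encoding(sniff_charset_rule(data))     # step 2
    if enc in ("utf-16", "utf-16be"):
        return "utf-8"
    if enc is not None:
        return enc
    enc = get_encoding(link_charset)                 # step 3 (failure falls through)
    if enc is not None:
        return enc
    if referrer_encoding is not None:                # step 4
        return referrer_encoding
    return "utf-8"                                   # step 5


def decode_stylesheet(data: bytes, **env) -> str:
    # A byte order mark takes precedence over the fallback encoding.
    for bom, codec in ((b"\xef\xbb\xbf", "utf-8"),
                       (b"\xfe\xff", "utf-16-be"), (b"\xff\xfe", "utf-16-le")):
        if data.startswith(bom):
            return data[len(bom):].decode(codec, errors="replace")
    name = fallback_encoding(data, **env)
    return data.decode(_PY_CODEC.get(name, name), errors="replace")
```

The sketch treats a failed lookup in step 2 as falling through to step 3, which is one reading of the "Otherwise" wording.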
3.2.1. Preprocessing the input stream
The input stream consists of the characters pushed into it as the input
byte stream is decoded.
Before sending the input stream to the tokenizer, implementations must
make the following character substitutions:
Replace any U+000D CARRIAGE RETURN (CR) characters or pairs of U+000D
CARRIAGE RETURN (CR) followed by U+000A LINE FEED (LF) by a single U+000A
LINE FEED (LF) character.
Replace any U+0000 NULL characters with U+FFFD REPLACEMENT CHARACTER
(�).
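The two substitutions above are simple string replacements. A minimal Python sketch (the function name is hypothetical), assuming the input has already been decoded to text:

```python
def preprocess(text: str) -> str:
    """Apply the two character substitutions listed above."""
    # Replace CRLF pairs first, so each pair becomes a single LF, then lone CRs.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # U+0000 NULL becomes U+FFFD REPLACEMENT CHARACTER.
    return text.replace("\x00", "\ufffd")
```

The order matters: replacing lone CRs first would turn each CRLF pair into two LFs.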
4. Tokenization
Implementations must act as if they used the following state machine to
tokenize CSS. The state machine must start in the data state. Most states consume a single
character, which may have various side-effects, and either switch the
state machine to a new state to reconsume the same character, switch
it to a new state to consume the next character, or stay in the same
state to consume the next character. Some states have more complicated
behavior and can consume several characters before switching to another
state.
The output of the tokenization step is a series of zero or more of the
following tokens: ident, function, at-keyword, hash, string, bad-string,
url, bad-url, delim, number, percentage, dimension, unicode-range,
include-match, dash-match, prefix-match, suffix-match, substring-match,
column, whitespace, cdo, cdc, colon, semicolon, comma, [, ], (, ), {, and
}.
Ident, function, at-keyword, hash, string, and url tokens have a value
composed of zero or more characters. Additionally, hash tokens have a type
flag set to either "id" or "unrestricted". The type flag defaults to
"unrestricted" if not otherwise set. Delim tokens have a value composed of
a single character. Number, percentage, and dimension tokens have a
representation composed of one or more characters, and a numeric value.
Number tokens additionally have a type flag set to either "integer" or
"number". The type flag defaults to "integer" if not otherwise set.
Dimension tokens additionally have a unit composed of one or more
characters. Unicode-range tokens have a range of characters.
The type flag of hash tokens is used in the Selectors syntax
[SELECT]. Only hash
tokens with the "id" type are valid ID selectors.
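One way to picture these token payloads is as small records. The Python dataclasses below are a hypothetical model: the class and field names are not from this specification. They cover the fields and defaults described above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal


@dataclass
class HashToken:
    value: str
    type_flag: Literal["id", "unrestricted"] = "unrestricted"  # default per the text


@dataclass
class DelimToken:
    value: str  # exactly one character

    def __post_init__(self) -> None:
        if len(self.value) != 1:
            raise ValueError("delim tokens hold a single character")


@dataclass
class NumberToken:
    representation: str  # one or more characters, e.g. "+1.50"
    value: float
    type_flag: Literal["integer", "number"] = "integer"  # default per the text


@dataclass
class PercentageToken:
    representation: str
    value: float


@dataclass
class DimensionToken:
    representation: str
    value: float
    unit: str  # one or more characters


@dataclass
class UnicodeRangeToken:
    start: int  # first codepoint in the range
    end: int    # last codepoint in the range, inclusive


def is_valid_id_selector(token: HashToken) -> bool:
    # Only hash tokens with the "id" type are valid ID selectors.
    return token.type_flag == "id"
```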
The tokenizer state machine consists of the states defined in the
following subsections.
4.1. Token Railroad Diagrams
This section is non-normative.
This section presents an informative view of the tokenizer, in the form
of railroad diagrams. Railroad diagrams are more compact than a
state-machine, but often easier to read than a regular expression.
These diagrams are informative and incomplete; they
describe the grammar of "correct" tokens, but do not describe
error-handling at all. They are provided solely to make it easier to get
an intuitive grasp of the syntax of each token.
Diagrams with names in all uppercase represent tokens. The rest are
productions referred to by other diagrams.
[Diagrams not reproduced: comment, newline, whitespace character, escape, WHITESPACE, IDENT, FUNCTION, AT-KEYWORD, HASH, STRING, URL, url-unquoted, NUMBER, DIMENSION, PERCENTAGE, UNICODE-RANGE, INCLUDE-MATCH, DASH-MATCH, PREFIX-MATCH, SUFFIX-MATCH, SUBSTRING-MATCH, COLUMN, CDO, CDC.]
4.2. Tokenizer Flags
The tokenizer can be run with any of several flags that alter its
behavior.
the transform function whitespace flag
This flag is set when parsing SVG's transform attribute.
When this is set, whitespace is allowed between the name of a transform
function and its opening parenthesis.
4.3. Definitions
This section defines several terms used during the tokenization phase.
next input character
The first character in the input stream that has not yet been
consumed.
current input character
The last character to have been consumed.
reconsume the current input character
Push the current input
character back onto the front of the input stream, so that the
next time you are instructed to consume the next input character, it will
instead reconsume the current input
character.
EOF character
A conceptual character representing the end of the input stream.
Whenever the input stream is empty, the next input character is always an
EOF character.
digit
A character between U+0030 DIGIT ZERO (0) and U+0039 DIGIT NINE (9).
hex digit
A digit, or a character between U+0041
LATIN CAPITAL LETTER A (A) and U+0046 LATIN CAPITAL LETTER F (F), or a
character between U+0061 LATIN SMALL LETTER A (a) and U+0066 LATIN SMALL
LETTER F (f).
uppercase letter
A character between U+0041 LATIN CAPITAL LETTER A (A) and U+005A
LATIN CAPITAL LETTER Z (Z).
lowercase letter
A character between U+0061 LATIN SMALL LETTER A (a) and U+007A LATIN
SMALL LETTER Z (z).
non-printable character
A character between U+0000 NULL and U+0008 BACKSPACE or a character
between U+000E SHIFT OUT and U+001F INFORMATION SEPARATOR ONE or a
character between U+007F DELETE and U+009F APPLICATION PROGRAM COMMAND.
newline
U+000A LINE FEED or U+000C FORM FEED. Note that
U+000D CARRIAGE RETURN is not included in this definition, as it is
removed from the stream during preprocessing.
whitespace
A newline, U+0009 CHARACTER TABULATION,
or U+0020 SPACE.
maximum allowed codepoint
The greatest codepoint defined by Unicode. This is currently U+10FFFF.
Otherwise, if the next 2 input characters are U+002D
HYPHEN-MINUS U+003E GREATER-THAN SIGN (->), consume them, emit a CDC
token, and remain in this state.
Otherwise, emit a delim token with its value set to the current input character.
Remain in this state.
Otherwise, emit a delim token with its value set to U+002F SOLIDUS
(/). Remain in this state.
U+003A COLON (:)
Emit a colon token. Remain in this state.
U+003B SEMICOLON (;)
Emit a semicolon token. Remain in this state.
U+003C LESS-THAN SIGN (<)
If the next 3 input characters are U+0021
EXCLAMATION MARK U+002D HYPHEN-MINUS U+002D HYPHEN-MINUS (!--), consume
them and emit a cdo token. Remain in this state.
Otherwise, emit a delim token with its value set to U+003C LESS-THAN
SIGN (<). Remain in this state.
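The surviving data-state cases for ":", ";", "<", and the CDC check can be sketched in Python as follows. This is a partial illustration. The function name and the tuple return shape are hypothetical. It also assumes that the "emit a delim token" fallback after the CDC check belongs to the U+002D HYPHEN-MINUS case, and that the number and ident checks that come first in that case have already failed.

```python
from __future__ import annotations

from typing import Optional


def punctuation_token(src: str, pos: int) -> Optional[tuple[str, str, int]]:
    """Return (token kind, token text, next position) for a few data-state cases.

    Returns None for characters this sketch does not cover.
    """
    c = src[pos]
    if c == ":":
        return ("colon", c, pos + 1)
    if c == ";":
        return ("semicolon", c, pos + 1)
    if c == "<":
        if src.startswith("!--", pos + 1):
            return ("cdo", "<!--", pos + 4)
        return ("delim", "<", pos + 1)
    if c == "-":
        # Assumption: the number and ident checks for "-" already failed.
        if src.startswith("->", pos + 1):
            return ("cdc", "-->", pos + 3)
        return ("delim", "-", pos + 1)
    return None
```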
If this state emits a hash token whose value is the empty
string, it's a spec or implementation error. The data validation performed
in the data state should have guaranteed
a non-empty value.
If this state emits an at-keyword token whose value is the
empty string, it's a spec or implementation error. The data validation
performed in the data state should have
guaranteed a non-empty value.
4.4.7. Ident state
When this state is first entered, create an ident token with its value
initially set to the empty string.
Otherwise, if the next input
character is U+0028 LEFT PARENTHESIS ((), emit a function token
with its value set to the ident token's value. Switch to the data state.
Append the current input
character to the number token's representation. Remain in this
state.
U+002E FULL STOP (.)
If the number token's type flag is currently "integer" and the next input character is a digit, consume it. Append U+002E FULL STOP (.)
followed by the digit to the number token's
representation. Set the number token's type flag to "number". Remain in
this state.
Otherwise, set the number token's value to the number produced by
interpreting the number token's representation as a base-10 number and
emit it. Switch to the data state. Reconsume the current
input character.
U+0025 PERCENT SIGN (%)
Emit a percentage token with its value set to the number produced by
interpreting the number token's representation as a base-10 number.
Switch to the data state.
U+0045 LATIN CAPITAL LETTER E (E)
U+0065 LATIN SMALL LETTER E (e)
If the next input
character is a digit, or the next 2 input
characters are U+002B PLUS SIGN (+) or U+002D HYPHEN-MINUS (-)
followed by a digit, consume them. Append
U+0065 LATIN SMALL LETTER E (e) and the consumed characters to the number
token's representation. Switch to the sci-notation state.
Otherwise, create a dimension token with its representation set to
the number token's representation, its value set to the number produced
by interpreting the number token's representation as a base-10 number,
and a unit initially set to the current input character.
Switch to the dimension state.
U+002D HYPHEN-MINUS (-)
If the input stream starts with an
identifier, create a dimension token with its representation set
to the number token's representation, its value set to the number
produced by interpreting the number token's representation as a base-10
number, and a unit initially set to the current input character.
Switch to the dimension state.
Otherwise, set the number token's value to the number produced by
interpreting the number token's representation as a base-10 number and
emit it. Switch to the data state. Reconsume the current
input character.
Create a dimension token with its representation set to the number
token's representation, its value set to the number produced by
interpreting the number token's representation as a base-10 number, and a
unit initially set to the current
input character. Switch to the dimension state.
U+005C REVERSE SOLIDUS (\)
If the input stream starts with a valid
escape, consume an
escaped character. Create a dimension token with its
representation set to the number token's representation, its value set to
the number produced by interpreting the number token's representation as
a base-10 number, and a unit initially set to the returned character.
Switch to the dimension state.
Otherwise, set the number token's value to the number produced by
interpreting the number token's representation as a base-10 number and
emit it. Switch to the data state. Reconsume the current
input character.
anything else
Set the number token's value to the number produced by
interpreting the number token's representation as a base-10 number and
emit it. Switch to the data state. Reconsume the current
input character.
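A simplified version of the number state can be sketched in Python. The sketch covers digits, a single fractional part, "%", and ASCII units. It makes several assumptions: input starts at a digit (leading signs and a leading "." are not handled), escapes in units are not handled, and the unit check is reduced to "a letter, or '-' followed by a letter". The sci-notation state itself is not reproduced in this section, so the sketch assumes the exponent ends the token and marks such numbers as "number". All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class NumericToken:
    kind: str                   # "number", "percentage", or "dimension"
    representation: str
    value: float
    type_flag: str = "integer"  # "integer" or "number" (number tokens only)
    unit: Optional[str] = None  # dimension tokens only


def _is_digit(src: str, i: int) -> bool:
    return i < len(src) and "0" <= src[i] <= "9"


def _is_letter(src: str, i: int) -> bool:
    if i >= len(src):
        return False
    c = src[i]
    return "a" <= c <= "z" or "A" <= c <= "Z"


def consume_number(src: str, pos: int) -> tuple[NumericToken, int]:
    """Consume a numeric token whose first character src[pos] is a digit."""
    start, type_flag = pos, "integer"
    while _is_digit(src, pos):
        pos += 1
    # "." followed by a digit makes this a non-integer number.
    if pos < len(src) and src[pos] == "." and _is_digit(src, pos + 1):
        type_flag = "number"
        pos += 1
        while _is_digit(src, pos):
            pos += 1
    rep = src[start:pos]
    # "e"/"E" followed by a digit, or by a sign and a digit, starts an exponent.
    if pos < len(src) and src[pos] in "eE":
        exp = pos + 1
        if exp < len(src) and src[exp] in "+-":
            exp += 1
        if _is_digit(src, exp):
            end = exp
            while _is_digit(src, end):
                end += 1
            rep += "e" + src[pos + 1:end]  # representation uses a lowercase "e"
            return NumericToken("number", rep, float(rep), "number"), end
    value = float(rep)
    if pos < len(src) and src[pos] == "%":
        return NumericToken("percentage", rep, value), pos + 1
    # Simplified unit check: a letter, or "-" followed by a letter.
    if _is_letter(src, pos) or (pos < len(src) and src[pos] == "-" and _is_letter(src, pos + 1)):
        end = pos + 1
        while end < len(src) and (_is_letter(src, end) or _is_digit(src, end) or src[end] in "-_"):
            end += 1
        return NumericToken("dimension", rep, value, unit=src[pos:end]), end
    # Anything else: emit a number token; the caller reconsumes src[pos].
    return NumericToken("number", rep, value, type_flag), pos
```

For "3.x", the "." is not followed by a digit, so a number token "3" is produced and the "." is left for the data state, matching the U+002E FULL STOP case above.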
Create a new unicode-range token with an empty range.
Consume as many hex digits as possible,
but no more than 6. If fewer than 6 hex
digits were consumed, consume as many U+003F QUESTION MARK (?)
characters as possible, but no more than enough to make the total of hex digits and U+003F QUESTION MARK (?)
characters equal to 6.
If any U+003F QUESTION MARK (?) characters were consumed, first
interpret the consumed characters as a hexadecimal number, with the U+003F
QUESTION MARK (?) characters replaced by U+0030 DIGIT ZERO (0) characters.
This is the start of the range.
Then interpret the consumed characters as a hexadecimal number again, with
the U+003F QUESTION MARK (?) characters replaced by U+0046 LATIN CAPITAL
LETTER F (F) characters. This is the end of
the range. Set
the unicode-range token's range, then emit it. Switch to the data state.
Otherwise, interpret the digits as a hexadecimal number. This is the start of the range.
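The digit-and-wildcard part of the unicode-range steps above can be sketched in Python. The sketch is illustrative: the function name is hypothetical. It returns None for the end when no "?" was consumed, because the rest of the state (an explicit end of range) is not shown in this section.

```python
from __future__ import annotations

from typing import Optional

_HEX = set("0123456789abcdefABCDEF")


def consume_range_digits(src: str, pos: int) -> tuple[int, Optional[int], int]:
    """Return (start, end or None, next position)."""
    limit = min(len(src), pos + 6)  # at most 6 characters in total
    i = pos
    while i < limit and src[i] in _HEX:
        i += 1
    while i < limit and src[i] == "?":  # pad with "?" up to 6 characters
        i += 1
    text = src[pos:i]
    if not text:
        raise ValueError("expected hex digits or '?'")
    if "?" in text:
        start = int(text.replace("?", "0"), 16)  # "?" as 0 gives the start
        end = int(text.replace("?", "F"), 16)    # "?" as F gives the end
        return start, end, i
    return int(text, 16), None, i
```

For example, "4??" covers U+0400 through U+04FF.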
This section describes how to consume an escaped
character. It assumes that the U+005C REVERSE SOLIDUS (\) has
already been consumed and that the next input character has already been
verified to not be a newline or EOF. It will
return a character.
Consume as many hex digits as
possible, but no more than 5. Note that this means 1-6
hex digits have been consumed in total. If the next input character is whitespace, consume it as well. Interpret
the hex digits as a hexadecimal number.
If this number is zero, or is greater than the maximum allowed codepoint,
return U+FFFD REPLACEMENT CHARACTER (�). Otherwise, return the
character with that codepoint.
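The escape steps above can be sketched in Python. The function and constant names are hypothetical. The surviving text covers only the hexadecimal branch. For any other character, the sketch assumes the escaped character is returned as-is.

```python
from __future__ import annotations

_HEX_DIGITS = "0123456789abcdefABCDEF"
_WHITESPACE = " \t\n\f"   # space, tab, and the two newline characters
MAX_CODEPOINT = 0x10FFFF  # maximum allowed codepoint


def consume_escaped_character(src: str, pos: int) -> tuple[str, int]:
    """pos is just past the backslash; src[pos] is not a newline or EOF."""
    if src[pos] in _HEX_DIGITS:
        end = pos + 1
        # Up to 5 more hex digits, so 1-6 in total.
        while end < len(src) and end - pos < 6 and src[end] in _HEX_DIGITS:
            end += 1
        code = int(src[pos:end], 16)
        # One whitespace character after the hex digits is consumed as well.
        if end < len(src) and src[end] in _WHITESPACE:
            end += 1
        if code == 0 or code > MAX_CODEPOINT:
            return "\ufffd", end
        return chr(code), end
    # Assumption: any other character escapes to itself.
    return src[pos], pos + 1
```

So "\26 B" yields "&" followed by "B", because the space after the hex digits only terminates the escape.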
4.7. Convert a sci-notation representation into a value
This section describes how to turn a sci-notation representation into a
numeric value.
Let base be the result of interpreting the portion of the
representation preceding the U+0045 LATIN CAPITAL LETTER E (E) or U+0065
LATIN SMALL LETTER E (e) as a base-10 number.
Let power be the result of interpreting the portion of the
representation following the U+0045 LATIN CAPITAL LETTER E (E) or U+0065
LATIN SMALL LETTER E (e) as a base-10 number.
Return base multiplied by 10 raised to the power power.
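In Python, the conversion might look like this. It is a sketch under the assumption that the final value is base multiplied by 10 raised to power. The function name is hypothetical, and the input is assumed to be a well-formed representation produced by the tokenizer.

```python
def sci_notation_value(representation: str) -> float:
    """Convert a representation such as "1.5e3" into a numeric value."""
    for i, c in enumerate(representation):
        if c in "eE":
            break
    else:
        raise ValueError("no exponent marker in representation")
    base = float(representation[:i])     # portion before the E/e
    power = int(representation[i + 1:])  # portion after the E/e
    # Assumption: the value is base * 10**power.
    return base * 10.0 ** power
```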
Note that the point of this spec is to match reality;
changes from CSS2.1's tokenizer are nearly always because the
tokenizer specified something that doesn't match actual browser behavior,
or left something unspecified. If some detail doesn't match browsers,
please let me know as it's almost certainly unintentional.
The prefix-match, suffix-match, and substring-match tokens have been
imported from Selectors 3.
The BAD-URI token (now bad-url) is "self-contained". In other words,
once the tokenizer realizes it's in a bad-url rather than a url token, it
just seeks forward to look for the closing ), ignoring everything else.
This behavior is simpler than treating it like a FUNCTION token and
paying attention to opened blocks and such. Only WebKit exhibits this
behavior, but it doesn't appear that we've gotten any compat bugs from
it.
The comma token has been added.
The number, percentage, and dimension tokens have been changed to
include the preceding +/- sign as part of their value (rather than as a
separate DELIM token that needs to be manually handled every time the
token is mentioned in other specs). The only consequence of this is that
comments can no longer be inserted between the sign and the number.
Some flags have been added for SVG-compatible tokenizing, so that a
single state machine can be used for both "vanilla" and SVG CSS parsing.
Scientific notation is supported for numbers, per WG resolution.
5. Parsing
The input to the parsing stage is a stream or list of tokens from the
tokenization stage. The output depends on how the parser is invoked, as
defined by the entry points listed later in this section. The parser
output can consist of at-rules, qualified rules, and/or declarations.
The parser's output is constructed according to the fundamental syntax
of CSS, without regard for the validity of any specific item.
Implementations may check the validity of items as they are returned by
the various parser algorithms and treat the algorithm as returning nothing
if the item was invalid according to the implementation's own grammar
knowledge, or may construct a full tree as specified and "clean up"
afterwards by removing any invalid items.
The items that can appear in the tree are a mixture of basic tokens and
new objects:
at-rule
An at-rule has a name, a prelude consisting of a list of component
values, and an optional value consisting of a simple {} block.
This specification places no limits on what an at-rule's
value may contain. Individual at-rules must define whether they accept a
value, and if so, how to parse it (preferably using one of the parser
algorithms or entry points defined in this specification).
qualified rule
A qualified rule has a prelude consisting of a list of component
values, and a value consisting of a list of at-rules or declarations.
Most qualified rules will be style rules, where the
prelude is a selector.
declaration
A declaration has a name, a value consisting of a list of component
values, and an important flag which is initially unset.
component value
A component value is one of the preserved tokens, a function, or a
simple block.
preserved tokens
Any token produced by the tokenizer except for function tokens, {
tokens, ( tokens, and [ tokens.
The non-preserved tokens listed above are always consumed
into higher-level objects, either functions or simple blocks, and so
never appear in any parser output themselves.
function
A function has a name and a value consisting of a list of component
values.
simple block
A simple block has an associated token (either a [, (, or { token)
and a value consisting of a list of component values.
recognized at-rule name
When the parser is invoked, it must be provided with a list of recognized at-rule names,
representing the at-rules that the invoker
knows about. Each name in the list is additionally associated with
whether the at-rule is rule-filled,
declaration-filled, or a statement.
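The objects listed above map naturally onto a small data model. The Python sketch below is hypothetical: the class names, token kind strings, and the example recognized-at-rule list are illustrations, not definitions from this specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Token:
    kind: str  # hypothetical kind names, e.g. "ident", "{", "function"
    value: str = ""


@dataclass
class Function:
    name: str
    value: list[ComponentValue] = field(default_factory=list)


@dataclass
class SimpleBlock:
    associated_token: str  # "[", "(", or "{"
    value: list[ComponentValue] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.associated_token not in ("[", "(", "{"):
            raise ValueError("a simple block is associated with [, (, or {")


ComponentValue = Union[Token, Function, SimpleBlock]


@dataclass
class Declaration:
    name: str
    value: list[ComponentValue] = field(default_factory=list)
    important: bool = False  # initially unset


@dataclass
class AtRule:
    name: str
    prelude: list[ComponentValue] = field(default_factory=list)
    value: Optional[SimpleBlock] = None  # optional {} block


@dataclass
class QualifiedRule:
    prelude: list[ComponentValue] = field(default_factory=list)
    value: list[Union[AtRule, Declaration]] = field(default_factory=list)


# Tokens that are always folded into functions or simple blocks.
NON_PRESERVED_KINDS = {"function", "{", "(", "["}


def is_preserved_token(token: Token) -> bool:
    return token.kind not in NON_PRESERVED_KINDS


# Hypothetical example of a recognized at-rule name list an invoker might supply.
RECOGNIZED_AT_RULES = {
    "media": "rule-filled",
    "page": "declaration-filled",
    "import": "statement",
}
```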
5.1. Parser Railroad Diagrams
This section is non-normative.
This section presents an informative view of the parser, in the form of
railroad diagrams. Railroad diagrams are more compact than a
state-machine, but often easier to read than a regular expression.
These diagrams are informative and incomplete; they
describe the grammar of "correct" stylesheets, but do not describe
error-handling at all. They are provided solely to make it easier to get
an intuitive grasp of the syntax.
[Diagrams not reproduced: Stylesheet, At-rule, Qualified rule, Rule list, Declaration/at-rule list, Declaration, !important, ws*, Component value, {} block, () block, [] block, Function block.]
5.2. Parser Flags
No flags currently.
5.3. Definitions
current input token
The token or component value
currently being operated on, from the list of tokens produced by the
tokenizer.
EOF token
A conceptual token representing the end of the list of tokens.
Whenever the list of tokens is empty, the next input token is always an EOF
token.
reconsume the current input token
Push the current input
token back onto the list of tokens produced by the tokenizer, so
that the next time a mode instructs you to consume the next input token,
it will instead reconsume the current
input token.
ASCII case-insensitive
When two strings are to be matched ASCII case-insensitively,
temporarily convert both of them to ASCII lower-case form by adding 32
(0x20) to the value of each codepoint between U+0041 LATIN CAPITAL LETTER
A (A) and U+005A LATIN CAPITAL LETTER Z (Z), inclusive, and then compare
them on a codepoint-by-codepoint basis.
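The ASCII case-insensitive comparison above can be sketched in a few lines of Python. The function names are hypothetical.

```python
def ascii_lowercase(s: str) -> str:
    # Add 0x20 only to U+0041..U+005A (A-Z); every other codepoint is unchanged.
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)


def ascii_case_insensitive_match(a: str, b: str) -> bool:
    return ascii_lowercase(a) == ascii_lowercase(b)
```

Python's built-in str.lower() and str.casefold() also fold non-ASCII letters. This definition does not, so "É" and "é" do not match.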
5.4. Parser Entry Points
The algorithms defined in this specification can be invoked in multiple
ways to convert a stream of text into various CSS concepts.
All of the algorithms defined in this section begin in the parser. It
is assumed that the input
preprocessing and tokenization steps have
already been completed, resulting in a stream of tokens.
Other specs can define additional entry points for their own
purposes.
The following notes should probably be translated into normative text
in the relevant specs, hooking this spec's terms:
"Parse a stylesheet" is
intended to be the normal parser entry point, for parsing stylesheets.
"Parse a rule" is intended for
use by the CSSStyleSheet#insertRule method, and similar
functions which might exist, which parse text into a single rule.
"Parse a list of
declarations" is for the contents of a style
attribute, which parses text into the contents of a single style rule.
Dunno about "Parse a value" yet. I'll remove it if I don't
figure out what to do with it.
"Parse a list of values" is for the contents of
presentational attributes, which parse text into a single declaration's
value.
"Parse a comma-separated list of values" is similar, but for
comma-separated lists.
Are there any other things somewhere where some tech (that isn't
straight CSS itself) needs to parse some text into CSS?
All of the algorithms defined in this spec may be called with either a
list of tokens or of component values. Either way produces an identical
result.
If the at-keyword token's name is on the list of recognized at-rule names,
and the list indicates that it is a rule-filled or
declaration-filled at-rule, consume an at-rule. If nothing
was returned, return a syntax error.
Otherwise, consume an at-statement. If nothing was returned,
return a syntax error.
Discard whitespace tokens from the token stream until a
non-whitespace token is reached. If the token stream is exhausted without
finding a non-whitespace token, return a syntax error.
Discard whitespace tokens from the token stream until a
non-whitespace token is reached. If the token stream is exhausted without
finding a non-whitespace token, return the value found in the previous
step. Otherwise, return a syntax error.
Trim U+0020 SPACE ( ) characters from the front and back of
repr, then process it as follows:
If repr is an ASCII
case-insensitive match for the string "odd", set step
to 2 and offset to 1.
Otherwise, if repr is an ASCII case-insensitive match
for the string "even", set step to 2 and offset to
0.
Otherwise, if repr consists solely of digits, optionally prefixed with a single U+002B
PLUS SIGN (+) or U+002D HYPHEN-MINUS (-), interpret repr as a
base-10 number, set step to 0, and set offset to the result.
Otherwise, if repr contains U+004E LATIN CAPITAL LETTER N
(N), or U+006E LATIN SMALL LETTER N (n), split repr into two
substrings composed respectively of the characters preceding and
following the first such letter.
Interpret the first string as follows:
If the first string is empty, set step to 1.
Otherwise, if the first string consists solely of a single U+002B
PLUS SIGN (+) or U+002D HYPHEN-MINUS (-), set step to 1 or
-1, respectively.
Otherwise, if the first string consists solely of digits, optionally prefixed with a single
U+002B PLUS SIGN (+) or U+002D HYPHEN-MINUS (-), interpret the first
string as a base-10 number, and set step to the result.
Otherwise, this is a parse
error; return a syntax error.
Interpret the second string as follows:
If the second string is empty, set offset to 0.
Otherwise, if the second string consists solely of 0 or more U+0020
SPACE characters, optionally followed by a single U+002B PLUS SIGN (+)
or U+002D HYPHEN-MINUS (-) character, followed by 0 or more U+0020
SPACE characters, followed by 1 or more digits, interpret the digits as a base-10
number. If there was a U+002D HYPHEN-MINUS (-) character, negate the
result. Set offset to the result.
Otherwise, this is a parse
error; return a syntax error.
Otherwise, this is a parse error;
return a syntax error.
Return step and offset.
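The steps above can be sketched in Python as follows. The function name is hypothetical, and raising ValueError stands in for "return a syntax error". The sketch reads the plain-integer case as giving a step of 0 and an offset equal to the integer (as in ":nth-child(3)"), which is how An+B notation treats a bare number.

```python
from __future__ import annotations

import re
import string

_TO_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_INTEGER = re.compile(r"[+-]?[0-9]+")         # digits with an optional sign
_SECOND = re.compile(r" *([+-]?) *([0-9]+)")  # spaces, optional sign, spaces, digits


def parse_an_plus_b(repr_: str) -> tuple[int, int]:
    """Return (step, offset); raise ValueError on a syntax error."""
    text = repr_.strip(" ")
    folded = text.translate(_TO_ASCII_LOWER)  # ASCII case-insensitive
    if folded == "odd":
        return 2, 1
    if folded == "even":
        return 2, 0
    if _INTEGER.fullmatch(text):
        return 0, int(text)
    match = re.search(r"[nN]", text)
    if match is None:
        raise ValueError("syntax error")
    first, second = text[:match.start()], text[match.end():]
    if first == "":
        step = 1
    elif first in ("+", "-"):
        step = 1 if first == "+" else -1
    elif _INTEGER.fullmatch(first):
        step = int(first)
    else:
        raise ValueError("syntax error")
    if second == "":
        offset = 0
    else:
        parts = _SECOND.fullmatch(second)
        if parts is None:
            raise ValueError("syntax error")
        offset = int(parts.group(2))
        if parts.group(1) == "-":
            offset = -offset
    return step, offset
```

For example, "-n+6" gives step -1 and offset 6, and "2n - 1" gives step 2 and offset -1.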
5.5. Parser Algorithms
The following algorithms comprise the parser. They are called by the
parser entry points above.
These algorithms may be called with a list of either tokens or of
component values. (The difference being that some tokens are replaced by
functions and simple blocks in a list of component
values.) Similar to how the input stream returned EOF characters to
represent when it was empty during the tokenization stage, the lists in
this stage must return an EOF token when the next token is requested but
they are empty.
An algorithm may be invoked with a specific list, in which case it
consumes only that list (and when that list is exhausted, it begins
returning EOF tokens). Otherwise, it is implicitly invoked with the same
list as the invoking algorithm.
Create a new at-rule with its name set to the value of the current input token, its prelude
initially set to an empty list, and its value initially set to nothing.
Initialize a temporary list containing only the current input token.
Repeatedly consume a component value from the next input token until a
semicolon token or EOF token is returned, appending each value returned
before that token to the temporary list. Consume a declaration from the
temporary list. If anything was returned, append it to the list of
declarations.
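A minimal Python sketch of this step follows. It assumes a hypothetical `Token` tuple with string type names, and it simplifies "consume a component value" to taking one token at a time, so nested blocks are not handled. Exhausting the iterator plays the role of the EOF token. The "consume a declaration" algorithm is passed in as a callback so that the gathering logic stands alone.

```python
from typing import Callable, Iterator, List, NamedTuple, Optional


class Token(NamedTuple):
    type: str             # hypothetical type names, e.g. "ident", "colon", "semicolon"
    value: object = None


def gather_declaration(
    current: Token,
    rest: Iterator[Token],
    declarations: list,
    consume_declaration: Callable[[List[Token]], Optional[object]],
) -> None:
    temp = [current]              # the temporary list starts with the current input token
    for tok in rest:              # simplification: one token per component value
        if tok.type == "semicolon":
            break                 # the semicolon ends the declaration and is not kept
        temp.append(tok)
    # Running out of tokens (EOF) also ends the loop.
    result = consume_declaration(temp)
    if result is not None:        # "if anything was returned"
        declarations.append(result)
```

Because the semicolon is consumed but not stored, the caller's iterator is left positioned at the start of the next declaration.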
Create a new declaration with its name set to the value of the current input token.
Repeatedly consume whitespace tokens until a non-whitespace token is
reached. If this token is anything but a colon token, this is a parse error. Return nothing.
Otherwise, repeatedly consume
a component value from the next
input token until an EOF token is reached, appending all of the
returned values up to that point to the declaration's value.
If the last two non-whitespace tokens in the declaration's value are a
delim token with the value "!" followed by an ident token with a value
that is an ASCII
case-insensitive match for "important", remove them from the
declaration's value and set the declaration's important flag to
true.
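The declaration steps above can be sketched in Python. The sketch assumes a hypothetical `Token` tuple whose type names (`"ident"`, `"colon"`, `"whitespace"`, `"delim"`) are chosen for illustration, and it receives the temporary list with the naming ident token first. It follows the text literally, so whitespace that preceded a removed `!important` stays in the value. Only A to Z are lowercased, which matches the required ASCII case-insensitive comparison.

```python
from dataclasses import dataclass, field
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    type: str             # hypothetical type names, e.g. "ident", "colon", "delim"
    value: object = None


@dataclass
class Declaration:
    name: str
    value: List[Token] = field(default_factory=list)
    important: bool = False


def ascii_lower(s: str) -> str:
    """Lowercase only A-Z, as an ASCII case-insensitive comparison requires."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def consume_declaration(tokens: List[Token]) -> Optional[Declaration]:
    it = iter(tokens)
    current = next(it)                      # the ident token naming the declaration
    decl = Declaration(name=str(current.value))
    eof = Token("EOF")
    tok = next(it, eof)
    while tok.type == "whitespace":         # skip whitespace before the colon
        tok = next(it, eof)
    if tok.type != "colon":
        return None                         # parse error: return nothing
    decl.value = list(it)                   # every remaining value, up to EOF
    # Positions of the non-whitespace entries in the value.
    solid = [i for i, t in enumerate(decl.value) if t.type != "whitespace"]
    if len(solid) >= 2:
        bang, word = decl.value[solid[-2]], decl.value[solid[-1]]
        if (bang.type == "delim" and bang.value == "!"
                and word.type == "ident"
                and ascii_lower(str(word.value)) == "important"):
            del decl.value[solid[-1]]       # delete the later index first
            del decl.value[solid[-2]]
            decl.important = True
    return decl
```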
Note that the point of this spec is to match reality;
changes from CSS 2.1's Core Grammar are nearly always because the Core
Grammar specified something that doesn't match actual browser behavior, or
left something unspecified. If some detail doesn't match browsers, please
let me know, as it's almost certainly unintentional.
The handling of some miscellaneous "special" tokens (like an unmatched
} token) showing up in various places in the grammar has been specified
with some reasonable behavior shown by at least one browser. Previously,
stylesheets with those tokens in those places just didn't match the
stylesheet grammar at all, so their handling was totally undefined.
Specifically:
[] blocks, () blocks and functions can now contain {} blocks,
at-keywords or semicolons
Selectors can now contain semicolons
Selectors and at-rule preludes can now contain at-keywords
6. Serialization
This specification does not define how to serialize CSS in general,
leaving that task to the CSSOM and individual feature specifications.
However, there is one important facet that must be specified here
regarding comments, to ensure accurate "round-tripping" of data from text
to CSS objects and back.
The tokenizer described in this specification does not produce tokens
for comments, or otherwise preserve them in any way. Implementations may
preserve the contents of comments and their location in the token stream.
If they do, this preserved information must have no effect on the parsing
step, but must be serialized in its position as "/*" followed by its
contents followed by "*/".
If the implementation does not preserve comments, it must insert the
text "/**/" between the serialization of adjacent tokens when the two
tokens are of the following pairs:
hash or at-keyword token followed by a number, percentage, ident,
dimension, unicode-range, url, or function token;
number, ident, and dimension tokens in any combination;
number, ident, or dimension token followed by a percentage,
unicode-range, url, or function token;
ident token followed by a ( token;
a delim token containing "#" or "@" followed by any token except
whitespace;
a delim token containing "-", "+", ".", "<", ">", or "!" following
or followed by any token except whitespace;
a delim token containing "/" following or followed by a delim token
containing "*".
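The pair rules above translate directly into a predicate over adjacent tokens. The Python sketch below is hypothetical. It assumes each token carries a type name (the strings mirror the token names in the list, with `"("` for the ( token) along with its own already-serialized text. It covers only the comment-insertion step, not the serialization of individual tokens.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    type: str   # hypothetical names: "ident", "number", "delim", "whitespace", "(", ...
    text: str   # this token's already-serialized text


_AFTER_HASH_OR_AT = {"number", "percentage", "ident", "dimension",
                     "unicode-range", "url", "function"}
_NUM_IDENT_DIM = {"number", "ident", "dimension"}
_AFTER_NUM_IDENT_DIM = {"percentage", "unicode-range", "url", "function"}
_STICKY_DELIMS = {"-", "+", ".", "<", ">", "!"}


def _is_delim(tok: Token, chars: set) -> bool:
    return tok.type == "delim" and tok.text in chars


def needs_comment(a: Token, b: Token) -> bool:
    """True if "/**/" must separate token a from the following token b."""
    if a.type in ("hash", "at-keyword") and b.type in _AFTER_HASH_OR_AT:
        return True
    if a.type in _NUM_IDENT_DIM and (b.type in _NUM_IDENT_DIM
                                     or b.type in _AFTER_NUM_IDENT_DIM):
        return True
    if a.type == "ident" and b.type == "(":
        return True
    if _is_delim(a, {"#", "@"}) and b.type != "whitespace":
        return True
    # "-", "+", ".", "<", ">", "!" next to any non-whitespace token, in either order.
    if (_is_delim(a, _STICKY_DELIMS) and b.type != "whitespace") or (
            _is_delim(b, _STICKY_DELIMS) and a.type != "whitespace"):
        return True
    # "/" and "*" delims adjacent in either order.
    if (_is_delim(a, {"/"}) and _is_delim(b, {"*"})) or (
            _is_delim(a, {"*"}) and _is_delim(b, {"/"})):
        return True
    return False


def serialize(tokens: List[Token]) -> str:
    out = []
    for i, tok in enumerate(tokens):
        if i > 0 and needs_comment(tokens[i - 1], tok):
            out.append("/**/")
        out.append(tok.text)
    return "".join(out)
```

For example, two adjacent ident tokens `a` and `b` serialize as `a/**/b`, while `a`, a whitespace token, and `b` serialize as `a b`.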
The preceding pairs of tokens can only be adjacent due to
comments in the original text, so the above rule reinserts the minimum
number of comments into the serialized text to ensure an accurate
round-trip. (Roughly. The delim token rules are slightly too powerful, for
simplicity.)
7. Conformance
7.1. Document conventions
Conformance requirements are expressed with a combination of descriptive
assertions and RFC 2119 terminology. The key words “MUST”, “MUST
NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”,
“SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the
normative parts of this document are to be interpreted as described in RFC
2119. However, for readability, these words do not appear in all uppercase
letters in this specification.
All of the text of this specification is normative except sections
explicitly marked as non-normative, examples, and notes. [RFC2119]
Examples in this specification are introduced with the words “for
example” or are set apart from the normative text with
class="example", like this:
This is an example of an informative example.
Informative notes begin with the word “Note” and are set apart from
the normative text with class="note", like this:
Note, this is an informative note.
7.2. Conformance classes
Conformance to CSS Syntax Module Level 3 is defined for three
conformance classes: