csswg-drafts/css-syntax/Overview.bs at 595472230c7e665a57c4ce22b39569c6484302d6 · w3c/csswg-drafts · GitHub
Or: 1
    T: --
    Seq:
        Opt: skip
            T: -
        Or:
            N: a-z A-Z _ or non-ASCII
            N: escape
Star:
    Or:
        N: a-z A-Z 0-9 _ - or non-ASCII
        N: escape
</pre>
<dt id="function-token-diagram"><<function-token>>
<dd>
<pre class='railroad'>
N: <ident-token>
T: (
</pre>
<dt id="at-keyword-token-diagram"><<at-keyword-token>>
<dd>
<pre class='railroad'>
T: @
N: <ident-token>
</pre>
<dt id="hash-token-diagram"><<hash-token>>
<dd>
<pre class='railroad'>
T: #
Plus:
    Choice:
        N: a-z A-Z 0-9 _ - or non-ASCII
        N: escape
</pre>
<dt id="string-token-diagram"><<string-token>>
<dd>
<pre class='railroad'>
Choice:
    Seq:
        T: "
        Star:
            Choice:
                N: not " \ or newline
                N: escape
                Seq:
                    T: \
                    N: newline
        T: "
    Seq:
        T: '
        Star:
            Choice:
                N: not ' \ or newline
                N: escape
                Seq:
                    T: \
                    N: newline
        T: '
</pre>
<dt id="url-token-diagram"><<url-token>>
<dd>
<pre class='railroad'>
N: <ident-token "url">
T: (
N: ws*
Star:
    Choice:
        N: not " ' ( ) \ ws or non-printable
        N: escape
N: ws*
T: )
</pre>
<dt id="number-token-diagram"><<number-token>>
<dd>
<pre class='railroad'>
Choice: 1
    T: +
    Skip:
    T: -
Choice:
    Seq:
        Plus:
            N: digit
        T: .
        Plus:
            N: digit
    Plus:
        N: digit
    Seq:
        T: .
        Plus:
            N: digit
Opt: skip
    Seq:
        Choice:
            T: e
            T: E
        Choice: 1
            T: +
            S:
            T: -
        Plus:
            N: digit
</pre>
<dt id="dimension-token-diagram"><<dimension-token>>
<dd>
<pre class='railroad'>
N: <number-token>
N: <ident-token>
</pre>
<dt id="percentage-token-diagram"><<percentage-token>>
<dd>
<pre class='railroad'>
N: <number-token>
T: %
</pre>
<dt id="include-match-token-diagram"><<include-match-token>>
<dd>
<pre class='railroad'>
T: ~=
</pre>
<dt id="dash-match-token-diagram"><<dash-match-token>>
<dd>
<pre class='railroad'>
T: |=
</pre>
<dt id="prefix-match-token-diagram"><<prefix-match-token>>
<dd>
<pre class='railroad'>
T: ^=
</pre>
<dt id="suffix-match-token-diagram"><<suffix-match-token>>
<dd>
<pre class='railroad'>
T: $=
</pre>
<dt id="substring-match-token-diagram"><<substring-match-token>>
<dd>
<pre class='railroad'>
T: *=
</pre>
<dt id="column-token-diagram"><<column-token>>
<dd>
<pre class='railroad'>
T: ||
</pre>
<dt id="CDO-token-diagram"><<CDO-token>>
<dd>
<pre class='railroad'>
T: <!--
</pre>
<dt id="CDC-token-diagram"><<CDC-token>>
<dd>
<pre class='railroad'>
T: -->
</pre>
</dl>
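As an informal illustration only, the <<number-token>> diagram above can be approximated by a regular expression.
The sketch below is not part of the tokenizer;
the names <code>NUMBER_RE</code> and <code>match_number</code> are hypothetical,
and the tokenizer algorithms remain the normative definition.

```python
import re

# Rough transcription of the <number-token> railroad diagram:
#   optional sign,
#   then one of: digits "." digits | digits | "." digits,
#   then an optional exponent: "e" or "E", optional sign, digits.
# re.ASCII restricts \d to U+0030..U+0039, matching the "digit" definition.
NUMBER_RE = re.compile(
    r"[+-]?(?:\d+\.\d+|\d+|\.\d+)(?:[eE][+-]?\d+)?",
    re.ASCII,
)


def match_number(text, pos=0):
    """Return the number-like prefix of text starting at pos, or None."""
    m = NUMBER_RE.match(text, pos)
    return m.group(0) if m else None
```

For instance, <code>match_number("12px")</code> stops before the unit,
and a trailing <code>.</code> or <code>e</code> with no digits after it is left unconsumed,
as the diagram requires.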
<!--
████████ ████████ ██ ██ ██████
██ ██ ██ ███ ██ ██ ██
██ ██ ██ ████ ██ ██
██ ██ ██████ ██ ██ ██ ██████
██ ██ ██ ██ ████ ██
██ ██ ██ ██ ███ ██ ██
████████ ██ ██ ██ ██████
-->
<h3 id="tokenizer-definitions">
Definitions</h3>
This section defines several terms used during the tokenization phase.
<dl export>
<dt><dfn>code point</dfn>
<dd>
A <a href="http://unicode.org/glossary/#code_point">Unicode code point</a>. [[!UNICODE]]
Any value in the Unicode codespace; that is, the range of integers from 0 to (hexadecimal) 10FFFF.
<dt><dfn>next input code point</dfn>
<dd>
The first <a>code point</a> in the input stream that has not yet been consumed.
<dt><dfn>current input code point</dfn>
<dd>
The last <a>code point</a> to have been consumed.
<dt><dfn>reconsume the current input code point</dfn>
<dd>
Push the <a>current input code point</a> back onto the front of the input stream,
so that the next time you are instructed to consume the <a>next input code point</a>,
it will instead reconsume the <a>current input code point</a>.
<dt><dfn>EOF code point</dfn>
<dd>
A conceptual <a>code point</a> representing the end of the input stream.
Whenever the input stream is empty,
the <a>next input code point</a> is always an EOF code point.
<dt><dfn export>digit</dfn>
<dd>
A <a>code point</a> between U+0030 DIGIT ZERO (0) and U+0039 DIGIT NINE (9).
<dt><dfn export>hex digit</dfn>
<dd>
A <a>digit</a>,
or a <a>code point</a> between U+0041 LATIN CAPITAL LETTER A (A) and U+0046 LATIN CAPITAL LETTER F (F),
or a <a>code point</a> between U+0061 LATIN SMALL LETTER A (a) and U+0066 LATIN SMALL LETTER F (f).
<dt><dfn export>uppercase letter</dfn>
<dd>
A <a>code point</a> between U+0041 LATIN CAPITAL LETTER A (A) and U+005A LATIN CAPITAL LETTER Z (Z).
<dt><dfn export>lowercase letter</dfn>
<dd>
A <a>code point</a> between U+0061 LATIN SMALL LETTER A (a) and U+007A LATIN SMALL LETTER Z (z).
<dt><dfn export>letter</dfn>
<dd>
An <a>uppercase letter</a>
or a <a>lowercase letter</a>.
<dt><dfn export>non-ASCII code point</dfn>
<dd>
A <a>code point</a> with a value equal to or greater than U+0080 &lt;control>.
<dt><dfn export>name-start code point</dfn>
<dd>
A <a>letter</a>,
a <a>non-ASCII code point</a>,
or U+005F LOW LINE (_).
<dt><dfn export>name code point</dfn>
<dd>
A <a>name-start code point</a>,
a <a>digit</a>,
or U+002D HYPHEN-MINUS (-).
<dt><dfn export>non-printable code point</dfn>
<dd>
A <a>code point</a> between U+0000 NULL and U+0008 BACKSPACE,
or U+000B LINE TABULATION,
or a <a>code point</a> between U+000E SHIFT OUT and U+001F INFORMATION SEPARATOR ONE,
or U+007F DELETE.
<dt><dfn export>newline</dfn>
<dd>
U+000A LINE FEED.
<span class='note'>
Note that U+000D CARRIAGE RETURN and U+000C FORM FEED are not included in this definition,
as they are converted to U+000A LINE FEED during <a href="#input-preprocessing">preprocessing</a>.
</span>
<dt><dfn export>whitespace</dfn>
<dd>A <a>newline</a>, U+0009 CHARACTER TABULATION, or U+0020 SPACE.
<dt><dfn export>surrogate code point</dfn>
<dd>
A <a>code point</a> between U+D800 and U+DFFF inclusive.
<dt><dfn export>maximum allowed code point</dfn>
<dd>The greatest <a>code point</a> defined by Unicode: U+10FFFF.
<dt><dfn export>identifier</dfn>
<dd>
A portion of the CSS source that has the same syntax as an <<ident-token>>.
Also appears in <<at-keyword-token>>,
<<function-token>>,
<<hash-token>> with the "id" type flag,
and the unit of <<dimension-token>>.
</dl>
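The code point classes defined above translate directly into range checks.
The following is a minimal sketch, assuming code points are represented as plain integers;
the function names are hypothetical and chosen only to mirror the defined terms.

```python
# Each predicate takes a code point as an integer (0 .. 0x10FFFF).
MAX_CODE_POINT = 0x10FFFF  # maximum allowed code point


def is_digit(cp):
    return 0x30 <= cp <= 0x39  # 0-9


def is_hex_digit(cp):
    return is_digit(cp) or 0x41 <= cp <= 0x46 or 0x61 <= cp <= 0x66


def is_letter(cp):
    # uppercase letter (A-Z) or lowercase letter (a-z)
    return 0x41 <= cp <= 0x5A or 0x61 <= cp <= 0x7A


def is_non_ascii(cp):
    return cp >= 0x80


def is_name_start(cp):
    return is_letter(cp) or is_non_ascii(cp) or cp == 0x5F  # "_"


def is_name(cp):
    return is_name_start(cp) or is_digit(cp) or cp == 0x2D  # "-"


def is_non_printable(cp):
    return 0x00 <= cp <= 0x08 or cp == 0x0B or 0x0E <= cp <= 0x1F or cp == 0x7F


def is_newline(cp):
    # Only LINE FEED; CR and FF are assumed to be gone after preprocessing.
    return cp == 0x0A


def is_whitespace(cp):
    return is_newline(cp) or cp == 0x09 or cp == 0x20


def is_surrogate(cp):
    return 0xD800 <= cp <= 0xDFFF
```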
<!--
████████ ███████ ██ ██ ████████ ██ ██ ████ ████████ ████████ ████████
██ ██ ██ ██ ██ ██ ███ ██ ██ ██ ██ ██ ██
██ ██ ██ ██ ██ ██ ████ ██ ██ ██ ██ ██ ██
██ ██ ██ █████ ██████ ██ ██ ██ ██ ██ ██████ ████████
██ ██ ██ ██ ██ ██ ██ ████ ██ ██ ██ ██ ██
██ ██ ██ ██ ██ ██ ██ ███ ██ ██ ██ ██ ██
██ ███████ ██ ██ ████████ ██ ██ ████ ████████ ████████ ██ ██
-->
<h3 id="tokenizer-algorithms">
Tokenizer Algorithms</h3>
The algorithms defined in this section transform a stream of <a>code points</a> into a stream of tokens.
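The algorithms are phrased in terms of the next input code point, the current input code point, and reconsuming.
One possible way to model these terms is sketched below;
the <code>InputStream</code> class, its method names, and the <code>EOF</code> sentinel value are assumptions made for illustration,
and the input is assumed to have been preprocessed already.

```python
EOF = -1  # hypothetical sentinel standing in for the EOF code point


class InputStream:
    """Illustrative model of the tokenizer's input stream."""

    def __init__(self, text):
        self._cps = [ord(ch) for ch in text]
        self._pos = 0  # index of the next input code point

    def peek(self, offset=0):
        """Look at the next input code point (or one further ahead)."""
        i = self._pos + offset
        return self._cps[i] if i < len(self._cps) else EOF

    def consume(self):
        """Consume and return the next input code point."""
        cp = self.peek()
        self._pos += 1
        return cp

    @property
    def current(self):
        """The last code point to have been consumed."""
        i = self._pos - 1
        return self._cps[i] if 0 <= i < len(self._cps) else EOF

    def reconsume(self):
        """Push the current input code point back onto the stream."""
        if self._pos == 0:
            raise ValueError("nothing has been consumed yet")
        self._pos -= 1
```

With this model, an empty or exhausted stream always reports <code>EOF</code> as its next input code point,
and a <code>consume()</code> call that follows <code>reconsume()</code> yields the same code point again.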
<h4 id="consume-token">
Consume a token</h4>
This section describes how to <dfn>consume a token</dfn> from a stream of <a>code points</a>.
It will return a single token of any type.
<a>Consume comments</a>.
Consume the <a>next input code point</a>.
<dl>
<dt><a>whitespace</a>
<dd>
Consume as much <a>whitespace</a> as possible.
Return a <<whitespace-token>>.
<dt>U+0022 QUOTATION MARK (")
<dd>
<a>Consume a string token</a>
and return it.
<dt>U+0023 NUMBER SIGN (#)
<dd>
If the <a>next input code point</a> is a <a>name code point</a>
or the <a lt="next input code point">next two input code points</a>
<a>are a valid escape</a>,
then:
<ol>
<li>
Create a <<hash-token>>.
<li>
If the <a lt="next input code point">next 3 input code points</a> <a>would start an identifier</a>,
set the <<hash-token>>’s type flag to "id".
<li>
<a>Consume a name</a>,
and set the <<hash-token>>’s value to the returned string.
<li>
Return the <<hash-token>>.
</ol>
Otherwise,
return a <<delim-token>>
with its value set to the <a>current input code point</a>.
<dt>U+0024 DOLLAR SIGN ($)
<dd>
If the <a>next input code point</a> is
U+003D EQUALS SIGN (=),
consume it
and return a <<suffix-match-token>>.
Otherwise,
return a <<delim-token>>
with its value set to the <a>current input code point</a>.
<dt>U+0027 APOSTROPHE (&apos;)
<dd>
<a>Consume a string token</a>
and return it.
<dt>U+0028 LEFT PARENTHESIS (()
<dd>
Return a <a href="#tokendef-open-paren">&lt;(-token></a>.
<dt>U+0029 RIGHT PARENTHESIS ())
<dd>
Return a <a href="#tokendef-close-paren">&lt;)-token></a>.
<dt>U+002A ASTERISK (*)
<dd>
If the <a>next input code point</a> is
U+003D EQUALS SIGN (=),
consume it
and return a <<substring-match-token>>.
Otherwise,
return a <<delim-token>>
with its value set to the <a>current input code point</a>.
<dt>U+002B PLUS SIGN (+)
<dd>
If the input stream <a>starts with a number</a>,
<a>reconsume the current input code point</a>,
<a>consume a numeric token</a>,
and return it.
Otherwise,
return a <<delim-token>>
with its value set to the <a>current input code point</a>.
<dt>U+002C COMMA (,)
<dd>
Return a <<comma-token>>.
<dt>U+002D HYPHEN-MINUS (-)
<dd>
If the input stream <a>starts with a number</a>,
<a>reconsume the current input code point</a>,
<a>consume a numeric token</a>,
and return it.
Otherwise,
if the <a lt="next input code point">next 2 input code points</a> are
U+002D HYPHEN-MINUS
U+003E GREATER-THAN SIGN
(->),
consume them
and return a <<CDC-token>>.
Otherwise,
if the input stream <a>starts with an identifier</a>,
<a>reconsume the current input code point</a>,
<a>consume an ident-like token</a>,
and return it.
Otherwise,
return a <<delim-token>>
with its value set to the <a>current input code point</a>.
<dt>U+002E FULL STOP (.)
<dd>