Commit 831c190

committed
[css-syntax] Rewrite the <urange> production to be actually correct.
1 parent 0975267 commit 831c190

2 files changed

Lines changed: 185 additions & 169 deletions

css-syntax/Overview.bs

Lines changed: 91 additions & 83 deletions
@@ -2862,82 +2862,96 @@ The <<urange>> type</h3>
28622862
in terms of existing CSS tokens,
28632863
and how to interpret it as a range of unicode codepoints.
28642864

2865+
Note: The syntax described here is intentionally very low-level,
2866+
and geared toward implementors.
2867+
Authors should instead read the informal syntax description in the previous section,
2868+
as it contains all information necessary to use <<urange>>,
2869+
and is actually readable.
2870+
28652871
The <<urange>> type is defined
28662872
(using the <a href="http://www.w3.org/TR/css3-values/#value-defs">Value Definition Syntax in the Values & Units spec</a>) as:
28672873

28682874
<pre class="prod">
28692875
<<urange>> =
2870-
u '+' <<urange-codepoint>> |
2871-
u '+' <<urange-codepoint>>? '?'+ |
2872-
u '+' <<urange-codepoint>> <<negative-urange-codepoint>> |
2873-
u '+' <<urange-range>>
2876+
u '+' <<ident-token>> '?'* |
2877+
u <<dimension-token>> '?'* |
2878+
u <<number-token>> '?'* |
2879+
u <<number-token>> <<dimension-token>> |
2880+
u <<number-token>> <<number-token>> |
2881+
u '+' '?'+
28742882
</pre>
28752883

2876-
where:
2877-
2878-
: <dfn><<urange-codepoint>></dfn>
2879-
:: One of the following:
2880-
* A <<number-token>> whose type flag is set to "integer" and whose representation is composed solely of <a>digits</a> (no leading "+" or "-" characters).
2881-
* An <<ident-token>> whose representation is solely <a>hex digits</a>.
2882-
* A <<dimension-token>> whose type flag is set to "integer" and whose unit is composed solely of <a>hex digits</a>
2883-
2884-
: <dfn><<negative-urange-codepoint>></dfn>
2885-
:: One of the following:
2886-
* A <<number-token>> whose type flag is set to "integer", and whose value is negative.
2887-
(If the implementation does not support <a href="https://en.wikipedia.org/wiki/Signed_zero">negative zero</a>,
2888-
then when the value is zero,
2889-
the implementation must additionally verify that the representation begins with U+002D HYPHEN-MINUS (-).)
2890-
* An <<ident-token>> whose representation is U+002D HYPHEN-MINUS, followed solely <a>hex digits</a>.
2891-
* A <<dimension-token>> whose type flag is set to "integer",
2892-
whose value is negative,
2893-
and whose unit is composed solely of <a>hex digits</a>.
2894-
(Same caveat as for <<number-token>> applies here.)
2895-
2896-
: <dfn><<urange-range>></dfn>
2897-
:: One of the following:
2898-
* An <<ident-token>> whose representation consists solely of
2899-
one or more <a>hex digits</a>,
2900-
followed by U+002D HYPHEN-MINUS (-),
2901-
followed by one or more <a>hex digits</a>.
2902-
* A <<dimension-token>> whose type flag is set to "integer",
2903-
and whose unit consists solely of
2904-
one or more <a>hex digits</a>,
2905-
followed by U+002D HYPHEN-MINUS (-),
2906-
followed by one or more <a>hex digits</a>.
2907-
29082884
In this production,
29092885
no whitespace can occur between any of the tokens.
29102886

29112887
The <<urange>> production represents a range of one or more contiguous unicode code points
29122888
as a <var>start value</var> and an <var>end value</var>,
29132889
which are non-negative integers.
2914-
Each clause of the above grammar is interpreted as follows:
2915-
2916-
: <code>u '+' <<urange-codepoint>></code>
2917-
::
2918-
<a title="interpret a token as a hex integer">Interpret the &lt;urange-codepoint> as a hex integer</a>.
2919-
Let <var>start value</var> and <var>end value</var> both be the returned value.
2920-
: <code>u '+' <<urange-codepoint>>? '?'+</code>
2921-
::
2922-
If the <<urange-codepoint>> is specified,
2923-
let <var>B</var> be the result of <a title="interpret a token as a hex integer">interpreting the &lt;urange-codepoint> as a hex integer</a>;
2924-
otherwise, let <var>B</var> be zero.
2925-
Let <var>S</var> be the number of contiguous U+003F QUESTION MARK (?) codepoints.
2926-
2927-
Let <var>start value</var> be <code style="white-space: nowrap">B✕(16<sup>S</sup>)</code>.
2928-
Let <var>end value</var> be <code style="white-space: nowrap">B✕(16<sup>S</sup>) + 16<sup>S+1</sup> − 1</code>.
2929-
: <code>u '+' <<urange-codepoint>> <<negative-urange-codepoint>></code>
2930-
::
2931-
Let <var>start value</var> be the result of <a title="interpret a token as a hex integer">interpreting the &lt;urange-codepoint> as a hex integer</a>.
2932-
Let <var>end value</var> be the negation of the result of <a title="interpret a token as a hex integer">interpreting the &lt;negative-urange-codepoint> as a hex integer</a>.
2933-
: <code>u '+' <<urange-range>></code>
2934-
::
2935-
Split the <<urange-range>>’s representation (or representation + unit)
2936-
into the two halves on either side of the U+002D HYPHEN-MINUS (-) codepoint.
2937-
Interpret each half as a hexadecimal integer.
2938-
2939-
Let <var>start value</var> be the first half's value.
2940-
Let <var>end value</var> be the second half's value.
2890+
To interpret the production above into a range,
2891+
execute the following steps in order:
2892+
2893+
1. Skipping the first ''u'' token,
2894+
concatenate the representations of all the tokens in the production together
2895+
(or, in the case of <<dimension-token>>s,
2896+
the representation followed by the unit).
2897+
Let this be <var>text</var>.
2898+
2899+
2. If the first character of <var>text</var> is U+002B PLUS SIGN,
2900+
consume it.
2901+
Otherwise,
2902+
this is an invalid <<urange>>,
2903+
and this algorithm must exit.
2904+
2905+
3. Consume as many <a>hex digits</a> from <var>text</var> as possible,
2906+
then consume as many U+003F QUESTION MARK (?) <a>code points</a> as possible.
2907+
If zero <a>code points</a> were consumed,
2908+
or more than six <a>code points</a> were consumed,
2909+
this is an invalid <<urange>>,
2910+
and this algorithm must exit.
2911+
2912+
If any U+003F QUESTION MARK (?) <a>code points</a> were consumed, then:
2913+
2914+
1. If there are any <a>code points</a> left in <var>text</var>,
2915+
this is an invalid <<urange>>,
2916+
and this algorithm must exit.
2917+
2918+
2. Interpret the consumed <a>code points</a> as a hexadecimal number,
2919+
with the U+003F QUESTION MARK (?) <a>code points</a>
2920+
replaced by U+0030 DIGIT ZERO (0) <a>code points</a>.
2921+
This is the <var>start value</var>.
2922+
2923+
3. Interpret the consumed <a>code points</a> as a hexadecimal number again,
2924+
with the U+003F QUESTION MARK (?) <a>code points</a>
2925+
replaced by U+0046 LATIN CAPITAL LETTER F (F) <a>code points</a>.
2926+
This is the <var>end value</var>.
2927+
2928+
4. Exit this algorithm.
2929+
2930+
Otherwise, interpret the consumed <a>code points</a> as a hexadecimal number.
2931+
This is the <var>start value</var>.
2932+
2933+
4. If there are no <a>code points</a> left in <var>text</var>,
2934+
the <var>end value</var> is the same as the <var>start value</var>.
2935+
Exit this algorithm.
2936+
2937+
5. If the next <a>code point</a> in <var>text</var> is U+002D HYPHEN-MINUS (-),
2938+
consume it.
2939+
Otherwise,
2940+
this is an invalid <<urange>>,
2941+
and this algorithm must exit.
2942+
2943+
6. Consume as many <a>hex digits</a> as possible from <var>text</var>.
2944+
2945+
If zero <a>hex digits</a> were consumed,
2946+
or more than 6 <a>hex digits</a> were consumed,
2947+
this is an invalid <<urange>>,
2948+
and this algorithm must exit.
2949+
If there are any <a>code points</a> left in <var>text</var>,
2950+
this is an invalid <<urange>>,
2951+
and this algorithm must exit.
2952+
2953+
7. Interpret the consumed <a>code points</a> as a hexadecimal number.
2954+
This is the <var>end value</var>.
29412955

29422956
To determine what codepoints the <<urange>> represents:
29432957

@@ -2949,26 +2963,20 @@ The <<urange>> type</h3>
29492963

29502964
3. Otherwise, the <<urange>> represents a contiguous range of codepoints from <var>start value</var> to <var>end value</var>, inclusive.
29512965

2952-
To <dfn>interpret a token as a hex integer</dfn>:
2953-
2954-
* If the token is a <<number-token>> whose type flag is set to "integer",
2955-
interpret its representation as a hexadecimal number
2956-
and return the resulting value.
2957-
2958-
* If the token is an <<ident-token>> consisting solely of <a>hex digits</a>,
2959-
interpret its representation as a hexadecimal number
2960-
and return the resulting value.
2961-
2962-
* If the token is a <<dimension-token>> whose type flag is set to "integer"
2963-
and whose unit consists solely of <a>hex digits</a>,
2964-
concatenate its representation and its unit,
2965-
then interpret that as a hexadecimal number
2966-
and return the resulting value.
2967-
2968-
* Otherwise, the token can't be interpreted as a hex integer.
2969-
It is a specification error to reach this clause.
2970-
2971-
2966+
Note: The syntax of <<urange>> is intentionally fairly wide;
2967+
its patterns capture every possible token sequence
2968+
that the informal syntax can generate.
2969+
However, it requires no whitespace between its constituent tokens,
2970+
which renders it fairly safe to use in practice.
2971+
Even grammars which have a <<urange>> followed by a <<number>> or <<dimension>>
2972+
(which might appear to be ambiguous
2973+
if an author specifies the <<urange>> with the ''u <<number>>'' clause)
2974+
are actually quite safe,
2975+
as an author would have to intentionally separate the <<urange>> and the <<number>>/<<dimension>>
2976+
with a comment rather than whitespace
2977+
for it to be ambiguous.
2978+
Thus, while it's <em>possible</em> for authors to write things that are parsed in confusing ways,
2979+
the actual code they'd have to write to cause the confusion is, itself, confusing and rare.
29722980

29732981

29742982
<!--
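
The interpretation steps added in this commit (steps 2 through 7 of "To interpret the production above into a range") can be sketched in Python as follows. This is only an illustrative reading of the diff, not code from the spec or from any browser. The function name `parse_urange_text` and the `None`-for-invalid convention are assumptions made for the example, and the input is assumed to be the already-concatenated <var>text</var> from step 1 (everything after the initial `u`). The result is the raw start/end pair; the later "determine what codepoints the urange represents" checks are not applied here.

```python
import string

# Hex digits as accepted by the sketch: 0-9, a-f, A-F.
HEX_DIGITS = set(string.hexdigits)
MAX_CODE_POINTS = 6  # limit used in steps 3 and 6


def parse_urange_text(text):
    """Return (start_value, end_value), or None for an invalid <urange>.

    `text` is the concatenated token text after the leading `u`
    (hypothetical helper; step 1 is assumed to have been done already).
    """
    # Step 2: the text must begin with U+002B PLUS SIGN.
    if not text.startswith("+"):
        return None
    pos = 1

    # Step 3: consume hex digits, then question marks.
    begin = pos
    while pos < len(text) and text[pos] in HEX_DIGITS:
        pos += 1
    first_question_mark = pos
    while pos < len(text) and text[pos] == "?":
        pos += 1
    consumed = text[begin:pos]
    if not 1 <= len(consumed) <= MAX_CODE_POINTS:
        return None

    if pos > first_question_mark:
        # Wildcard form: nothing may follow the question marks.
        if pos != len(text):
            return None
        start = int(consumed.replace("?", "0"), 16)
        end = int(consumed.replace("?", "F"), 16)
        return start, end

    start = int(consumed, 16)

    # Step 4: a single code point.
    if pos == len(text):
        return start, start

    # Step 5: an explicit range needs U+002D HYPHEN-MINUS next.
    if text[pos] != "-":
        return None
    pos += 1

    # Step 6: consume the end of the range; nothing may follow it.
    begin = pos
    while pos < len(text) and text[pos] in HEX_DIGITS:
        pos += 1
    digits = text[begin:pos]
    if not 1 <= len(digits) <= MAX_CODE_POINTS:
        return None
    if pos != len(text):
        return None

    # Step 7: the consumed digits are the end value.
    return start, int(digits, 16)
```

Under these assumptions, `parse_urange_text("+4??")` yields `(0x400, 0x4FF)` and `parse_urange_text("+0-7F")` yields `(0x0, 0x7F)`, while inputs such as `"+1?2"` or `"+0-"` are rejected. Working on the concatenated text rather than on individual tokens is what lets one routine cover every clause of the new grammar.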
