Commit 7137bc4

Fix my handling of unicode-range to clip ranges, and to handle ? characters.
--HG-- extra : rebase_source : 8c3d5aa16ca55aa65112375a705ca806eed72dee
1 parent 00a94e5 commit 7137bc4

1 file changed

Lines changed: 63 additions & 10 deletions

css3-syntax/parsing.html

@@ -1360,14 +1360,29 @@ <h4>

 <p>
 Create a new unicode-range token
-with both its start value and end value
-initially set to the empty string.
+with an empty range.

 <p>
 Consume as many <i>hex digits</i> as possible, but no more than 6.
-Interpret the digits as a hexadecimal number,
-and set the unicode-range token's start value
-to that number.
+If fewer than 6 <i>hex digits</i> were consumed,
+consume as many U+003F QUESTION MARK (?) characters as possible,
+but no more than enough to make the total of <i>hex digits</i> and U+003F QUESTION MARK (?) characters equal to 6.
+
+<p>
+If any U+003F QUESTION MARK (?) characters were consumed,
+first interpret the consumed characters as a hexadecimal number,
+with the U+003F QUESTION MARK (?) characters replaced by U+0030 DIGIT ZERO (0) characters.
+This is the <i>start of the range</i>.
+Then interpret the consumed characters as a hexadecimal number again,
+with the U+003F QUESTION MARK (?) characters replaced by U+0046 LATIN CAPITAL LETTER F (F) characters.
+This is the <i>end of the range</i>.
+<i>Set the unicode-range token's range</i>, then emit it.
+Switch to the <i>data state</i>.
+
+<p>
+Otherwise,
+interpret the digits as a hexadecimal number.
+This is the <i>start of the range</i>.

 <p>
 Consume the <i>next input character</i>.
@@ -1377,21 +1392,21 @@ <h4>
 <dd>
 If the <i>next input character</i> is a <i>hex digit</i>,
 consume as many <i>hex digits</i> as possible, but no more than 6.
-Interpret the digits as a hexadecimal number,
-and set the unicode-range token's end value to that number.
-Emit the unicode-range token.
+Interpret the digits as a hexadecimal number.
+This is the <i>end of the range</i>.
+<i>Set the unicode-range token's range</i>, then emit it.
 Switch to the <i>data state</i>.

 <p>
 Otherwise,
-set the unicode-range token's end value to its start value
+<i>set the unicode-range token's range</i>
 and emit it.
 Switch to the <i>data state</i>.
 Reconsume the <i>current input character</i>.

 <dt>anything else
 <dd>
-Set the unicode-range token's end value to its start value
+<i>Set the unicode-range token's range</i>
 and emit it.
 Switch to the <i>data state</i>.
 Reconsume the <i>current input character</i>.
@@ -1425,3 +1440,41 @@ <h4>
 <dd>
 Return the <i>current input character</i>.
 </dl>
+
+<h4>
+<dfn>Set the unicode-range token's range</dfn></h4>
+
+<p>
+This section describes how to set a unicode-range token's range
+so that the range it describes
+is within the supported range of Unicode characters.
+
+<p>
+It assumes that the <dfn>start of the range</dfn> has been defined,
+the <dfn>end of the range</dfn> might be defined,
+and both are non-negative integers.
+
+<p>
+If the <i>start of the range</i> is greater than
+the current maximum allowed codepoint in Unicode (currently U+10FFFF),
+the unicode-range token's range is empty.
+
+<p>
+If the <i>end of the range</i> is defined,
+and it is less than the <i>start of the range</i>,
+the unicode-range token's range is empty.
+
+<p>
+If the <i>end of the range</i> is not defined,
+the unicode-range token's range
+is the single character whose codepoint is the <i>start of the range</i>.
+
+<p>
+Otherwise,
+if the <i>end of the range</i> is greater than
+the current maximum allowed codepoint in Unicode,
+change it to the current maximum allowed codepoint.
+The unicode-range token's range
+is all characters between
+the character whose codepoint is the <i>start of the range</i>
+and the character whose codepoint is the <i>end of the range</i>.
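
The two pieces of behaviour this commit adds (expanding ? wildcards into a start and end value, then clipping the range to the Unicode codespace) can be illustrated with a minimal Python sketch. It is not the spec's or any browser's implementation. The names `MAX_CODEPOINT`, `expand_wildcards`, and `set_range` are hypothetical, an empty range is modelled as `None`, and the range is treated as inclusive at both ends, all of which are assumptions of this sketch. It also assumes the tokenizer has already consumed at most 6 characters, hex digits followed by ? characters.

```python
# Hypothetical helper names; the spec text only describes the steps in prose.
MAX_CODEPOINT = 0x10FFFF  # current maximum allowed codepoint in Unicode


def expand_wildcards(consumed):
    """Turn consumed hex digits and '?' characters into (start, end).

    Assumes `consumed` is 1 to 6 characters: hex digits, then '?' characters.
    """
    start = int(consumed.replace("?", "0"), 16)  # '?' read as 0
    end = int(consumed.replace("?", "F"), 16)    # '?' read as F
    return start, end


def set_range(start, end=None):
    """Return an inclusive (first, last) pair, or None for an empty range."""
    if start > MAX_CODEPOINT:
        return None                      # start lies outside Unicode
    if end is not None and end < start:
        return None                      # reversed range is empty
    if end is None:
        return (start, start)            # single character
    return (start, min(end, MAX_CODEPOINT))  # clip the end to U+10FFFF


if __name__ == "__main__":
    # "1?????" spans 0x100000..0x1FFFFF before clipping.
    print(set_range(*expand_wildcards("1?????")))
```

Returning `None` for the empty range keeps the sketch short; a real tokenizer would more likely store an explicit empty-range flag on the token.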
