Fix my handling of unicode-range to clip ranges, and to handle ? characters.

tabatkins · tabatkins · commit 7137bc443e99 · 2012-04-12T08:50:42.000-07:00
--HG--
extra : rebase_source : 8c3d5aa16ca55aa65112375a705ca806eed72dee
diff --git a/css3-syntax/parsing.html b/css3-syntax/parsing.html
@@ -1360,14 +1360,29 @@ <h4>
 
 	<p>
 		Create a new unicode-range token
-		with both its start value and end value
-		initially set to the empty string.
+		with an empty range.
 
 	<p>
 		Consume as many <i>hex digits</i> as possible, but no more than 6.
-		Interpret the digits as a hexadecimal number,
-		and set the unicode-range token's start value
-		to that number.
+		If fewer than 6 <i>hex digits</i> were consumed,
+		consume as many U+003F QUESTION MARK (?) characters as possible,
+		but no more than enough to make the total of <i>hex digits</i> and U+003F QUESTION MARK (?) characters equal to 6.
+
+		<p>
+			If any U+003F QUESTION MARK (?) characters were consumed,
+			first interpret the consumed characters as a hexadecimal number,
+			with the U+003F QUESTION MARK (?) characters replaced by U+0030 DIGIT ZERO (0) characters.
+			This is the <i>start of the range</i>.
+			Then interpret the consumed characters as a hexadecimal number again,
+			with the U+003F QUESTION MARK (?) characters replaced by U+0046 LATIN CAPITAL LETTER F (F) characters.
+			This is the <i>end of the range</i>.
+			<i>Set the unicode-range token's range</i>, then emit it.
+			Switch to the <i>data state</i>.
+
+		<p>
+			Otherwise,
+			interpret the digits as a hexadecimal number.
+			This is the <i>start of the range</i>.
 
 	<p>
 		Consume the <i>next input character</i>.
@@ -1377,21 +1392,21 @@ <h4>
 		<dd>
 			If the <i>next input character</i> is a <i>hex digit</i>,
 			consume as many <i>hex digits</i> as possible, but no more than 6.
-			Interpret the digits as a hexadecimal number,
-			and set the unicode-range token's end value to that number.
-			Emit the unicode-range token.
+			Interpret the digits as a hexadecimal number.
+			This is the <i>end of the range</i>.
+			<i>Set the unicode-range token's range</i>, then emit it.
 			Switch to the <i>data state</i>.
 
 			<p>
 				Otherwise,
-				set the unicode-range token's end value to its start value
+				<i>set the unicode-range token's range</i>
 				and emit it.
 				Switch to the <i>data state</i>.
 				Reconsume the <i>current input character</i>.
 
 		<dt>anything else
 		<dd>
-			Set the unicode-range token's end value to its start value
+			<i>Set the unicode-range token's range</i>
 			and emit it.
 			Switch to the <i>data state</i>.
 			Reconsume the <i>current input character</i>.
@@ -1425,3 +1440,41 @@ <h4>
 		<dd>
 			Return the <i>current input character</i>.
 	</dl>
+
+<h4>
+<dfn>Set the unicode-range token's range</dfn></h4>
+
+	<p>
+		This section describes how to set a unicode-range token's range
+		so that the range it describes
+		is within the supported range of Unicode characters.
+
+	<p>
+		It assumes that the <dfn>start of the range</dfn> has been defined,
+		the <dfn>end of the range</dfn> might be defined,
+		and both are non-negative integers.
+
+	<p>
+		If the <i>start of the range</i> is greater than
+		the maximum allowed codepoint in Unicode (currently U+10FFFF),
+		the unicode-range token's range is empty.
+
+	<p>
+		If the <i>end of the range</i> is defined,
+		and it is less than the <i>start of the range</i>,
+		the unicode-range token's range is empty.
+
+	<p>
+		If the <i>end of the range</i> is not defined,
+		the unicode-range token's range
+		is the single character whose codepoint is the <i>start of the range</i>.
+
+	<p>
+		Otherwise,
+		if the <i>end of the range</i> is greater than
+		the current maximum allowed codepoint in Unicode,
+		change it to the current maximum allowed codepoint.
+		The unicode-range token's range
+		is all characters between
+		the character whose codepoint is the <i>start of the range</i>
+		and the character whose codepoint is the <i>end of the range</i>.
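
The clipping rules and the question-mark handling added by this change can be sketched in Python as below. This is an illustrative reading of the prose, not part of the patch. The names `MAX_CODEPOINT`, `set_unicode_range`, and `expand_wildcards` are hypothetical. The sketch assumes that each "the range is empty" paragraph ends the algorithm early, that "between" is inclusive at both ends, and that an empty range can be represented as `None`.

```python
MAX_CODEPOINT = 0x10FFFF  # current maximum allowed codepoint in Unicode


def set_unicode_range(start, end=None):
    """Return an inclusive (first, last) pair, or None for an empty range.

    `end=None` models "the end of the range is not defined".
    """
    if start > MAX_CODEPOINT:
        return None  # start lies beyond Unicode: empty range
    if end is not None and end < start:
        return None  # reversed range: empty range
    if end is None:
        return (start, start)  # single character
    # Clip the end to the maximum allowed codepoint.
    return (start, min(end, MAX_CODEPOINT))


def expand_wildcards(text):
    """Turn consumed characters such as '4??' into (start, end).

    Assumes `text` holds at most 6 characters: hex digits followed by
    at least one '?', as the tokenizer step would have consumed them.
    """
    start = int(text.replace("?", "0"), 16)  # '?' read as 0
    end = int(text.replace("?", "F"), 16)    # '?' read as F
    return start, end
```

Under these assumptions, `set_unicode_range(*expand_wildcards("1?????"))` gives a range clipped to end at U+10FFFF, while `set_unicode_range(*expand_wildcards("11????"))` gives an empty range because its start is already past the maximum.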