Commit f1792dd

[css-syntax-3] Per WG resolution, restrict non-ASCII ident code points to the same list that HTML allows in custom element names. #7129
1 parent e680c3e commit f1792dd

1 file changed

+62
-10
lines changed

css-syntax-3/Overview.bs

Lines changed: 62 additions & 10 deletions
@@ -786,14 +786,69 @@ Definitions</h3>
786786
An <a>uppercase letter</a>
787787
or a <a>lowercase letter</a>.
788788

789-
<dt><dfn export>non-ASCII code point</dfn>
790-
<dd>
791-
A <a>code point</a> with a value equal to or greater than U+0080 &lt;control>.
789+
<dt><dfn export>non-ASCII ident code point</dfn>
790+
<dd>
791+
A <a>code point</a> whose value is any of:
792+
793+
* U+00B7
794+
* between U+00C0 and U+00D6
795+
* between U+00D8 and U+00F6
796+
* between U+00F8 and U+037D
797+
* between U+037F and U+1FFF
798+
* U+200C
799+
* U+200D
800+
* U+203F
801+
* U+2040
802+
* between U+2070 and U+218F
803+
* between U+2C00 and U+2FEF
804+
* between U+3001 and U+D7FF
805+
* between U+F900 and U+FDCF
806+
* between U+FDF0 and U+FFFD
807+
* greater than or equal to U+10000
808+
809+
<details class=note>
810+
<summary>Why these characters, specifically?</summary>
811+
812+
This matches the list of non-ASCII code points
813+
allowed to be used in HTML [=valid custom element names=].
814+
It excludes a number of characters that appear as whitespace,
815+
or that can cause rendering or parsing issues in some tools,
816+
such as the direction override code points.
817+
818+
Note that this is a weaker set of restrictions
819+
than <a href="https://unicode.org/reports/tr31/#Figure_Code_Point_Categories_for_Identifier_Parsing">UAX 31</a>
820+
recommends for identifiers
821+
(used by languages such as JavaScript to restrict their identifier syntax),
822+
allowing things such as
823+
starting an identifier with a combining character.
824+
Consistency with HTML custom element names
825+
(and thus, the ability to write selectors for all custom elements
826+
without having to use escapes)
827+
was considered valuable,
828+
and the set of characters restricted by HTML
829+
covers the "high value" restrictions well.
830+
831+
These restrictions do not avoid all possible confusing renderings;
832+
mixing characters from LTR and RTL scripts
833+
can still result in unexpected visual transposition
834+
in most text editors,
835+
for example.
836+
Source text can contain the restricted characters in non-ident contexts, as well:
837+
most of them are completely valid in strings, for example.
838+
Even when used in a way that creates invalid CSS,
839+
the parsing errors they cause might be limited to something unimportant,
840+
while their effect on rendering the source text in code review tools
841+
might be significant and/or malicious.
842+
For more details on these sorts of "source text attacks",
843+
see <a href="https://blog.rust-lang.org/2021/11/01/cve-2021-42574.html">this Rust-lang blog post</a>
844+
<small><a href="https://web.archive.org/web/20220323175009/https://blog.rust-lang.org/2021/11/01/cve-2021-42574.html">(archived)</a></small>.
845+
</details>
846+
792847

793848
<dt><dfn export lt="ident-start code point | name-start code point" oldids="name-start-code-point, identifier-start-code-point">ident-start code point</dfn>
794849
<dd>
795850
A <a>letter</a>,
796-
a <a>non-ASCII code point</a>,
851+
a <a>non-ASCII ident code point</a>,
797852
or U+005F LOW LINE (_).
798853

799854
<dt><dfn export lt="ident code point" oldids="name-code-point, identifier-code-point">ident code point</dfn>
@@ -2122,7 +2177,7 @@ Parse A Comma-Separated List According To A CSS Grammar</h4>
21222177
Parse a stylesheet</h4>
21232178

21242179
<div algorithm>
2125-
To <dfn export>parse a stylesheet</dfn> from an |input|
2180+
To <dfn export>parse a stylesheet</dfn> from an |input|
21262181
given an optional [=/url=] |location|:
21272182

21282183
<ol>
@@ -3923,11 +3978,8 @@ Changes from CSS 2.1 and Selectors Level 3</h3>
39233978
-->
39243979

39253980
<li>
3926-
The definition of <a>non-ASCII code point</a> was changed
3927-
to be consistent with every definition of ASCII.
3928-
This affects <a>code points</a> U+0080 to U+009F,
3929-
which are now <a>ident code points</a> rather than <<delim-token>>s,
3930-
like the rest of <a>non-ASCII code points</a>.
3981+
The definition of <a>non-ASCII ident code point</a> was changed
3982+
to be consistent with HTML's [=valid custom element names=].
39313983

39323984
<li>
39333985
Tokenization does not emit COMMENT or BAD_COMMENT tokens anymore.
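
The new definition is easier to review with the ranges written out as data. The following is a minimal sketch, not code from the spec or from any implementation. It assumes each "between X and Y" entry is inclusive at both ends, and that "greater than or equal to U+10000" stops at U+10FFFF, the largest Unicode code point. The function and constant names are hypothetical.

```python
# Inclusive (low, high) ranges transcribed from the list added in this commit.
# Assumption: "between A and B" includes both endpoints.
NON_ASCII_IDENT_RANGES = (
    (0x00B7, 0x00B7),
    (0x00C0, 0x00D6),
    (0x00D8, 0x00F6),
    (0x00F8, 0x037D),
    (0x037F, 0x1FFF),
    (0x200C, 0x200C),
    (0x200D, 0x200D),
    (0x203F, 0x203F),
    (0x2040, 0x2040),
    (0x2070, 0x218F),
    (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF),
    (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD),
    # Assumption: the open-ended last entry is capped at the maximum code point.
    (0x10000, 0x10FFFF),
)


def is_non_ascii_ident_code_point(cp: int) -> bool:
    """Return True if cp falls in any of the ranges listed above."""
    return any(low <= cp <= high for low, high in NON_ASCII_IDENT_RANGES)


def is_ident_start_code_point(cp: int) -> bool:
    """A letter, a non-ASCII ident code point, or U+005F LOW LINE (_)."""
    is_letter = 0x41 <= cp <= 0x5A or 0x61 <= cp <= 0x7A
    return is_letter or cp == 0x5F or is_non_ascii_ident_code_point(cp)
```

Under the old definition, every code point at or above U+0080 qualified. With the ranges above, U+00E9 (é) and U+1F600 still qualify, while U+0080, U+00A0 (no-break space), U+00D7 (multiplication sign), and U+202E (right-to-left override) do not.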
