@@ -786,14 +786,69 @@ Definitions</h3>
786786 An <a>uppercase letter</a>
787787 or a <a>lowercase letter</a> .
788788
789- <dt> <dfn export>non-ASCII code point</dfn>
790- <dd>
791- A <a>code point</a> with a value equal to or greater than U+0080 <control>.
789+ <dt> <dfn export>non-ASCII ident code point</dfn>
790+ <dd>
791+ A <a>code point</a> whose value is any of:
792+
793+ * U+00B7
794+ * between U+00C0 and U+00D6
795+ * between U+00D8 and U+00F6
796+ * between U+00F8 and U+037D
797+ * between U+037F and U+1FFF
798+ * U+200C
799+ * U+200D
800+ * U+203F
801+ * U+2040
802+ * between U+2070 and U+218F
803+ * between U+2C00 and U+2FEF
804+ * between U+3001 and U+D7FF
805+ * between U+F900 and U+FDCF
806+ * between U+FDF0 and U+FFFD
807+ * greater than or equal to U+10000
808+
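The list above can be read as a simple range check. The following is a minimal Python sketch, not part of the specification: the names `NON_ASCII_IDENT_RANGES` and `is_non_ascii_ident_code_point` are hypothetical, "between" is assumed to be inclusive at both ends, and the open-ended final entry is capped at U+10FFFF, the highest Unicode code point.

```python
# Inclusive (first, last) pairs mirroring the list above, one entry per bullet.
# Assumption: "between A and B" includes both A and B.
NON_ASCII_IDENT_RANGES = (
    (0x00B7, 0x00B7),
    (0x00C0, 0x00D6),
    (0x00D8, 0x00F6),
    (0x00F8, 0x037D),
    (0x037F, 0x1FFF),
    (0x200C, 0x200C),
    (0x200D, 0x200D),
    (0x203F, 0x203F),
    (0x2040, 0x2040),
    (0x2070, 0x218F),
    (0x2C00, 0x2FEF),
    (0x3001, 0xD7FF),
    (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD),
    # "greater than or equal to U+10000", capped at the Unicode maximum.
    (0x10000, 0x10FFFF),
)


def is_non_ascii_ident_code_point(cp: int) -> bool:
    """Return True if the integer code point falls in any listed range."""
    return any(first <= cp <= last for first, last in NON_ASCII_IDENT_RANGES)
```

Under these assumptions, `is_non_ascii_ident_code_point(0x202E)` (RIGHT-TO-LEFT OVERRIDE) is false, while `is_non_ascii_ident_code_point(ord("é"))` is true.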
809+ <details class=note>
810+ <summary> Why these characters, specifically?</summary>
811+
812+ This matches the list of non-ASCII code points
813+ allowed to be used in HTML [=valid custom element names=] .
814+ It excludes a number of characters that appear as whitespace,
815+ or that can cause rendering or parsing issues in some tools,
816+ such as the direction override code points.
817+
818+ Note that this is a weaker set of restrictions
819+ than <a href="https://unicode.org/reports/tr31/#Figure_Code_Point_Categories_for_Identifier_Parsing">UAX 31</a>
820+ recommends for identifiers
821+ (used by languages such as JavaScript to restrict their identifier syntax),
822+ allowing things such as
823+ starting an identifier with a combining character.
824+ Consistency with HTML custom element names
825+ (and thus, the ability to write selectors for all custom elements
826+ without having to use escapes)
827+ was considered valuable,
828+ and the set of characters restricted by HTML
829+ covers the "high value" restrictions well.
830+
831+ These restrictions do not avoid all possible confusing renderings;
832+ mixing characters from LTR and RTL scripts
833+ can still result in unexpected visual transposition
834+ in most text editors,
835+ for example.
836+ Source text can contain the restricted characters in non-ident contexts, as well:
837+ most of them are completely valid in strings, for example.
838+ Even when used in a way that creates invalid CSS,
839+ the parsing errors they cause might be limited to something unimportant,
840+ while their effect on rendering the source text in code review tools
841+ might be significant and/or malicious.
842+ For more details on these sorts of "source text attacks",
843+ see <a href="https://blog.rust-lang.org/2021/11/01/cve-2021-42574.html">this Rust-lang blog post</a>
844+ <small> <a href="https://web.archive.org/web/20220323175009/https://blog.rust-lang.org/2021/11/01/cve-2021-42574.html">(archived)</a> </small> .
845+ </details>
846+
792847
793848 <dt> <dfn export lt="ident-start code point | name-start code point" oldids="name-start-code-point, identifier-start-code-point">ident-start code point</dfn>
794849 <dd>
795850 A <a>letter</a> ,
796- a <a>non-ASCII code point</a> ,
851+ a <a>non-ASCII ident code point</a> ,
797852 or U+005F LOW LINE (_).
798853
799854 <dt> <dfn export lt="ident code point" oldids="name-code-point, identifier-code-point">ident code point</dfn>
@@ -2122,7 +2177,7 @@ Parse A Comma-Separated List According To A CSS Grammar</h4>
21222177Parse a stylesheet</h4>
21232178
21242179 <div algorithm>
2125- To <dfn export>parse a stylesheet</dfn> from an |input|
2180+ To <dfn export>parse a stylesheet</dfn> from an |input|
21262181 given an optional [=/url=] |location|:
21272182
21282183 <ol>
@@ -3923,11 +3978,8 @@ Changes from CSS 2.1 and Selectors Level 3</h3>
39233978 -->
39243979
39253980 <li>
3926- The definition of <a>non-ASCII code point</a> was changed
3927- to be consistent with every definition of ASCII.
3928- This affects <a>code points</a> U+0080 to U+009F,
3929- which are now <a>ident code points</a> rather than <<delim-token>> s,
3930- like the rest of <a>non-ASCII code points</a> .
3981+ The definition of <a>non-ASCII ident code point</a> was changed
3982+ to be consistent with HTML's [=valid custom element names=] .
39313983
39323984 <li>
39333985 Tokenization does not emit COMMENT or BAD_COMMENT tokens anymore.