-
Notifications
You must be signed in to change notification settings - Fork 708
[css-syntax] custom property names too permissive, require namespacing rules #7129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Yes, custom property names can contain literally any non-ASCII character. If necessary, I'm happy to restrict this, but I question why it would be necessary to do this for property names, but okay for property values to still be fully unrestricted? |
Property names are used in CSS "code" and have to be be parsed, matched, and otherwise referenced. Abusive names can cause spoofing problems (even though the underlying code point sequence is still just some integers and the parser may not care). For example, is Property values are data and can include natural language text (as well as, well, any character data, including junk). While the value space might be limited by applications in different ways, there don't appear to be any requirements to do so here. In fact, your Spec goes out of its way to highlight this fact:
|
Property values are used in CSS just as much as property names, tho. We can't interpret custom property values in the custom property, but we do as soon as they're substituted into a non-custom property, or if the custom property is registered with a grammar. Moreso, tho, the |
Yes, it occurred to me that this might turn out to be a gap in CSS Syntax (which might be a serious "ouch" and difficult to do something about). Since one of the things the property value can contain is a string literal, one probably can't apply UAX31-like rules just generically to the value (i.e. in CSS Variables) |
Right, removing the potential from strings def seems out (at minimum, they def shouldn't be restricted to the Identifier production from Unicode), but I think I didn't make my original point clearly enough - this restriction should apply to all custom identifiers, not just property names, right? |
That's right. |
Okay, I'm gonna retag this to Syntax, then, because we should handle the restriction at that level. |
Agenda+ to discuss restricting the allowed codepoints in an ident sequence (used in keywords, function names, dimension units, selectors, property names, etc). Possible options:
Then there are subsequent questions. First, what should we do about characters so restricted that are used in an identifier sequence?
Second, should we allow escapes to represent the restricted character in identifiers?
|
Since this is now being solved at the CSS Syntax level, which is the correct way to do it, untagging from CSS Variables |
Note that the latter two of the three identifiers in my popular tweet would be invalid under the JS rules:
I'f I'm reading correctly, under the HTML rules the middle one Not that this is required to be supported, just noting the effects. ^_^ |
The CSS Working Group just discussed
The full IRC log of that discussion<fantasai> Topic: Custom property names too permissive<fantasai> github: https://github.com//issues/7129 <fantasai> TabAtkins: i18nWG raised issue about custom idents, which allow any Unicode codepoint above a certain codepoint <fantasai> TabAtkins: There are some concerns about e.g. bidi characters corrupting the display of the code <fantasai> TabAtkins: Also argument for consistency in what characters allowed across languages <fantasai> TabAtkins: JS follows UAX?? rules for characters allowed in idents <fantasai> TabAtkins: HTML allows a different but largely compatible range of characters <fantasai> TabAtkins: In one of my Tweets, I showed off using weird Unicode rules <fantasai> TabAtkins: e.g. different emoji are valid or invalid <fantasai> TabAtkins: I agree with i18n feedback, reasonable to partially restrict these <fantasai> TabAtkins: e.g. no reason to allow bidi override chars in CSS idents <fantasai> TabAtkins: so I suggest adopting either HTML rules or JS rules <Rossen_> q? <fantasai> TabAtkins: don't have a strong opinion on which to go for <fantasai> TabAtkins: Otherwise I'd go with HTML rules by default <emilio> Scribenick: emilio <emilio> fantasai: I think this is fairly reasonable, but I don't know the differences between the rules so I don't have an opinion on those yet <fantasai> TabAtkins: JS rules are a bit more strict, they disallow chars that look like punctuation <fantasai> TabAtkins: HTML gives exact codepoint ranges <fantasai> TabAtkins: Reason I'd go with HTML is to guarantee being able to write selectors for custom elements, without ever having to escape <Rossen_> ack fantasai <fantasai> fantasai: That sounds reasonable, let's go with that <fantasai> Rossen_: Makes sense, any downsides to it? <fantasai> TabAtkins: Any change to make more restrictive, could potentially make some stylesheets invalid <fantasai> TabAtkins: potentially breaking code that works <fantasai> Rossen_: And with HTML rules we'd have fewer breakage <fantasai> Rossen_: seems like path of least destruction <fantasai> Rossen_: Anyone would like to argue against the change entirely? <fantasai> Rossen_: If not any objections? <fantasai> Rossen_: Taking the silence as a no <fantasai> RESOLVED: Use HTML restrictions for custom idents <fantasai> TabAtkins: Got 2 sub-issues <fantasai> TabAtkins: One is whether to allow illegal characters to be escaped in an identifier <fantasai> TabAtkins: JS doesn't allow that, you can escape for readability but not to avoid the identifier restrictions <fantasai> TabAtkins: but CSS has traditionally always allowed escapes for everything, so don't see a strong reason to disallow <faceless> +1 from us too <fantasai> TabAtkins: So I would prefer to go with illegal chars can be escaped <fantasai> fantasai: I strongly agree with that <fantasai> Rossen_: Any objections for allowing illegal characters to be escaped in an ident? <fantasai> RESOLVED: illegal characters in an ident can be escaped <fantasai> TabAtkins: Next question is how do we handle the illegal characters <dbaron> That doesn't allow nulls in idents, does it? <fantasai> TabAtkins: Do we censor them into e.g. U+FFFD <fantasai> TabAtkins: or drop them entirely? <fantasai> TabAtkins: I'd prefer to drop them, because it would more clearly result in invalid code <fantasai> TabAtkins: so if we allow to work but censored it wouldn't prevent use in source text, which was the goal of i18n <fantasai> TabAtkins: so would prefer to exclude from the ident production <fantasai> <fantasai> +1 <tantek> +1 TabAtkins <fantasai> Rossen_: [missed] <fantasai> TabAtkins: No, would not be changing existing rules for censoring rules. Currently lone surrogates etc. do that <fantasai> TabAtkins: Those are in there for UTF-8 well-formedness and C compatibility <fantasai> TabAtkins: They have a reason to be censored out at technical low level <fantasai> TabAtkins: these restrictions are for human reasons, so would restrict differently <Rossen_> ack fantasai <fantasai> fantasai: So should we resolve that they would make the production invalid? (That's what was proposed right?) <TabAtkins> --(╯°□°)╯ <fantasai> TabAtkins: yes <fantasai> TabAtkins: if you put this ^ as a custom property name, the degree sign is not a valid character <fantasai> TabAtkins: so it would make an ident, a delim, a parenthesis, and a ??? <fantasai> TabAtkins: That's definitely not an ident, because it's multiple tokens not an ident token <bmathwig> Is there a practical use case for doing something like that? Seems more like a developer having fun rather than good quality code. <fantasai> TabAtkins: Proposed resolution is that it would break into multiple tokens <fantasai> fantasai: What kind of token are these invalid characters going to be? <fantasai> TabAtkins: DELIMs, one codepoint at a time <fantasai> TabAtkins: Characters without a specific role are generally handled as DELIM <fantasai> TabAtkins: and we only use certain DELIMs in certain places <TabAtkins> the degree sign isn't a valid ident char under the HTML rules, so this would produce an ident, a delim containing the degree sign, an ident, a delim, and finally an ident <fantasai> RESOLVED: Invalid ident characters are treated as DELIM tokens <faceless> present- |
The HTML allowed chars are:
We'd continue to disallow |
@aphillips does that resolution address your concerns? |
Programming languages such as JS and Java that allow non-ASCII variable names with character limits usually have different restrictions for the initial character. Most notably they forbid combining marks. They sometimes exclude other values (such as bidi controls, although those are excluded above). I think using the HTML restrictions is a realistic solution for CSS for the reasons given above. We might need a note about combining mark handling at the start of a token. HTML handles this by requiring an alpha char ( |
Ah, hm, indeed. HTML gets to avoid the first-character problem. CSS does have special rules for the first character of an ident as well, but they're only different in the ASCII range (preventing idents from being number-lookalikes). But if the concern is just about combining characters at the start, that'll be fine mechanically; the CSS parser still just handles codepoints individually and doesn't care about combining in any way. So you could select the class you mention without worry, by putting that combining char after a period. That said, I'm fine with further first-char restrictions if necessary. Unless you request otherwise tho, @aphillips , I'll assume that the existing rules should be fine and use the same non-ASCII allowed characters for both initial and non-initial chars. |
As long as the CSS parser doesn't care, then things should work. Content authors will need to be advised about the spoof/abuse potential when viewing CSS files as text somewhere (you may even already have such a note??) |
Don't have such a note, but I'm look over some of the verbage used elsewhere for that issue and add one. |
…s to the same list that HTML allows in custom element names. #7129
Okay, restriction applied, and I added a significant note to that section. I'll need to add and/or tweak some tests for this. |
CC’ing @markusicu @macchiati (Unicode “Trojan Source” working group) as FYI |
Note that I'm currently linking to a Rust-lang blog post about the trojan source thing; if there's a better "official source" about it from Unicode I'd be happy to switch the reference over. |
Here's Unicode's announcement, which has a link to @macchiati et al's doc about the topic. |
Shouldn't serialize an identifier be modified accordingly? - If the character is not handled by one of the above rules and is greater than or equal to U+0080, is "-" (U+002D) or "_" (U+005F), or is in one of the ranges [0-9] (U+0030 to U+0039), [A-Z] (U+0041 to U+005A), or \[a-z] (U+0061 to U+007A), then the character itself.
+ If the character is not handled by one of the above rules and is a [non-ASCII ident code point](https://drafts.csswg.org/css-syntax-3/#non-ascii-ident-code-point), "-" (U+002D), "_" (U+005F), or is in one of the ranges [0-9] (U+0030 to U+0039), [A-Z] (U+0041 to U+005A), or \[a-z] (U+0061 to U+007A), then the character itself. |
Yes, it should be. |
Is it correct that the ranges are inclusive?
Can we change the current text to :
Update : I forgot I asked this here and asked it again in #8861 (comment) This has been answered and needed edits were made. |
Originally raised on CSS Variables, but later discussion concluded the best fix is to change CSS Syntax. Original post below:
https://www.w3.org/TR/css-variables-1/#defining-variables
The above text defines the custom property name as "any valid identifier". Tracing that definition back to CSS Values and thence to
ident token
, we find that the name can contain any Unicode code point > U+0080. This includes various special forms of whitespace as well as potential problem characters, such as bidi controls (such as might cause "Trojan Source" attacks). Namespacing is definitely a complicated problem: the I18N WG doesn't want groups to cherry-pick characters (thereby excluding certain languages from using the feature).Most programming languages attempt to address this by adopting some form of restriction for variable names such as those found in UAX31 Unicode Identifier and Pattern Syntax. In JavaScript, for example, the definition looks like the one found here. CSS should make similar restrictions on property names (values can remain unrestricted).
The text was updated successfully, but these errors were encountered: