[css-fonts-5] Make unicode-range syntax suck less #7921
Ideas and related work:
For this, I'd propose allowing unicode-range: "&";
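To compare the two spellings concretely, here is a rough Python sketch that resolves today's token forms (a single code point, a hyphenated range, and the ? wildcard) and the proposed string form into inclusive code point ranges. parse_urange and parse_string are made-up helper names, the rule that each character of a string selects itself is an assumption about the proposal, and validation (clamping, malformed tokens) is left out.

```python
def parse_urange(token: str) -> tuple[int, int]:
    """Parse one token of today's syntax (U+26, U+0025-00FF, U+4??) into an
    inclusive (first, last) code point range."""
    body = token[2:]  # drop the "U+" or "u+" prefix
    if "-" in body:
        first, last = body.split("-")
        return int(first, 16), int(last, 16)
    if "?" in body:
        # Each "?" is a wildcard hex digit: 0 at the low end, F at the high end.
        return int(body.replace("?", "0"), 16), int(body.replace("?", "F"), 16)
    cp = int(body, 16)
    return cp, cp


def parse_string(token: str) -> list[tuple[int, int]]:
    """Hypothetical proposed form: each character of a quoted string selects itself."""
    return [(ord(ch), ord(ch)) for ch in token.strip('"')]


# Both spellings select the same single code point, U+0026 AMPERSAND.
print(parse_urange("U+26"), parse_string('"&"'))
```

The wildcard form shows part of the readability problem: u+4?? silently means U+0400 through U+04FF.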
For this, there has already been discussion in #4573; it just needs spec edits.
Perhaps we need some kind of not operator. This could even be just a keyword (…),
which would allow things like: unicode-range: greek, not japanese, not U+A5;
Or perhaps a minus operator and a keyword for all characters? unicode-range: greek except "π";
Yeah, I think …
Presumably if there are no positive ranges, then the starting point would be all characters rather than none.
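One possible reading of the not idea, together with the "no positive ranges means start from all characters" rule, sketched in Python as a per-code-point membership test rather than a materialized set. The keyword table and its contents are placeholders, not a proposed list, and term_contains and matches are hypothetical names.

```python
# Hypothetical keyword table; real keyword names and coverage are not settled.
NAMED_RANGES = {
    "greek": [(0x0370, 0x03FF)],                      # Greek and Coptic block only
    "japanese": [(0x3040, 0x309F), (0x30A0, 0x30FF)],  # Hiragana + Katakana only
}


def term_contains(term: str, cp: int) -> bool:
    """True if a keyword or a single U+XXXX code point covers `cp`."""
    if term.upper().startswith("U+"):
        return cp == int(term[2:], 16)
    return any(lo <= cp <= hi for lo, hi in NAMED_RANGES[term])


def matches(terms: list[str], cp: int) -> bool:
    """Union the positive terms, then subtract every "not ..." term.
    With no positive terms, start from all characters rather than none."""
    positive = [t for t in terms if not t.startswith("not ")]
    negative = [t[len("not "):] for t in terms if t.startswith("not ")]
    included = any(term_contains(t, cp) for t in positive) if positive else True
    return included and not any(term_contains(t, cp) for t in negative)


print(matches(["greek", "not japanese", "not U+A5"], ord("π")))  # True
print(matches(["not japanese", "not U+A5"], 0xA5))               # False
```

Because negatives are applied after the union, this reading also works when a negated range does intersect a positive one, even though the example above does not.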
Were you expecting that …
I was just trying to show syntax, but I agree that is a poor example. They definitely don't intersect! The fact that I can't easily think of examples that do intersect probably proves that @tabatkins is right and we don't need to subtract from a particular range.
You might want to start by looking at what Unicode and ICU have done in this space. For example, the UnicodeSet class in ICU4J is similar to the kinds of "range selection" you're describing here: one can add characters according to various Unicode properties, classes, and scripts to build up ranges, invert ranges, etc. I think the descriptions in the thread above need to be tighter. Are …
I also just had a look at the text mentioned in #4573 located here. Have we (I18N) reviewed this yet? I think (if I had been the reviewer) I would have proposed issues against the variable width …
It may also be useful (I haven't thought it through completely) to allow ranges separated by hyphens, like:
which would include the characters &¡¢£¤¥¦§©. You'd need a way of escaping the - character, though. (Two other cautions about situations where it may be better to stick with code point numbers: [1] Using characters instead of code point values may cause some difficulty when specifying RTL character sets. For example, in unicode-range: "ذ-خ", "ى", "a-z", "ب-ت"; the underlying order is not what you see (although it could be worse). [2] You'll probably still want to use code point values for combining characters and invisible characters, and especially for formatting characters such as RLI/LRI, which will again make the declaration look odd and hard to edit.)
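A rough Python sketch of how a hyphen-separated string range could be parsed. The backslash escape for a literal - is an assumption (the comment only says some escape would be needed), and parse_char_ranges is a hypothetical name. Ranges are read in logical (storage) order, which is exactly where the RTL caution above bites: the order a reader sees in an editor may differ.

```python
def parse_char_ranges(spec: str) -> list[tuple[int, int]]:
    """Parse a hypothetical range string such as "&¡-©" into inclusive
    (first, last) code point pairs. An unescaped "-" between two characters
    forms a range; a backslash makes the next character literal."""
    # Pass 1: resolve escapes into (character, was_escaped) pairs.
    chars: list[tuple[str, bool]] = []
    i = 0
    while i < len(spec):
        if spec[i] == "\\" and i + 1 < len(spec):
            chars.append((spec[i + 1], True))
            i += 2
        else:
            chars.append((spec[i], False))
            i += 1
    # Pass 2: group "x-y" triples into ranges; anything else is a single character.
    ranges: list[tuple[int, int]] = []
    j = 0
    while j < len(chars):
        if j + 2 < len(chars) and chars[j + 1] == ("-", False):
            ranges.append((ord(chars[j][0]), ord(chars[j + 2][0])))
            j += 3
        else:
            ranges.append((ord(chars[j][0]), ord(chars[j][0])))
            j += 1
    return ranges


print(parse_char_ranges("&¡-©"))   # [(38, 38), (161, 169)]
print(parse_char_ranges("a\\-z"))  # escaped hyphen: three single characters
```

A range such as ¡-© covers every code point from U+00A1 to U+00A9, so it also includes U+00A8 DIAERESIS (¨).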
Right, those issues are precisely why I don't think we want to allow string-based ranges, at least not with that syntax. A range(start, end) function could potentially work, if needed. (Though since all the syntactic characters inside the parens are non-directional, it still ends up visibly reordered in a very confusing way when viewed in a web-based editor.)
I think ranges are useful, but obviously the token that indicates a range would need to be outside the string, e.g. a function like the one @tabatkins described, or even …
Not necessarily. On the (probably rare) occasion where - has to be specified as a character, it could be escaped (as in regular expressions). In fact, this whole thing sounds very much like writing a regular expression, so perhaps that offers an alternative approach to the syntax? That would also allow mixing characters and code point values, e.g. if a range you specify starts with a visible character but ends with an invisible one.
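To make the regex comparison concrete, here is a small sketch using Python's re module purely as a stand-in for "a regex-like syntax" (nobody in the thread proposed a particular dialect). One character class mixes a literal character, an escaped -, and a range that starts at a visible character and ends at an invisible one written as a code point escape.

```python
import re

# "&", an escaped literal "-", and a range from U+00A1 (¡, visible)
# to U+00AD SOFT HYPHEN (invisible), the latter given as an escape.
charset = re.compile(r"[&\-¡-\u00AD]")

for ch in ["&", "-", "¨", "\u00ad", "a", "®"]:
    print(f"U+{ord(ch):04X}", bool(charset.fullmatch(ch)))
```

The escape keeps the invisible endpoint editable, though the pattern as a whole is arguably harder to scan than a list of U+ ranges.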
I think that makes it harder to read what the range is. I love regex, but it's not exactly known for its readability 😀
Not sure I follow. If anything, it seems to me that doing ranges with syntax outside the string makes this easier.
That descriptor (and most of the spec text) dates from CSS2, in 1998, by the way.
Yeah, if we'd designed it today, it would have sucked a whole lot less. That syntax can drink; that syntax has graduated college; that syntax can rent a car without an additional surcharge.
Playing with it a bit myself, unfortunately I think we'd be well-served by using a separator token with strong LTR directionality, such as the word to. If you're trying to denote a range from U+062E (خ) to U+0630 (ذ), you get the following results with a strong-directionality separator versus a weak-directionality one:
range("خ" to "ذ")
range("خ", "ذ")
The above two strings are identical save for the separator used, but the bidi algorithm makes the second look like it's in the wrong order.
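The difference can be checked with Python's unicodedata.bidirectional, which returns each character's Bidi_Class. Roughly, rule N1 of the bidi algorithm lets a run of neutral characters sitting between two strong right-to-left characters resolve to RTL, so with a comma the ", " between the two Arabic letters joins one RTL run and displays reversed; the strongly LTR letters of to break that run. The helper names bidi_classes and is_strong_ltr are hypothetical.

```python
import unicodedata


def bidi_classes(text: str) -> list[str]:
    """Bidi_Class of each character, e.g. "L" (strong LTR), "AL" (Arabic letter),
    "CS" (common separator), "ON" (other neutral), "WS" (whitespace)."""
    return [unicodedata.bidirectional(ch) for ch in text]


def is_strong_ltr(separator: str) -> bool:
    """True if every character of the separator is strongly left-to-right."""
    return all(unicodedata.bidirectional(ch) == "L" for ch in separator)


print(bidi_classes('"خ" to "ذ"'))
print(bidi_classes('"خ", "ذ"'))
print(is_strong_ltr("to"), is_strong_ltr(","))  # True False
```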
My abject apologies, once again, for the unicode-range syntax. "Put it in for now, Chris, until we come up with something better." (Håkon Wium Lie, spring 1997)
On the other hand, at least it wasn't the worst syntax proposed. Feast your eyes on the hex-encoded BMP bitmap:
I'll remind Martin and Misha that they missed one. 😆
Well hey, turns out it's had a pretty good run, now let's all just come up with something better! 😁
Right now, unicode-range accepts everything in terms of code points. For example:
This has several problems:
… U+0000 and ending at U+FFFF, breaking as needed in between.
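Assuming the truncated item above is about excluding characters (the not proposal elsewhere in the thread suggests so), here is a rough Python sketch of the bookkeeping that falls on authors today: computing the complement of the excluded ranges and spelling it out in the existing syntax. complement and to_unicode_range are hypothetical helper names.

```python
def complement(excluded: list[tuple[int, int]], limit: int = 0x10FFFF) -> list[tuple[int, int]]:
    """Inclusive ranges in U+0000..limit that are not covered by `excluded`."""
    gaps: list[tuple[int, int]] = []
    next_cp = 0  # first code point not yet accounted for
    for lo, hi in sorted(excluded):
        if lo > next_cp:
            gaps.append((next_cp, lo - 1))
        next_cp = max(next_cp, hi + 1)
    if next_cp <= limit:
        gaps.append((next_cp, limit))
    return gaps


def to_unicode_range(ranges: list[tuple[int, int]]) -> str:
    """Serialize ranges using today's U+XXXX / U+XXXX-YYYY syntax."""
    return ", ".join(
        f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}-{hi:04X}" for lo, hi in ranges
    )


# "Everything in the BMP except U+00A5 and the Hiragana block", spelled out by hand today:
print(to_unicode_range(complement([(0xA5, 0xA5), (0x3040, 0x309F)], limit=0xFFFF)))
# U+0000-00A4, U+00A6-303F, U+30A0-FFFF
```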