[selectors-4] Clarify :lang() behavior when the language range is not a well-formed BCP 47 code #8720
Comments
Right, this is undefined in the spec right now. If the |
@tabatkins what if the |
That tag cannot be parsed according to BCP47 syntax, and therefore we have no basis to assume anything about any part of it, such as the meaning of the Therefore, IMO it should not match any |
Twenty years ago, I briefly argued that implementations should be free to use external knowledge about the data format in order to map random language information to BCP-47. https://lists.w3.org/Archives/Public/www-style/2003Oct/0234.html |
Firefox 114+ no longer match on backslashes in `:lang()`, even when escaped. It is an intentional change as `:lang()` parameters are supposed to be valid BCP 47 strings. Therefore, we won't attempt to patch it. We'll keep this test here until other browsers match the behavior. Fixes jquerygh-5271 Ref https://bugzilla.mozilla.org/show_bug.cgi?id=1839747#c1 Ref w3c/csswg-drafts#8720 (comment)
According to https://www.w3.org/TR/selectors-4/#the-lang-pseudo,
The text also goes on to mention that
[my emphasis] which implies, as I understand it, that something like `:lang("qq")` will match content tagged with `lang="qq"` even though `qq` is not a valid language tag (as listed in the IANA registry).

However, the Selectors spec does not specifically address how ill-formed (not merely invalid) tags should be handled.
According to the language tag syntax given in https://www.rfc-editor.org/rfc/rfc5646#section-2.1, a tag like `åå` (containing non-ASCII characters) would be ill-formed ("the language tags described in this document are sequences of characters from the US-ASCII [ISO646] repertoire"), as would a tag like `en---` (the various subtags following the primary language subtag are optional, but the grammar does not allow for them to be empty; if they're not present, the corresponding hyphen delimiters should also be omitted).

So how does `:lang()` matching work in the presence of ill-formed codes? It seems to me that a literal reading of the spec requires that such codes never match, because its definition of "matches" depends on "when represented in BCP 47 syntax", and such ill-formed codes cannot be represented in BCP 47 at all; they conflict with its basic grammar.

A possible alternative interpretation might be that the handling of ill-formed codes is simply undefined (because the spec only addresses what it means to "match" for codes "represented in BCP 47 syntax").
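
To make the well-formed versus valid distinction concrete, here is a rough Python sketch of a well-formedness check following the `langtag` and `privateuse` productions of RFC 5646 section 2.1. The function name `is_well_formed` is my own, and the grandfathered tags (such as `i-klingon`) are deliberately left out, so treat this as an illustration under those assumptions rather than a complete validator. It does not consult the IANA registry, so `qq` passes.

```python
import re

# ASCII-only classes on purpose: re.IGNORECASE would let some non-ASCII
# letters (e.g. U+017F) slip through a plain [a-z] class.
_ALPHA = "[A-Za-z]"
_ALNUM = "[A-Za-z0-9]"

# Sketch of the RFC 5646 section 2.1 grammar, minus grandfathered tags.
_LANGTAG = re.compile(
    rf"""
    (?:
        (?: {_ALPHA}{{2,3}} (?:-{_ALPHA}{{3}}){{0,3}}     # language plus optional extlang
          | {_ALPHA}{{4,8}} )                             # reserved or registered language
        (?:-{_ALPHA}{{4}})?                               # script
        (?:-(?:{_ALPHA}{{2}}|[0-9]{{3}}))?                # region
        (?:-(?:{_ALNUM}{{5,8}}|[0-9]{_ALNUM}{{3}}))*      # variants
        (?:-[0-9A-WY-Za-wy-z](?:-{_ALNUM}{{2,8}})+)*      # extensions
        (?:-[xX](?:-{_ALNUM}{{1,8}})+)?                   # trailing private use
      | [xX](?:-{_ALNUM}{{1,8}})+                         # private-use-only tag
    )
    """,
    re.VERBOSE,
)


def is_well_formed(tag: str) -> bool:
    """Return True if `tag` fits the (simplified) RFC 5646 langtag grammar."""
    return _LANGTAG.fullmatch(tag) is not None


if __name__ == "__main__":
    for candidate in ["qq", "åå", "en---", "zh-Hant-TW", "SomeRandomCode-Latn-US"]:
        print(candidate, is_well_formed(candidate))
```

Under this sketch `qq` and `zh-Hant-TW` are well-formed, while `åå`, `en---` and `SomeRandomCode-Latn-US` are rejected by the grammar alone, before any registry lookup.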
I'm not aware of any compelling use case for ill-formed language codes. So in the interests of clarity and interoperability I would like to ask the WG to confirm (and explicitly note in the spec) that `:lang()` matching is based strictly on BCP 47 and RFC 4647, and as such, ill-formed codes never match.

(Note that the current implementation in WebKit does allow ill-formed tags to match. Thus if content is tagged with `lang="SomeRandomCode-Latn-US"`, which is ill-formed because the primary language subtag is too long, it is nevertheless matched by `:lang(SomeRandomCode)`, `:lang("*-US")`, etc. I think this should be considered a bug in the implementation.)
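
For illustration, the difference between the two behaviors can be sketched in Python. `extended_filter` follows the extended filtering steps of RFC 4647 section 3.3.2 without looking at whether the tag is well-formed, which reproduces the lenient matches described above. `strict_match` adds a gate in front of it. Both function names are hypothetical, and the gate here is only a coarse subtag-shape check (1 to 8 ASCII letters, then subtags of 1 to 8 ASCII alphanumerics), not the full RFC 5646 grammar, so this is a sketch of the proposal and not any browser's actual code.

```python
import re
import string

# ASCII-only lowercasing, since the comparison is case-insensitive
# within the ASCII range only.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Coarse stand-in for a well-formedness test (assumption, see lead-in).
_SHAPE = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def extended_filter(lang_range: str, tag: str) -> bool:
    """RFC 4647 section 3.3.2 extended filtering, with no well-formedness check."""
    range_parts = lang_range.translate(_ASCII_LOWER).split("-")
    tag_parts = tag.translate(_ASCII_LOWER).split("-")

    # Step 1: first subtags must agree unless the range starts with "*".
    if range_parts[0] != "*" and range_parts[0] != tag_parts[0]:
        return False

    i = 1  # index of the next tag subtag to examine
    for wanted in range_parts[1:]:
        if wanted == "*":
            continue  # a wildcard consumes nothing
        while True:
            if i >= len(tag_parts):
                return False  # tag exhausted before the range was satisfied
            if tag_parts[i] == wanted:
                i += 1
                break
            if len(tag_parts[i]) == 1:
                return False  # singletons may not be skipped
            i += 1  # skip a non-matching, non-singleton subtag
    return True


def strict_match(lang_range: str, tag: str) -> bool:
    """Proposed behavior: an ill-formed tag never matches."""
    return _SHAPE.fullmatch(tag) is not None and extended_filter(lang_range, tag)


if __name__ == "__main__":
    tag = "SomeRandomCode-Latn-US"
    for lang_range in ["SomeRandomCode", "*-US"]:
        print(lang_range, extended_filter(lang_range, tag), strict_match(lang_range, tag))
```

With the lenient function both ranges match `SomeRandomCode-Latn-US`; with the gate in place neither does, while a well-formed tag such as `en-Latn-US` is still matched by `*-US`.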