[selectors-4] Clarify :lang() behavior when the language range is not a well-formed BCP 47 code #8720
Labels: Closed Accepted by CSSWG Resolution, Commenter Satisfied, Needs Testcase (WPT), i18n-tracker, selectors-4, spec-test-mismatch
According to https://www.w3.org/TR/selectors-4/#the-lang-pseudo,
The text also goes on to mention that
[my emphasis] which implies, as I understand it, that something like `:lang("qq")` will match content tagged with `lang="qq"` even though `qq` is not a valid language tag (as listed in the IANA registry).

However, the Selectors spec does not specifically address how ill-formed (not merely invalid) tags should be handled.
According to the language tag syntax given in https://www.rfc-editor.org/rfc/rfc5646#section-2.1, a tag like `åå` (containing non-ASCII characters) would be ill-formed ("the language tags described in this document are sequences of characters from the US-ASCII [ISO646] repertoire"), as would a tag like `en---` (the various subtags following the primary language subtag are optional, but the grammar does not allow for them to be empty; if they're not present, the corresponding hyphen delimiters should also be omitted).

So how does `:lang()` matching work in the presence of ill-formed codes? It seems to me that a literal reading of the spec requires that such codes never match, because its definition of "matches" depends on "when represented in BCP 47 syntax", and such ill-formed codes cannot be represented in BCP 47 at all; they conflict with its basic grammar.

A possible alternative interpretation might be that the handling of ill-formed codes is simply undefined (because the spec only addresses what it means to "match" for codes "represented in BCP 47 syntax").
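
To make the well-formed/ill-formed distinction concrete, here is a rough Python sketch of a syntax-only check against the `langtag` and `privateuse` productions of RFC 5646 section 2.1. The names `WELL_FORMED` and `is_well_formed` are mine, not from any spec or implementation. The sketch deliberately leaves out the grandfathered tags (such as `i-klingon`) and does no registry lookup, so it tests well-formedness only, not validity.

```python
import re

# Simplified transcription of the RFC 5646 section 2.1 ABNF (assumption:
# grandfathered tags are omitted, and no IANA registry lookup is done).
# Character classes are spelled out so that only ASCII letters/digits match.
WELL_FORMED = re.compile(r"""
    (?:
        (?: [A-Za-z]{2,3} (?:-[A-Za-z]{3}){0,3} | [A-Za-z]{4,8} )  # language, optional extlang
        (?: -[A-Za-z]{4} )?                                        # script
        (?: -(?:[A-Za-z]{2}|[0-9]{3}) )?                           # region
        (?: -(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}) )*           # variants
        (?: -[0-9A-WY-Za-wy-z] (?:-[A-Za-z0-9]{2,8})+ )*           # extensions
        (?: -[xX] (?:-[A-Za-z0-9]{1,8})+ )?                        # private-use suffix
      |
        [xX] (?:-[A-Za-z0-9]{1,8})+                                # private-use-only tag
    )
""", re.VERBOSE)


def is_well_formed(tag: str) -> bool:
    """Return True if `tag` fits the simplified langtag/privateuse grammar."""
    return WELL_FORMED.fullmatch(tag) is not None


if __name__ == "__main__":
    for tag in ["qq", "en-Latn-US", "åå", "en---", "SomeRandomCode-Latn-US"]:
        print(f"{tag!r}: well-formed = {is_well_formed(tag)}")
```

Under this check `qq` is well-formed (though not valid, since it is not registered), while `åå` and `en---` are rejected on syntax alone.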
I'm not aware of any compelling use case for ill-formed language codes. So in the interests of clarity and interoperability I would like to ask the WG to confirm (and explicitly note in the spec) that `:lang()` matching is based strictly on BCP 47 and RFC4647, and as such, ill-formed codes never match.

(Note that the current implementation in WebKit does allow ill-formed tags to match. Thus if content is tagged with `lang="SomeRandomCode-Latn-US"`, which is ill-formed because the primary language subtag is too long, it is nevertheless matched by `:lang(SomeRandomCode)`, `:lang("*-US")`, etc. I think this should be considered a bug in the implementation.)
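
For illustration, the following Python sketch applies the extended filtering steps of RFC 4647 section 3.3.2 purely to hyphen-separated subtags, with no well-formedness check beforehand. It is not WebKit's code, and the function name `extended_filter` is hypothetical; it only shows that a matcher written this way would accept the ill-formed tag above.

```python
import string

# ASCII-only case folding, since the comparison is case-insensitive
# within the ASCII range only.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def extended_filter(language_range: str, tag: str) -> bool:
    """Sketch of RFC 4647 section 3.3.2 extended filtering.

    Assumption: neither argument is checked for well-formedness first.
    """
    range_subtags = language_range.translate(_ASCII_LOWER).split("-")
    tag_subtags = tag.translate(_ASCII_LOWER).split("-")

    # Step 2: the first subtags must match (or the range starts with "*").
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False

    r, t = 1, 1
    # Step 3: walk the remaining range subtags.
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            r += 1                      # 3A: wildcard, skip it in the range
        elif t >= len(tag_subtags):
            return False                # 3B: tag ran out of subtags
        elif range_subtags[r] == tag_subtags[t]:
            r += 1                      # 3C: subtags match, advance both
            t += 1
        elif len(tag_subtags[t]) == 1:
            return False                # 3D: singleton in the tag blocks skipping
        else:
            t += 1                      # 3E: skip this tag subtag
    return True                         # Step 4: range exhausted


if __name__ == "__main__":
    tag = "SomeRandomCode-Latn-US"
    print(extended_filter("SomeRandomCode", tag))  # True
    print(extended_filter("*-US", tag))            # True
    print(extended_filter("en", tag))              # False
```

Under the stricter reading requested in this issue, a tag that fails the BCP 47 syntax would be rejected before this comparison is ever run, so both of the first two calls would yield no match.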