[css-font] Clarify what value is invalid for font-language-override and why it shouldn't generate parse error #1104

Closed
upsuper opened this issue Mar 16, 2017 · 13 comments

@upsuper
Member

upsuper commented Mar 16, 2017

The spec currently says the <string> value of font-language-override is

single three-letter case-sensitive OpenType language system tag, specifies the OpenType language system to be used instead of the language system implied by the language of the element

but it also states that

Use of invalid OpenType language system tags must not generate a parse error but must be ignored when doing glyph selection and placement.

It isn't clear to me what is meant by "invalid OpenType language system tags", or why an invalid tag "must not generate a parse error".

In my understanding, a tag that is longer or shorter than three letters, or that includes any character outside the ASCII letter range, is invalid. If that is the case, I think generating a parse error is the more future-proof approach: if people later want to introduce new syntax for language tags, authors would be able to use the new syntax in a backward-compatible way.

If "invalid" instead refers to tags that conform to the format above but are not listed in the linked table, I agree that they should not generate a parse error, since OpenType may add new tags in the future. But I think these two cases should be handled separately anyway.

@dscorbett

Some OpenType language system tags are four letters long (IPPH). Some include digits (HYE0). In principle, a tag can include any character between U+0020 and U+007E.
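
To make the mismatch concrete, here is a minimal Python sketch of the "exactly three ASCII letters" reading from the opening comment, applied to the two tags mentioned above. The function name `is_three_letter_tag` is hypothetical, and `SRB` is included only as an example of a three-letter tag; none of this is taken from the spec or from any implementation.

```python
import re

# Assumption: "three-letter" is read as exactly three ASCII letters.
THREE_LETTERS = re.compile(r"[A-Za-z]{3}")


def is_three_letter_tag(tag: str) -> bool:
    """Return True if the tag is exactly three ASCII letters."""
    return THREE_LETTERS.fullmatch(tag) is not None


# IPPH has four letters and HYE0 contains a digit, so both fail this reading.
for tag in ("SRB", "IPPH", "HYE0"):
    print(tag, is_three_letter_tag(tag))
```

Under that reading, both tags cited in this comment would be treated as malformed, which is why the "three-letter" wording needs to change.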

@upsuper
Member Author

upsuper commented Mar 19, 2017

I guess that means the spec needs some change anyway.

cc @nattokirai @litherum

@svgeesus
Contributor

svgeesus commented Dec 6, 2017

Given font-language-override is at-risk due to only one implementation, maybe move this to level 4? Still needs discussion and resolution though.

@litherum
Contributor

litherum commented Dec 6, 2017

Yes to level 4, and yes to clarifying. My intuition is that it’s saying that the grammar shouldn’t include the set of current languages and should instead just look at the string length and character set. If that’s true, we should certainly clarify it.

@dbaron
Member

dbaron commented Jan 10, 2018

It's also somewhat unusual for CSS to have parsing errors that are internal to the contents of a string. It might make sense for all strings to be accepted, but ones that don't make sense to be rejected later.

@css-meeting-bot
Member

The Working Group just discussed [css-font] Clarify what value is invalid for font-language-override and why it shouldn't generate parse error, and agreed to the following resolutions:

  • RESOLVED: Move the issue in https://github.com/w3c/csswg-drafts/issues/1104 to Fonts L4
The full IRC log of that discussion:
<dael> Topic: [css-font] Clarify what value is invalid for font-language-override and why it shouldn't generate parse error
<dael> github: https://github.com//issues/1104
<dael> Chris: It is odd to have a parsing error based on string contents.
<dael> Chris: b/c font lang override is at risk anyway we should move this to L4. I'm fine with that.
<dael> astearns: I'm fine with moving to L4. myles is as well it looks.
<dael> astearns: Obj to moving this issue to Fonts L4?
<dael> RESOLVED: Move the issue in https://github.com//issues/1104 to Fonts L4

@svgeesus
Contributor

From the OpenType spec:

All tags are 4-byte character strings composed of a limited set of ASCII characters in the range 0x20 to 0x7E. Spaces (0x20) may only occur as a trailing sequence within the tag. As a general convention, capital letters (0x41 to 0x5A) are used. If a language system tag consists of three or less visible letters, the letters are followed by the requisite number of spaces each consisting of a single byte, to complete a 4-byte tag.
https://www.microsoft.com/typography/otspec/languagetags.htm

That seems abundantly clear, and we already link to that definition.
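
As an illustration, the rules in the quoted passage can be sketched as a well-formedness check in Python. The function name `is_well_formed_tag` is hypothetical, and the sketch checks only what the quoted text states: four characters, all in the range 0x20 to 0x7E, with spaces appearing only as a trailing run. It does not consult the list of registered tags.

```python
def is_well_formed_tag(tag: str) -> bool:
    """Check a tag against the rules quoted above (a sketch, not a spec)."""
    # Tags are 4-byte strings; each character here stands for one byte.
    if len(tag) != 4:
        return False
    # Every character must lie in the range 0x20 to 0x7E.
    if any(not (0x20 <= ord(ch) <= 0x7E) for ch in tag):
        return False
    # Spaces may only trail: once trailing spaces are stripped,
    # no space may remain. The quoted text does not address an
    # all-space tag, so this sketch does not reject it.
    return " " not in tag.rstrip(" ")
```

With this check, `IPPH`, `HYE0`, and a space-padded three-letter tag such as `"SRB "` all pass, while `"S RB"` (a space before a non-space) and the unpadded `"SRB"` do not.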

@svgeesus
Contributor

@upsuper I moved the font-language-override section to Fonts 4, and then changed this

Use of invalid OpenType language system tags must not generate a
parse error but must be ignored when doing glyph selection and
placement.

to

Unknown OpenType language system tags are silently ignored, and do not affect
glyph selection and placement.

Can you confirm that this solves your issue?

@dscorbett

Does that mean the user agent should discard an unknown tag, so it doesn’t affect glyph selection or placement; or that it should pass the unknown tag through to the renderer, where it might have an effect but probably won’t?

@litherum
Contributor

User-agents should pass it through. If the font supports some language that the browser has never heard of, there's no reason it shouldn't work correctly.

@svgeesus
Contributor

@litherum agreed. Could that be made clearer in my new text?

@dscorbett

The current text still says “single three-letter case-sensitive OpenType language system tag”. Actually, it’s a string of four ASCII printable characters, where no space precedes a non-space. CSS should allow any such string for font-language-override. It should also accept shorter strings and pad them with spaces, for convenience.

@svgeesus
Contributor

svgeesus commented Nov 3, 2020

It now says:

single four-character case-sensitive OpenType language system tag, specifies the OpenType language system to be used instead of the language system implied by the language of the element.

If the string is shorter than four characters, it is padded at the end with space (U+0020) characters such that the length is 4, before being matched.
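
The padding step described in that text can be sketched in Python as follows. The function name `pad_language_tag` is hypothetical, and the sketch covers only the padding; how the padded tag is then matched against a font is outside its scope. Strings longer than four characters are returned unchanged, because the quoted text does not say how they are handled.

```python
def pad_language_tag(value: str) -> str:
    """Pad a tag at the end with U+0020 so that its length is at least 4."""
    # str.ljust appends the fill character and leaves strings that
    # are already four characters or longer untouched.
    return value.ljust(4, " ")
```

For example, `pad_language_tag("SRB")` yields `"SRB "`, while `pad_language_tag("IPPH")` yields `"IPPH"` unchanged.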
