[selectors] What a whitespace character is #3754

Open
r12a opened this issue Mar 21, 2019 · 7 comments
Labels
i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. selectors-4 Current Work

Comments

@r12a
Contributor

r12a commented Mar 21, 2019

https://www.w3.org/Mail/flatten/index?subject=what+a+whitespace++character+is&list=www-style

The i18n WG was reviewing issues that it is tracking and came across the discussion thread in the link above.

The ends of the discussion don't all appear to be tidied up. Could someone (maybe @fantasai?) please summarise what progress was made on the issues raised and the current status.

Thanks.

@r12a r12a added the i18n-needs-resolution Issue the Internationalization Group has raised and looks for a response on. label Mar 21, 2019
@xfq xfq added the css-text-3 Current Work label Mar 22, 2019
@frivoal frivoal added selectors-4 Current Work and removed css-text-3 Current Work labels Apr 23, 2019
@frivoal
Collaborator

frivoal commented Apr 23, 2019

This wasn't an issue about what white space means in css-text (which is well defined), but one about what it meant in the :blank selector. The answer may have needed to refer to css-text, but that doesn't make it an issue about css-text itself, so I'm changing the tags.

@frivoal
Collaborator

frivoal commented Apr 23, 2019

Also, the latest version of the spec on the :blank selector no longer refers to white spaces, and says this instead:

The :blank pseudo-class applies to user-input elements whose input value is empty (consists of the empty string or otherwise null input).

I think we can close this issue.

@frivoal
Collaborator

frivoal commented Apr 23, 2019

Agenda+ to confirm the above. Chairs / editors, feel free to close without a conf call discussion if you think this is sufficiently obvious.

@css-meeting-bot
Member

The CSS Working Group just discussed What a whitespace character is, and agreed to the following:

  • RESOLVED: Close the issue

The full IRC log of that discussion
<dael> Topic: What a whitespace character is
<dael> github: https://github.com//issues/3754
<dael> florian: Filed against text, but it's a selectors issue. But I'm not editor. I think we can close though. Old definition of common blank is it contains only whitespace. We no longer rely on whitespace so I think this is a non-issue.
<dael> TabAtkins: Yeah, close
<dael> Rossen_: Other opinions?
<dael> Rossen_: Objections to close the issue
<dael> RESOLVED: Close the issue

@frivoal frivoal closed this as completed May 2, 2019
@frivoal frivoal changed the title from "[css-text] What a whitespace character is" to "[selectors] What a whitespace character is" May 2, 2019
@fantasai fantasai reopened this May 16, 2019
@fantasai
Collaborator

fantasai commented May 16, 2019

The issue was about aligning the definitions of white space across CSS, HTML, and Selectors. These definitions are not currently aligned; there are two:

  • Infra spec's “ASCII white space”, which is used for syntactic white space - currently U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, or U+0020 SPACE
  • CSS Text’s “document white space”, which is used for formatting, currently spaces (U+0020), tabs (U+0009), and segment breaks (which are defined roughly equivalent to "line feeds and whatever else the host language thinks is a newline").

There was a point during which these two sets of characters were aligned. However, because HTML parsing converts various segment break sequences to line feeds in the DOM, it was decided that carriage returns in CSS should be treated like miscellaneous formatting control characters, not as white space. Likewise form feeds. See #1990 and #855

Currently the definition of :empty references “document white space”. Form feeds do not currently qualify as “document white space”, and neither do carriage returns. They are to be rendered as visible control characters. Thus the original premise of that thread, that the various definitions of white space should be aligned, is not satisfied.
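
As a rough illustration of the mismatch described above, the two sets can be compared directly. This is only a sketch: segment breaks are approximated by LF alone (the real definition defers to the host language), and `only_document_white_space` is a hypothetical helper, not the actual matching algorithm for :empty.

```python
# Infra "ASCII white space", as listed above:
# U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, U+0020 SPACE.
ASCII_WHITE_SPACE = frozenset("\t\n\f\r ")

# CSS Text "document white space": spaces, tabs, and segment breaks.
# ASSUMPTION: segment breaks are approximated here by LF only.
DOCUMENT_WHITE_SPACE = frozenset(" \t\n")


def only_document_white_space(text: str) -> bool:
    """Return True if text is empty or consists only of document white space."""
    return all(ch in DOCUMENT_WHITE_SPACE for ch in text)


def code_points(chars) -> list[str]:
    """Format characters as sorted U+XXXX labels."""
    return sorted(f"U+{ord(ch):04X}" for ch in chars)


# Characters that are ASCII white space but not document white space.
mismatch = code_points(ASCII_WHITE_SPACE - DOCUMENT_WHITE_SPACE)
print(mismatch)  # ['U+000C', 'U+000D']
```

Under these assumptions, text containing a form feed or a carriage return fails the check, which matches the statement that those characters do not count as "document white space".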

@fantasai
Collaborator

CC @dbaron, who raised the issue.

@frivoal
Collaborator

frivoal commented May 17, 2019

  • Currently the definition of :empty references “document white space”. Form feeds do not currently qualify as “document white space”, and neither do carriage returns. They are to be rendered as visible control characters.

    That seems like a feature to me: if an element contains visible control characters, then it seems good that it would not be considered :empty. Regardless of whether we include 0x0C or 0x0D in "document white space", the fact that css-text and :empty are aligned seems good.

  • ::first-letter isn't defined in terms of white space at all; it is defined in terms of "the first typographic letter unit", plus any preceding "characters that belong to the Punctuation (P*) Unicode general category". This implies that it will skip over not only white space (regardless of definition) but also control characters, symbols, and whatnot. Maybe we don't have enough tests, or maybe we don't have full interop, but we do seem to have a precise and sensible definition.

  • What the CSS parser (or the JS parser, for that matter) considers white space seems mostly irrelevant to anything else. It might align with other definitions, and it would be convenient for learnability if it did, but I don't care strongly, and compat probably sets in stone whatever we arrived at. (Note: for the CSS parser, it is equivalent to the infra spec's ASCII white space with newline normalization; for JS it is its own beast.)

  • How the HTML parser handles white space characters and sets up the DOM is well defined. It starts off with the infra spec's notion of ASCII white space and how to normalize new lines, but it is also somewhat context sensitive. It's also unlikely to change, due to compat.
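
To make "newline normalization" in the list above concrete, here is a minimal sketch. It assumes the normalization meant is the CSS Syntax preprocessing step that replaces CR, FF, and CR LF pairs with a single LF; `normalize_newlines` is a hypothetical helper name, and the other preprocessing substitutions (such as NULL handling) are deliberately left out.

```python
import re

# CR LF pairs are listed first so each pair collapses to one LF
# instead of being treated as a lone CR followed by an LF.
_NEWLINE_RE = re.compile(r"\r\n|\r|\f")


def normalize_newlines(text: str) -> str:
    """Replace CR LF, lone CR, and FF with a single LF."""
    return _NEWLINE_RE.sub("\n", text)


sample = "a\r\nb\rc\fd\ne"
print(repr(normalize_newlines(sample)))  # 'a\nb\nc\nd\ne'
```

After this step, CR and FF no longer appear in the parser's input, which is one way to read the remark that the parser's notion of white space matches ASCII white space once newlines are normalized.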

So, from the point of view of having definitions and using them sensibly, I think we're good.

If we want to reduce the number of definitions, what we might want to do is reopen #855, stop treating CR and FF as control characters, and start including them in document white space instead, along with LF (and therefore make them invisible, collapse them, allow them in :empty...). That would allow us to align css-text's "document white space" with the infra spec's “ASCII white space”. I wouldn't have a strong objection to doing that, but I am also unconvinced it is useful.

On the other hand, the fact that this test/demo gives 3 different results in Chrome, Firefox and Safari is sad. Maybe we should look at how various kinds of line breaks are (or aren't) normalized when inserting content via the content property or via javascript.

@r12a r12a added i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. and removed i18n-needs-resolution Issue the Internationalization Group has raised and looks for a response on. labels May 17, 2019
5 participants