[css-text] What are the language-defined segment breaks for HTML? #5147

Closed
SimonSapin opened this issue Jun 2, 2020 · 5 comments
Labels: Closed Accepted as Editorial · Commenter Satisfied (Commenter has indicated satisfaction with the resolution / edits.) · css-text-3 (Current Work) · Testing Unnecessary (Memory aid - issue doesn't require tests) · Tracked in DoC

Comments

@SimonSapin
Contributor

https://drafts.csswg.org/css-text/#white-space

> Except where specified otherwise, White space processing in CSS affects only the document white space characters: spaces (U+0020), tabs (U+0009), and segment breaks.

https://drafts.csswg.org/css-text/#segment-break

> CSS does not define document segmentation rules. Segments can be separated by a particular newline sequence (such as a line feed or CRLF pair), or delimited by some other mechanism, such as the SGML RECORD-START and RECORD-END tokens. For CSS processing, each document language–defined segment break and each line feed (U+000A) in the text is treated as a segment break, which is then interpreted for rendering as specified by the white-space property.
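As a rough illustration of the quoted definition, the sketch below treats spaces, tabs, and line feeds as document white space and splits text into segments at each line feed. The names `DOCUMENT_WHITE_SPACE`, `is_segment_break`, and `split_segments` are hypothetical, and the assumption that the document language defines no segment breaks beyond U+000A is exactly the point this issue asks about.

```python
# Sketch only: assumes the document language defines no segment breaks
# beyond the line feed (U+000A) named in the css-text definition.
SPACE = "\u0020"
TAB = "\u0009"
LINE_FEED = "\u000A"

# Hypothetical name for the "document white space characters" in the quote.
DOCUMENT_WHITE_SPACE = frozenset({SPACE, TAB, LINE_FEED})


def is_segment_break(ch: str) -> bool:
    """Return True if ch counts as a segment break under the assumption above."""
    return ch == LINE_FEED


def split_segments(text: str) -> list[str]:
    """Split text into segments at each segment break (hypothetical helper)."""
    return text.split(LINE_FEED)
```

Under this assumption a carriage return is not a segment break, so `is_segment_break("\r")` is false unless the document language says otherwise.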

I understand that CSS wants to build abstractions in order to potentially support any document language, but making it the responsibility of those languages to specifically hook into those abstractions is not great as they might… not.

CTRL+F for "segment break" in the single-page version of https://html.spec.whatwg.org/ does not find anything. Does this mean that HTML does not have any language-specific segment break?

If css-text is not the right place to define this normatively it’d be helpful to point this out in a note.

@fantasai
Collaborator

fantasai commented Jun 2, 2020

I think the key to that question is to look at the definition of segment break, rather than for usage of the specific term. HTML clearly treats CR, LF, and CRLF in the source as segment breaks.

But it converts them to LF before passing to the DOM, which is then passed to CSS for formatting, and I'm not entirely clear what DOM thinks. I think it ends up defaulting to just having LF as a segment break.

@SimonSapin
Contributor Author

https://html.spec.whatwg.org/multipage/parsing.html#preprocessing-the-input-stream says that the input of the HTML parser must go through https://infra.spec.whatwg.org/#normalize-newlines, which replaces CRLF and lone CR with LF. But the way this is written is all about moving code points around; it doesn’t attribute much meaning to them.
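A minimal sketch of that normalization, to show what "moving code points around" amounts to: every CRLF pair becomes a single LF, then every remaining CR becomes LF. The function name `normalize_newlines` is an illustrative choice here, not an API from any spec or library.

```python
def normalize_newlines(text: str) -> str:
    """Replace CRLF pairs, then any remaining lone CR, with LF."""
    # Order matters: handle CRLF pairs first so each pair yields one LF, not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

For example, `normalize_newlines("a\r\nb\rc\nd")` returns `"a\nb\nc\nd"`, so after this step only LF remains as a newline code point.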

https://html.spec.whatwg.org/#newlines defines a "newline" term:

> Newlines in HTML may be represented either as U+000D CARRIAGE RETURN (CR) characters, U+000A LINE FEED (LF) characters, or pairs of U+000D CARRIAGE RETURN (CR), U+000A LINE FEED (LF) characters in that order.

Presumably "in HTML" there means before preprocessing.

> I think it ends up defaulting

Having normative spec text based on defaulting, that is, on the absence of a definition that says to do otherwise, is what I think is not great. As an implementer, I don’t feel confident that there is indeed no such definition for HTML, rather than one that I failed to find.

@fantasai
Collaborator

fantasai commented Dec 3, 2020

@SimonSapin OK, we made some rewrites in that section that hopefully should make this interaction more clear. https://drafts.csswg.org/css-text-3/#white-space-processing Let us know if it addresses your issue!

@SimonSapin
Contributor Author

The new text starting with "In the case of HTML" in 1014e0a looks good, but curiously I don’t find it in https://drafts.csswg.org/css-text-3/. This commit is also not listed in https://drafts.csswg.org/history/css-text-3/

@fantasai
Collaborator

fantasai commented Dec 9, 2020

Seems to have made it to the server at this point, so closing out. :)

fantasai closed this as completed on Dec 9, 2020
fantasai added the Commenter Satisfied and Tracked in DoC labels and removed the Commenter Response Pending label on Dec 9, 2020