[css-syntax] The tokenizer input should probably be a stream of scalar values, not codepoints #3307

@tabatkins

In WICG/construct-stylesheets#61 (comment), Boris points out that the Syntax spec defines the tokenizer's input as a stream of code points, and wonders whether I actually mean scalar values there (that is, all code points except surrogates).

I think at the time I wrote this, USVString didn't yet exist, and the distinction between the two wasn't really present in specs. But if I were writing it today, I'm pretty sure I'd use scalar values.

In particular, note that you can't produce a surrogate code point from an escape, which suggests that I assumed no surrogates would show up in the stream.
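For illustration, the escape behavior can be sketched roughly as below. This is a simplified sketch of only the numeric-mapping step, not the spec's full algorithm: the function name `codePointFromHexEscape` is hypothetical, and it assumes the caller has already collected the 1 to 6 hex digits of the escape.

```javascript
// Hypothetical helper: maps the hex digits of a CSS escape to a code point.
// Assumes `hexDigits` is a string of 1 to 6 hexadecimal digits.
function codePointFromHexEscape(hexDigits) {
  const value = parseInt(hexDigits, 16);
  const isSurrogate = value >= 0xD800 && value <= 0xDFFF;
  // Zero, surrogates, and out-of-range values all become U+FFFD,
  // so an escape can never yield a surrogate code point.
  if (value === 0 || isSurrogate || value > 0x10FFFF) {
    return 0xFFFD;
  }
  return value;
}
```

With this sketch, `codePointFromHexEscape("D800")` yields 0xFFFD, while `codePointFromHexEscape("41")` yields 0x41.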

So I think I should switch the spec over to referring to scalar values, and add a conversion step for going from code points to scalar values (probably just converting surrogates to U+FFFD? I'll look at impls and see what's up).
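A minimal sketch of what such a conversion step might look like, assuming the replace-with-U+FFFD approach floated above (which is not yet settled). The name `toScalarValues` is hypothetical, and the input is assumed to be a JS string, so only unpaired surrogates can appear as surrogate code points.

```javascript
// Hypothetical helper: converts a JS string into an array of scalar values.
// String iteration yields whole code points, so a valid surrogate pair is
// combined into one astral code point; only lone surrogates remain as
// values in the range U+D800..U+DFFF.
function toScalarValues(input) {
  const out = [];
  for (const ch of input) {
    const cp = ch.codePointAt(0);
    const isSurrogate = cp >= 0xD800 && cp <= 0xDFFF;
    // Assumption: surrogates are replaced with U+FFFD rather than dropped.
    out.push(isSurrogate ? 0xFFFD : cp);
  }
  return out;
}
```

Applied to the selector string from the test case in this issue, `toScalarValues(".fo\ud800o")` would give the code points for `.`, `f`, `o`, U+FFFD, `o`.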

Test case:

<!DOCTYPE html>
<style></style>
<script>
document.querySelector("style").textContent = ".fo\ud800o { color: blue; }";
// Log each code point of the serialized selector, in hex.
console.log([...document.styleSheets[0].cssRules[0].selectorText].map(x=>x.codePointAt(0).toString(16)));
</script>

Looks like Chrome retains the character as U+D800, while Firefox censors it to U+FFFD. Perhaps this is just related to which definition each uses for CSSOMString?
