css-parser-api/Overview.bs

<pre class='metadata'>
Title: CSS Parser API
Shortname: css-parser-api
Level: 1
Status: UD
Group: HOUDINI
URL: https://drafts.css-houdini.org/css-parser-api/
Editor: Tab Atkins-Bittner
Abstract: An API exposing the CSS parser more directly,
Abstract: for parsing arbitrary CSS-like languages into a mildly typed representation.
</pre>

Introduction {#intro}
=====================

Common data-interchange / parsing formats are very valuable
for reducing the learning curve of new languages,
as users get to lean on their existing knowledge of the format when authoring
and only have to newly learn the specifics of the language.
This is why generic parsing formats like XML or JSON have become so popular.

The CSS language could benefit from this same treatment;
a number of languages and tools rely on CSS-like syntax to express themselves,
but they usually rely on ad-hoc parsing
(often regex-based)
which can be relatively fragile,
and might break with CSS practices in interesting syntax corner cases.
Similarly, CSS syntax is increasingly used in places like attribute values
(such as the <{img/sizes}> attribute,
or most of the SVG presentation attributes),
and custom elements wanting to do the same thing
similarly have to rely on ad-hoc parsing right now.

To help with these sorts of cases,
this spec exposes the [[!css-syntax-3]] parsing algorithms,
and represents their results in a mildly-typed representation,
simpler and more abstract than what [[css-typed-om-1]] does for CSS properties.

Parsing API {#parsing-api}
==========================

<pre class=idl>
partial interface CSS {
	Promise&lt;sequence&lt;CSSParserRule>> parseStylesheet(DOMString css, optional CSSParserOptions options);
	Promise&lt;sequence&lt;CSSParserRule>> parseRuleList(DOMString css, optional CSSParserOptions options);
	Promise&lt;CSSParserRule> parseRule(DOMString css, optional CSSParserOptions options);
	Promsie&lt;sequence&lt;CSSParserRule>> parseDeclarationList(DOMString css, optional CSSParserOptions options);
	CSSParserDeclaration parseDeclaration(DOMString css, optional CSSParserOptions options);
	CSSParserValue parseValue(DOMString css);
	sequence&lt;CSSParserValue> parseValueList(DOMString css);
	sequence&lt;sequence&lt;CSSParserValue>> parseCommaValueList(DOMString css);
};

dictionary CSSParserOptions {
	object atRules;
	/* dict of at-rule name => at-rule type
	   (contains decls or contains qualified rules) */
};
</pre>

Issue: {{parseCommaValueList()}} is in Syntax, and thus here,
because it's actually a very common operation.
It's trivial to do yourself
(just call {{parseValueList()}} and then split into an array on top-level commas),
but comma-separated lists are so common
that it was worthwhile to improve spec ergonomics
by providing a shortcut for that functionality.
Is it worth it to provide this to JS as well?

Issue: Do we handle comments?
Currently I don't;
Syntax by default just drops comments,
but allows an impl to preserve information about them if they want.
Maybe add an option to preserve comments?
If so, they can appear *anywhere*,
in any API that returns a sequence.

Issue: What do we do if an unknown at-rule
(not appearing in the {{atRules}} option)
shows up in the results?
Default to decls or rules?
Or treat it more simply as just a token sequence?

Issue: Parsing stylesheets/rule lists should definitely be async,
because stylesheets can be quite large.
Parsing individual properties/value lists should definitely be sync,
because they're small and it would be really annoying.
Parsing a single rule, tho, is unclear--
is it large enough to be worth making async,
or is it too annoying to be worth it?

Parser Values {#parser-values}
==============================

<pre class=idl>
interface CSSParserRule {

};

[Constructor(DOMString name, sequence&lt;CSSParserValue> prelude, optional sequence&lt;CSSParserRule>? body)]
interface CSSParserAtRule : CSSParserRule {
	readonly attribute DOMString name;
	readonly attribute FrozenArray&lt;CSSParserValue> prelude;
	readonly attribute FrozenArray&lt;CSSParserRule>? body;
	/* nullable to handle at-statements */
};

[Constructor(sequence&lt;CSSParserValue> prelude, optional sequence&lt;CSSParserRule>? body)]
interface CSSParserQualifiedRule : CSSParserRule {
	readonly attribute FrozenArray&lt;CSSParserValue> prelude;
	readonly attribute FrozenArray&lt;CSSParserRule> body;
};

[Constructor(DOMString name, optional sequence&lt;CSSParserRule> body)]
interface CSSParserDeclaration : CSSParserRule {
	readonly attribute DOMString name;
	readonly attribute FrozenArray&lt;CSSParserValue> body;
};

interface CSSParserValue {
};

[Constructor(DOMString name, sequence&lt;CSSParserValue> body)]
interface CSSParserBlock : CSSParserValue {
	readonly attribute DOMString name; /* "[]", "{}", or "()" */
	readonly attribute FrozenArray&lt;CSSParserValue> body;
};

[Constructor(DOMString name, sequence&lt;sequence&lt;CSSParserValue>> args)]
interface CSSParserFunction : CSSParserValue {
	readonly attribute DOMString name;
	readonly attribute FrozenArray&lt;FrozenArray&lt;CSSParserValue>> args;
};

[Constructor(DOMString value)]
interface CSSParserIdent : CSSParserValue {
	readonly attribute DOMString value;
};

[Constructor(double value),
 Constructor(DOMString css)]
interface CSSParserNumber : CSSParserValue {
	readonly attribute double value;
};

[Constructor(double value),
 Constructor(DOMString css)]
interface CSSParserPercentage : CSSParserValue {
	readonly attribute double value;
};

[Constructor(double value, DOMString type),
 Constructor(DOMString css)]
interface CSSParserDimension : CSSParserValue {
	readonly attribute double value;
	readonly attribute DOMString type;
};

[Constructor(DOMString value)]
interface CSSParserAtKeyword : CSSParserValue {
	readonly attribute DOMString value;
};

[Constructor(DOMString value)]
interface CSSParserHash : CSSParserValue {
	readonly attribute DOMString value;
	/* expose an "is ident" boolean? */
};

[Constructor(DOMString value)]
interface CSSParserString : CSSParserValue {
	readonly attribute DOMString value;
};

[Constructor(DOMString value)]
interface CSSParserChar : CSSParserValue {
	readonly attribute DOMString value;
	/* for all delims, whitespace, and the
	   weird Selectors-based tokens
	   (split up into the individual chars) */
};
</pre>

Issue: Some of the CSSParserValue subtypes correspond closely to Typed OM types;
in particular, {{CSSParserIdent}} and {{CSSKeywordValue}} are basically the same thing,
as are {{CSSParserNumber}} and {{CSSNumberValue}}.
Is it worthwhile to transplant this hierarchy underneath the {{CSSStyleValue}} superclass,
and reuse the ones that are reasonable?

Issue: Trying to be as useful as possible,
without exposing so many details that we're unable to change tokenization in the future.
In particular, whitespace, delims, and the weird Selectors tokens
all get serialized as individual {{CSSParserChar}} "tokens",
which should allow us to change the set of Selectors tokens in the future safely.
Am I succeeding at this goal?