1 | 1 | rust-cssparser |
2 | 2 | ============== |
3 | 3 | |
4 | | -Rust implementation of the 2013 version of |
5 | | -[css3-syntax](http://dev.w3.org/csswg/css3-syntax/) |
| 4 | +Rust implementation of |
| 5 | +[CSS Syntax Module Level 3](http://www.w3.org/TR/css-syntax-3/). |
| 6 | + |
| 7 | + |
| 8 | +Overview |
| 9 | +-------- |
| 10 | + |
| 11 | +Parsing CSS involves a series of steps: |
| 12 | + |
| 13 | +* When parsing from bytes, |
| 14 | + (e.g. reading a file or fetching a URL from the network) |
| 15 | + detect the character encoding |
| 16 | + (based on a `Content-Type` HTTP header, an `@charset` rule, a BOM, etc.) |
| 17 | + and decode to Unicode text. |
| 18 | + |
| 19 | + rust-cssparser does not do this yet and just assumes UTF-8. |
| 20 | + |
| 21 | + This step is skipped when parsing from Unicode, e.g. in an HTML `<style>` element. |
| 22 | + |
| 23 | +* Tokenization, a.k.a. lexing. |
| 24 | + The input, a stream of Unicode text, is transformed into a stream of *tokens*. |
| 25 | + Tokenization never fails, although the output may contain *error tokens*. |
| 26 | + |
| 27 | +* This flat stream of tokens is then transformed into a tree of *component values*, |
| 28 | + which are either *preserved tokens*, |
| 29 | + or blocks/functions (`{ … }`, `[ … ]`, `( … )`, `foo( … )`) |
| 30 | + that contain more component values. |
| 31 | + |
| 32 | + rust-cssparser does this at the same time as tokenization: |
| 33 | + raw tokens are never materialized; you only get component values. |
| 34 | + |
| 35 | +* Component values can then be parsed into generic rules or declarations. |
| 36 | + The header and body of rules as well as the value of declarations |
| 37 | + are still just lists of component values at this point. |
| 38 | + See [the `ast` module](ast.rs) for the data structures. |
| 39 | + |
| 40 | +* The last step of a full CSS parser is |
| 41 | + parsing the remaining component values |
| 42 | + into [Selectors](http://www.w3.org/TR/selectors/), |
| 43 | + specific CSS properties, etc. |
| 44 | + |
| 45 | + By design, rust-cssparser does not do this last step, |
| 46 | + which depends a lot on what you want to do: |
| 47 | + which properties you want to support, what you want to do with selectors, etc. |
| 48 | + |
| 49 | + It does however provide some helper functions to parse [CSS colors](color.rs) |
| 50 | + and [An+B](nth.rs) (the argument to `:nth-child()` and related selectors). |
| 51 | + |
| 52 | + See [Servo’s `style` module](https://github.com/mozilla/servo/tree/master/src/components/script/style) |
| 53 | + for an example of a parser based on rust-cssparser. |
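
The tokenization and nesting steps described in the Overview can be illustrated with a toy sketch. It is not rust-cssparser's code: the `ComponentValue` enum, `parse_values`, and the helper functions are hypothetical names made up for this example, tokens are simplified to whitespace-separated runs of characters, and functions such as `foo( … )` are not modelled separately from `( … )` blocks. It only shows how blocks can be nested while scanning, so that a flat token list is never built.

```rust
/// A toy component value (hypothetical type, not rust-cssparser's `ast`).
#[derive(Debug, PartialEq)]
enum ComponentValue {
    /// A preserved token: here, just a run of characters kept as-is.
    Token(String),
    /// A `{ … }`, `[ … ]` or `( … )` block, tagged with its opening bracket.
    Block(char, Vec<ComponentValue>),
}

/// Returns the closing bracket for `c` if `c` opens a block.
fn closing_for(c: char) -> Option<char> {
    match c {
        '{' => Some('}'),
        '[' => Some(']'),
        '(' => Some(')'),
        _ => None,
    }
}

/// Scans the input once, nesting blocks as they are encountered.
fn parse_values(input: &str) -> Vec<ComponentValue> {
    let mut chars = input.chars().peekable();
    parse_until(&mut chars, None)
}

/// Reads component values until `close` (or the end of input) is reached.
fn parse_until(
    chars: &mut std::iter::Peekable<std::str::Chars<'_>>,
    close: Option<char>,
) -> Vec<ComponentValue> {
    let mut values = Vec::new();
    while let Some(c) = chars.next() {
        if Some(c) == close {
            return values;
        }
        if c.is_whitespace() {
            continue;
        }
        if let Some(end) = closing_for(c) {
            // Recurse: the block's contents are themselves component values.
            let inner = parse_until(chars, Some(end));
            values.push(ComponentValue::Block(c, inner));
            continue;
        }
        if "}])".contains(c) {
            // An unmatched closing bracket is kept as a token of its own.
            values.push(ComponentValue::Token(c.to_string()));
            continue;
        }
        // Anything else starts a token that runs until whitespace or a bracket.
        let mut text = String::from(c);
        while let Some(&next) = chars.peek() {
            if next.is_whitespace() || "{}[]()".contains(next) {
                break;
            }
            text.push(next);
            chars.next();
        }
        values.push(ComponentValue::Token(text));
    }
    // End of input: any block still open is closed implicitly.
    values
}
```

With this sketch, `parse_values("a { b: c(d) }")` yields the token `a` followed by a single `{` block, which in turn holds the tokens `b:` and `c` and a nested `(` block containing `d`.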
6 | 54 | |
7 | 55 | |
8 | 56 | TODO |
9 | 57 | ---- |
10 | 58 | |
11 | | -* Detect character encoding & decode from bytes |
12 | | -* Figure out float and integer overflow |
| 59 | +* Detect character encoding & decode from bytes, |
| 60 | + using [rust-encoding](https://github.com/lifthrasiir/rust-encoding). |
| 61 | +* Figure out float and integer overflow in parsing. (Clamp instead?) |
13 | 62 | * Serialize tokens back to CSS |
14 | | -* Make it fast! |
| 63 | +* Make it fast! (Add a fast path in identifier tokenization?) |
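
As one possible reading of the "Clamp instead?" note in the TODO list, the sketch below parses an integer and saturates at the `i32` bounds rather than overflowing. The function name `parse_integer_clamped` and the choice of `i32` as the target type are assumptions made for illustration; the README does not say which numeric types or behavior the library will settle on.

```rust
/// Parses an optionally signed run of ASCII digits, clamping the result to
/// the `i32` range instead of overflowing (hypothetical helper).
/// Returns `None` if the input is not an integer at all.
fn parse_integer_clamped(text: &str) -> Option<i32> {
    let (negative, digits) = match text.as_bytes().first().copied()? {
        b'-' => (true, &text[1..]),
        b'+' => (false, &text[1..]),
        _ => (false, text),
    };
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Accumulate in i64 with saturating arithmetic so huge inputs cannot wrap.
    let mut value: i64 = 0;
    for b in digits.bytes() {
        value = value.saturating_mul(10).saturating_add(i64::from(b - b'0'));
    }
    if negative {
        value = -value;
    }
    // Clamp into the i32 range; the cast is lossless after clamping.
    Some(value.clamp(i64::from(i32::MIN), i64::from(i32::MAX)) as i32)
}
```

Under these assumptions, an input such as `99999999999999999999999` maps to `i32::MAX` instead of causing an error or a wrapped value, while non-numeric input such as `12px` is rejected with `None`.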