
Add parsing from bytes, with rust-encoding. #33

Merged
merged 5 commits into from
Dec 13, 2013
18 changes: 18 additions & 0 deletions css-parsing-tests/README.rst
@@ -91,6 +91,24 @@ associated with the expected result.
The Unicode input is represented by a JSON string,
the output as a list of `qualified rules`_ or at-rules_.

``stylesheet_bytes.json``
Tests `Parse a stylesheet
<http://dev.w3.org/csswg/css-syntax-3/#parse-a-stylesheet>`_
together with `The input byte stream
<http://dev.w3.org/csswg/css-syntax/#input-byte-stream>`_.
The input is represented as a JSON object containing:

* A required ``css_bytes``, the input byte string,
represented as a JSON string where code points U+0000 to U+00FF
represent bytes of the same value.
* An optional ``protocol_encoding``,
a protocol encoding label as a JSON string, or null.
* An optional ``environment_encoding``,
an environment encoding label as a JSON string, or null.
* An optional ``comment`` that is ignored.

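The output is represented as a two-element list: a list of `qualified rules`_ or at-rules_, followed by the lowercase name of the encoding that was used to decode the input (for example ``"utf-8"``).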
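For illustration, a ``css_bytes`` value can be turned back into raw bytes with a few lines of Rust. This is a minimal sketch rather than the harness used with this PR: it assumes the JSON string has already been parsed into a Rust ``&str``, and the function name ``css_bytes_to_vec`` is hypothetical.

```rust
/// Hypothetical helper (not part of this PR): turn a `css_bytes` value,
/// already parsed from JSON into a Rust string, back into raw bytes.
/// Each code point U+0000..=U+00FF stands for the byte with the same value;
/// any higher code point is reported as an error.
fn css_bytes_to_vec(css_bytes: &str) -> Result<Vec<u8>, char> {
    css_bytes
        .chars()
        .map(|c| if (c as u32) <= 0xFF { Ok(c as u8) } else { Err(c) })
        .collect()
}

fn main() {
    // "@\u00C3\u00A9" in stylesheet_bytes.json: the UTF-8 bytes of "@é".
    let bytes = css_bytes_to_vec("@\u{00C3}\u{00A9}").unwrap();
    assert_eq!(bytes, b"@\xC3\xA9".to_vec());
    assert_eq!(std::str::from_utf8(&bytes).unwrap(), "@é");

    // A code point above U+00FF cannot represent a single byte.
    assert_eq!(css_bytes_to_vec("@\u{0100}"), Err('\u{0100}'));
    println!("css_bytes decoding checks passed");
}
```

Collecting an iterator of ``Result`` values into ``Result<Vec<u8>, char>`` stops at the first out-of-range code point, so malformed test input is reported instead of being silently truncated.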

``color3.json``
Tests the ``<color>`` syntax `defined in CSS Color Level 3
<http://www.w3.org/TR/css3-color/#colorunits>`_.
126 changes: 126 additions & 0 deletions css-parsing-tests/stylesheet_bytes.json
@@ -0,0 +1,126 @@
[

{"css_bytes": ""},
[[], "utf-8"],

{"css_bytes": "@\u00C3\u00A9",
"protocol_encoding": null, "environment_encoding": null},
[[["at-rule", "é", [], null]], "utf-8"],

{"css_bytes": "@\u00C3\u00A9"},
[[["at-rule", "é", [], null]], "utf-8"],

{"css_bytes": "@\u0000\u00E9\u0000",
"comment": "Untagged UTF-16, parsed as UTF-8"},
[[["at-rule", "���", [], null]], "utf-8"],

{"css_bytes": "\u00FF\u00FE@\u0000\u00E9\u0000",
"comment": "UTF-16 with a BOM"},
[[["at-rule", "é", [], null]], "utf-16le"],

{"css_bytes": "\u00FE\u00FF\u0000@\u0000\u00E9"},
[[["at-rule", "é", [], null]], "utf-16be"],

{"css_bytes": "@\u00E9"},
[[["at-rule", "�", [], null]], "utf-8"],


{"css_bytes": "@\u00E9", "protocol_encoding": "ISO-8859-2"},
[[["at-rule", "é", [], null]], "iso-8859-2"],

{"css_bytes": "@\u00E9", "protocol_encoding": "ISO-8859-5"},
[[["at-rule", "щ", [], null]], "iso-8859-5"],

{"css_bytes": "@\u00C3\u00A9", "protocol_encoding": "ISO-8859-2"},
[[["at-rule", "ĂŠ", [], null]], "iso-8859-2"],

{"css_bytes": "\u00EF\u00BB\u00BF @\u00C3\u00A9",
"protocol_encoding": "ISO-8859-2",
"comment": "BOM takes precedence over protocol"},
[[["at-rule", "é", [], null]], "utf-8"],


{"css_bytes": "@charset \"ISO-8859-5\"; @\u00E9"},
[[["at-rule", "charset", [" ", ["string", "ISO-8859-5"]], null],
["at-rule", "щ", [], null]],
"iso-8859-5"],

{"css_bytes": "@Charset \"ISO-8859-5\"; @\u00E9",
"comment": "@charset has to match an exact byte pattern"},
[[["at-rule", "Charset", [" ", ["string", "ISO-8859-5"]], null],
["at-rule", "�", [], null]],
"utf-8"],

{"css_bytes": "@charset  \"ISO-8859-5\"; @\u00E9",
"comment": "@charset has to match an exact byte pattern"},
[[["at-rule", "charset", [" ", ["string", "ISO-8859-5"]], null],
["at-rule", "�", [], null]],
"utf-8"],

{"css_bytes": "@charset 'ISO-8859-5'; @\u00E9",
"comment": "@charset has to match an exact byte pattern"},
[[["at-rule", "charset", [" ", ["string", "ISO-8859-5"]], null],
["at-rule", "�", [], null]],
"utf-8"],


{"css_bytes": "@\u0000c\u0000h\u0000a\u0000r\u0000s\u0000e\u0000t\u0000 \u0000\"\u0000U\u0000T\u0000F\u0000-\u00001\u00006\u0000L\u0000E\u0000\"\u0000;\u0000@\u0000\u00e9\u0000",
"comment": "@charset has to be ASCII-compatible itself"},
[[["at-rule", "�c�h�a�r�s�e�t�",
[" ", ["ident", "�"], ["string", "�U�T�F�-�1�6�L�E�"], ["ident", "�"]], null],
["error", "invalid"]],
"utf-8"],

{"css_bytes": "@charset \"UTF-16LE\"; @\u00C3\u00A9",
"comment": "@charset can only specify ASCII-compatible encodings"},
[[["at-rule", "charset", [" ", ["string", "UTF-16LE"]], null],
["at-rule", "é", [], null]],
"utf-8"],


{"css_bytes": "\u00EF\u00BB\u00BF @charset \"ISO-8859-5\"; @\u00E9",
"comment": "BOM takes precedence over @charset"},
[[["at-rule", "charset", [" ", ["string", "ISO-8859-5"]], null],
["at-rule", "�", [], null]],
"utf-8"],

{"css_bytes": "\u00EF\u00BB\u00BF @charset \"ISO-8859-5\"; @\u00C3\u00A9",
"comment": "BOM takes precedence over @charset"},
[[["at-rule", "charset", [" ", ["string", "ISO-8859-5"]], null],
["at-rule", "é", [], null]],
"utf-8"],

{"css_bytes": "@charset \"ISO-8859-5\"; @\u00E9",
"protocol_encoding": " Iso-8859-2",
"comment": "Protocol takes precedence over @charset"},
[[["at-rule", "charset", [" ", ["string", "ISO-8859-5"]], null],
["at-rule", "é", [], null]],
"iso-8859-2"],


{"css_bytes": "@\u00E9", "environment_encoding": "ISO-8859-2"},
[[["at-rule", "é", [], null]], "iso-8859-2"],

{"css_bytes": "@\u00E9", "environment_encoding": "ISO-8859-5"},
[[["at-rule", "щ", [], null]], "iso-8859-5"],

{"css_bytes": "@charset \"ISO-8859-5\"; @\u00E9",
"environment_encoding": "ISO-8859-2",
"comment": "@charset takes precedence over environment"},
[[["at-rule", "charset", [" ", ["string", "ISO-8859-5"]], null],
["at-rule", "щ", [], null]],
"iso-8859-5"],

{"css_bytes": "@\u00E9",
"protocol_encoding": "ISO-8859-2",
"environment_encoding": "ISO-8859-5",
"comment": "protocol takes precedence over environment"},
[[["at-rule", "é", [], null]], "iso-8859-2"],

{"css_bytes": "\u00EF\u00BB\u00BF @\u00C3\u00A9",
"environment_encoding": "ISO-8859-5",
"comment": "BOM takes precedence over environment"},
[[["at-rule", "é", [], null]], "utf-8"]


]
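The cases above exercise a fixed precedence: a BOM wins over everything, then the protocol encoding, then an exactly matching `@charset "…";` byte pattern (with UTF-16 labels replaced by UTF-8), then the environment encoding, and finally UTF-8. The following is a minimal Rust sketch of that precedence, not the rust-encoding based code this PR adds. The function names and the five-entry label table are hypothetical; a real implementation would use the full WHATWG label list and would also decode the bytes.

```rust
/// Hypothetical, deliberately incomplete label table. Labels are trimmed of
/// ASCII whitespace and matched case-insensitively.
fn get_encoding(label: &str) -> Option<&'static str> {
    match label
        .trim_matches(|c: char| c.is_ascii_whitespace())
        .to_ascii_lowercase()
        .as_str()
    {
        "utf-8" | "utf8" => Some("utf-8"),
        "utf-16le" | "utf-16" => Some("utf-16le"),
        "utf-16be" => Some("utf-16be"),
        "iso-8859-2" => Some("iso-8859-2"),
        "iso-8859-5" => Some("iso-8859-5"),
        _ => None,
    }
}

/// A byte order mark at the start of the input wins over every other source.
fn bom_encoding(bytes: &[u8]) -> Option<&'static str> {
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Some("utf-8")
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Some("utf-16be")
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Some("utf-16le")
    } else {
        None
    }
}

/// Label from the exact byte pattern `@charset "` + bytes other than `"` + `";`,
/// searched for only in the first 1024 bytes. `@Charset`, single quotes or an
/// extra space do not match.
fn charset_label(bytes: &[u8]) -> Option<&[u8]> {
    const PREFIX: &[u8] = b"@charset \"";
    let head = &bytes[..bytes.len().min(1024)];
    let rest = head.strip_prefix(PREFIX)?;
    let end = rest.iter().position(|&b| b == b'"')?;
    if rest.get(end + 1) == Some(&b';') {
        Some(&rest[..end])
    } else {
        None
    }
}

/// Hypothetical sketch of the precedence exercised by stylesheet_bytes.json.
fn determine_encoding(
    bytes: &[u8],
    protocol_encoding: Option<&str>,
    environment_encoding: Option<&str>,
) -> &'static str {
    if let Some(enc) = bom_encoding(bytes) {
        return enc;
    }
    if let Some(enc) = protocol_encoding.and_then(get_encoding) {
        return enc;
    }
    if let Some(label) = charset_label(bytes) {
        if let Some(enc) = get_encoding(&String::from_utf8_lossy(label)) {
            // @charset may only select ASCII-compatible encodings.
            return if enc.starts_with("utf-16") { "utf-8" } else { enc };
        }
    }
    if let Some(enc) = environment_encoding.and_then(get_encoding) {
        return enc;
    }
    "utf-8"
}

fn main() {
    // Cases taken from stylesheet_bytes.json.
    assert_eq!(determine_encoding(b"", None, None), "utf-8");
    assert_eq!(determine_encoding(b"\xFF\xFE@\x00\xE9\x00", None, None), "utf-16le");
    assert_eq!(determine_encoding(b"\xEF\xBB\xBF @\xC3\xA9", Some("ISO-8859-2"), None), "utf-8");
    assert_eq!(determine_encoding(b"@charset \"ISO-8859-5\"; @\xE9", Some(" Iso-8859-2"), None), "iso-8859-2");
    assert_eq!(determine_encoding(b"@charset \"ISO-8859-5\"; @\xE9", None, Some("ISO-8859-2")), "iso-8859-5");
    assert_eq!(determine_encoding(b"@Charset \"ISO-8859-5\"; @\xE9", None, None), "utf-8");
    assert_eq!(determine_encoding(b"@charset \"UTF-16LE\"; @\xC3\xA9", None, None), "utf-8");
    println!("encoding precedence checks passed");
}
```

In the CSS Syntax algorithm the BOM is not one of the fallback steps; the fallback encoding is handed to the Encoding Standard's decode, which lets a BOM override it. The sketch simply checks the BOM first, which gives the same result for these cases.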