Parse error with Unicode supplementary characters in “style” element in HTML document #383
I haven’t tested this yet, but I suspect that for checking by URL, the following patch would cause the same parse error.

```diff
diff --git a/org/w3c/css/parser/CssFouffa.java b/org/w3c/css/parser/CssFouffa.java
index ef3580bf..a195bb7b 100644
--- a/org/w3c/css/parser/CssFouffa.java
+++ b/org/w3c/css/parser/CssFouffa.java
@@ -26,12 +26,14 @@ import org.w3c.css.util.ApplContext;
import org.w3c.css.util.CssVersion;
import org.w3c.css.util.HTTPURL;
import org.w3c.css.util.InvalidParamException;
+import org.w3c.css.util.UnescapeFilterReader;
import org.w3c.css.util.Util;
import org.w3c.css.util.WarningParamException;
import org.w3c.css.util.Warnings;
import org.w3c.css.values.CssExpression;
import org.w3c.css.values.CssValue;
+import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
@@ -88,7 +90,7 @@ public final class CssFouffa extends CssParser {
*/
public CssFouffa(ApplContext ac, Reader reader, URL file, int beginLine)
throws IOException {
- super(reader);
+ super(new UnescapeFilterReader(new BufferedReader(reader)));
if (ac.getOrigin() == -1) {
setOrigin(StyleSheetOrigin.AUTHOR); // default is user
} else {
```
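The patch slots a `FilterReader` between the incoming `Reader` and the parser. Java hands text to a `Reader` as UTF-16 `char` values, so any filter in that slot sees a character such as 🚧 as two surrogate code units. The sketch below is hypothetical: `NaiveSurrogateFilterReader` is not the validator's `UnescapeFilterReader`, and it assumes (as the later commit messages suggest) that surrogates were being replaced one `char` at a time. It only shows how that kind of per-`char` replacement mangles a valid supplementary character; it does not claim to reproduce the validator's exact parse error.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class NaiveSurrogateFilterDemo {

    // Hypothetical filter, NOT the validator's UnescapeFilterReader. It looks at
    // one UTF-16 code unit at a time and replaces every surrogate with U+FFFD,
    // so it cannot tell a lone surrogate from half of a valid pair.
    static final class NaiveSurrogateFilterReader extends FilterReader {
        NaiveSurrogateFilterReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return (c != -1 && Character.isSurrogate((char) c)) ? 0xFFFD : c;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            for (int i = off; i < off + n; i++) { // no iterations when n is -1 or 0
                if (Character.isSurrogate(buf[i])) {
                    buf[i] = '\uFFFD';
                }
            }
            return n;
        }
    }

    // Reads the whole input through the filter, which wraps a BufferedReader
    // in the same way the patch wraps the incoming reader.
    static String filter(String input) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new NaiveSurrogateFilterReader(
                new BufferedReader(new StringReader(input)))) {
            int c;
            while ((c = r.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String css = "p::before { content: \"\uD83D\uDEA7\"; }";
        // The single U+1F6A7 code point comes out as two U+FFFD characters.
        System.out.println(filter(css).equals("p::before { content: \"\uFFFD\uFFFD\"; }")); // true
    }
}
```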
sideshowbarker added a commit that referenced this issue on Oct 31, 2022:
Fixes #383

When performing preprocessing of the input stream as specified in https://drafts.csswg.org/css-syntax/#input-preprocessing, this change makes our implementation handle non-BMP supplementary characters as expected: it replaces surrogates with U+FFFD only if they are lone surrogates, and leaves surrogates that are part of surrogate pairs (a high surrogate followed by a low surrogate) unchanged. Without this change, a parse error occurs when our implementation encounters supplementary characters in the input stream.
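The rule in this commit message (keep surrogate pairs, replace only lone surrogates) can be sketched in Java as follows. This is a minimal illustration, not the validator's actual change, and `replaceLoneSurrogates` is a hypothetical helper name. It relies on `String.codePoints()`, which combines each valid surrogate pair into one supplementary code point and reports each unpaired surrogate as its own value in the range U+D800 to U+DFFF.

```java
public class LoneSurrogateReplacement {

    // Sketch of the behavior described in the commit message: surrogates that
    // form a valid pair (high followed by low) are kept, and only lone
    // (unpaired) surrogates become U+FFFD. Not the validator's actual code.
    static String replaceLoneSurrogates(String input) {
        StringBuilder out = new StringBuilder(input.length());
        // codePoints() merges each valid pair into one code point above U+FFFF,
        // so any value still in the surrogate range must be a lone surrogate.
        input.codePoints().forEach(cp -> {
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                out.append('\uFFFD');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String paired = "a\uD83D\uDEA7b"; // U+1F6A7 stored as a valid pair
        String lone = "a\uD83Db";         // high surrogate with no low surrogate after it
        System.out.println(replaceLoneSurrogates(paired).equals(paired));      // true
        System.out.println(replaceLoneSurrogates(lone).equals("a\uFFFDb"));    // true
    }
}
```

Working on code points instead of individual `char` values is what lets the pair survive: the check never sees the two halves of 🚧 separately.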
sideshowbarker added a commit that referenced this issue on Nov 1, 2022:
Fixes #383

When performing preprocessing of the input stream as specified in https://drafts.csswg.org/css-syntax/#input-preprocessing, this change makes our implementation handle non-BMP supplementary characters as expected: it replaces surrogates with U+FFFD only if they are lone (unpaired) surrogates, and leaves surrogates that are part of surrogate pairs (a high surrogate followed by a low surrogate) unchanged. Without this change, a parse error occurs when our implementation encounters supplementary characters in the input stream.
sideshowbarker added a commit that referenced this issue on Nov 2, 2022:
Fixes #383

This change drops the code for replacing surrogate code points from our implementation of “filter code points” in “Preprocessing the input stream” at https://drafts.csswg.org/css-syntax/#css-filter-code-points.

A comment in w3c/csswg-drafts#3307 notes that the only way to produce a surrogate code point in CSS content is by directly assigning a DOMString containing one via an OM operation; in other words, by using JavaScript to insert a surrogate code point into a document. Because the CSS validator doesn’t execute any JavaScript from a document, a document being checked by the CSS validator cannot contain any surrogate code points. Therefore our implementation doesn’t need to handle replacement of surrogate code points, and it still conforms to the spec requirements without performing that replacement.
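The “filter code points” step this commit leaves in place (CR LF pairs, lone CR, and FF normalized to LF; U+0000 NULL replaced by U+FFFD; no surrogate replacement) might look like the following. It is a sketch under those assumptions, using a hypothetical `filterCodePoints` helper that works on a whole `String` rather than the validator's stream-based code.

```java
public class FilterCodePoints {

    // Sketch of CSS Syntax "filter code points" as this commit describes it:
    // CR LF pairs, lone CR, and FF become LF; U+0000 NULL becomes U+FFFD.
    // Surrogate replacement is deliberately omitted, since (per the commit)
    // a document checked by the validator cannot contain surrogate code points.
    static String filterCodePoints(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                out.append('\n');
                // Collapse a CR LF pair into a single LF.
                if (i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++;
                }
            } else if (c == '\f') {
                out.append('\n');
            } else if (c == '\u0000') {
                out.append('\uFFFD');
            } else {
                // Everything else, including both halves of a surrogate pair,
                // is copied unchanged.
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String in = "a\r\nb\rc\fd\u0000e \uD83D\uDEA7";
        System.out.println(filterCodePoints(in).equals("a\nb\nc\nd\uFFFDe \uD83D\uDEA7")); // true
    }
}
```

Because every other `char` is copied as-is, 🚧 passes through as the same two code units it arrived as.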
See https://jigsaw.w3.org/css-validator/validator?uri=https://sideshowbarker.net/tests/css-supplementary-code-point.html
The source of https://sideshowbarker.net/tests/css-supplementary-code-point.html has this:
In both cases (in the `style` elements both with and without `@charset "UTF-8"`), the 🚧 (U+1F6A7 CONSTRUCTION SIGN) character causes the CSS validator to report a parse error.

The parse error does not occur if the following patch is applied to the sources (`patch --ignore-whitespace -p1 < patch`):

So the cause would seem to be in https://github.com/w3c/css-validator/blob/main/org/w3c/css/util/UnescapeFilterReader.java, which appears to be called only on content that comes from a `style` element in an HTML document (as opposed to content from a separate standalone stylesheet resource, or content entered in the validator’s direct-input textarea).

Related issue: validator/validator#1344
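Why a supplementary character is special here: Java strings and `Reader`s carry UTF-16 code units, so U+1F6A7 reaches any `char`-oriented filter (such as a `FilterReader`) as two `char` values, a high surrogate followed by a low surrogate. The short check below prints those code units; the class and its `utf16Units` helper are illustrative only and are not part of the validator.

```java
public class ConstructionSignUtf16 {

    // Formats the UTF-16 code units Java uses to store one code point.
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sign = new String(Character.toChars(0x1F6A7)); // U+1F6A7 CONSTRUCTION SIGN
        System.out.println(utf16Units(0x1F6A7));                   // D83D DEA7
        System.out.println(sign.length());                         // 2 UTF-16 code units
        System.out.println(sign.codePointCount(0, sign.length())); // 1 code point
        System.out.println(Character.isHighSurrogate(sign.charAt(0))
                && Character.isLowSurrogate(sign.charAt(1)));      // true
    }
}
```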