@RobinMalfait

This PR ensures that the Rust-based parser extracts every candidate that the RegEx-based parser extracts.

This will allow us to eventually switch to the Rust-based parser by default.

There are a few caveats:

  1. If a custom transformer or extractor is required for a file, then the RegEx-based parser is used for that file.
  2. If a custom separator or prefix is used, then we fall back to the RegEx-based parser.
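
The two caveats above amount to a small decision rule. A minimal sketch, assuming a hypothetical `ParserConfig` shape and a hypothetical `chooseParser` helper (neither name comes from this PR or from Tailwind's internals; only the default separator `:` and the empty default prefix are Tailwind's documented defaults):

```typescript
// Hypothetical config shape for illustration only; not Tailwind's real internals.
interface ParserConfig {
  separator: string;
  prefix: string;
  // File extensions that have a custom transformer or extractor registered.
  customExtensions: Set<string>;
}

type ParserKind = "oxide" | "regex";

const DEFAULT_SEPARATOR = ":";
const DEFAULT_PREFIX = "";

function chooseParser(config: ParserConfig, fileExtension: string): ParserKind {
  // Caveat 2: a custom separator or prefix falls back to the RegEx-based parser.
  if (config.separator !== DEFAULT_SEPARATOR || config.prefix !== DEFAULT_PREFIX) {
    return "regex";
  }
  // Caveat 1: a custom transformer or extractor for this file type
  // means the RegEx-based parser is used for that file.
  if (config.customExtensions.has(fileExtension)) {
    return "regex";
  }
  // Otherwise the Rust-based (Oxide) parser can handle the file.
  return "oxide";
}
```

With this sketch, a default config picks `"oxide"` for every file, while registering a custom extractor for `md` would send only `.md` files through the RegEx-based parser.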

thecrypticace and others added 28 commits June 7, 2023 15:56
The RegEx parser does extract `underline` from

```html
<div class="peer-aria-[labelledby='a_b']:underline"></div>
```
... but that is unnecessary, and the Oxide parser does not extract it.

This means the output check has to differ slightly between the two parsers,
but each expectation is explicit and selected by the feature flag.
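
A flag-dependent output check of this kind could be sketched as follows. The helper names `expectedCandidates` and `containsAll` are hypothetical, and the sketch assumes both parsers report the full candidate `peer-aria-[labelledby='a_b']:underline`; it lists only the candidates mentioned here, not the complete output of either parser.

```typescript
// Hypothetical test helper: the expected candidates depend on which parser
// the feature flag selects. Only candidates named in this PR are listed.
function expectedCandidates(oxide: boolean): string[] {
  const shared = ["peer-aria-[labelledby='a_b']:underline"];
  // The RegEx-based parser additionally extracts the bare `underline`.
  return oxide ? shared : [...shared, "underline"];
}

// True when every expected candidate appears in the parser output.
function containsAll(actual: string[], expected: string[]): boolean {
  const seen = new Set(actual);
  return expected.every((candidate) => seen.has(candidate));
}
```

A test would then assert `containsAll(output, expectedCandidates(flag))`, where `output` is whatever the active parser returned for the HTML snippet above.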
This makes sure all the fancy SIMD stuff happens as early as possible. This results in an extremely minor perf increase.
No meaningful perf difference in real-world scenarios.
It needs to be done in a different spot so it doesn’t affect how things are returned.
@RobinMalfait merged commit 55daf8e into master Jun 7, 2023
@RobinMalfait deleted the test-both-parsers branch June 7, 2023 15:44

4 participants