
Commit b3c2556

RobinMalfait, thecrypticace, and philipp-spiess authored
Improve Oxide candidate extractor [0] (#16306)
This PR adds a new candidate[^candidate] extractor with 2 major goals in mind:

1. It must be way easier to reason about and maintain.
2. It must have on-par or better performance than the current candidate extractor.

### Problem

Candidate extraction is a bit of a wild west in Tailwind CSS. It is also a very critical step, because it makes sure that all your classes are picked up correctly so that your website/app looks good.

One issue we run into is that Tailwind CSS is used in many different "host" languages and frameworks, each with their own syntax. It's not only used in HTML but also in JSX/TSX, Vue, Svelte, Angular, Pug, Rust, PHP, Rails, Clojure, .NET, … the list goes on, and all of these have different syntaxes. Introducing dedicated parsers for each of these languages would be a huge maintenance burden because new languages and frameworks come up all the time. The best thing we can do is make assumptions, and so far we've done a pretty good job at that.

The only certainty we have is that there is at least _some_ structure to the possible Tailwind classes used in a file. E.g.: `abc#def` is definitely not a valid class, `hover:flex` definitely is. In a perfect world we would limit the characters that can be used and define a formal grammar that each candidate must follow, but that's not really an option right now (maybe this is something we can implement in future major versions).

The current candidate extractor has grown organically over time and required patching things here and there to make it work in various scenarios (and edge cases due to the different languages Tailwind is used in).

While there is definitely some structure, we essentially work in 2 phases:

1. Try to extract `0..n` candidates. (This is the hard part)
2. Validate each candidate to make sure it is a valid-looking class (by validating against the few rules we have)

Another reason the current extractor is hard to reason about is that we need it to be fast, and that comes with some trade-offs to readability and maintainability.

Unfortunately there will always be a lot of false positives, but if we extract more classes than necessary then that's fine. It's only when we pass the candidates to the core engine that we will know for sure whether they are valid or not. (We have some ideas to limit the amount of false positives, but that's for another time.)

### Solution

Since the introduction of Tailwind CSS v4, we re-worked the internals quite a bit and we now have a dedicated internal AST structure for candidates. For example, take a look at this:

```html
<div class="[@media(pointer:fine)]:data-[state=pending]:hover:text-red-500/(--my-opacity)"></div>
```

<details>
<summary>This will be parsed into the following AST:</summary>

```json
[
  {
    "kind": "functional",
    "root": "text",
    "value": {
      "kind": "named",
      "value": "red-500",
      "fraction": null
    },
    "modifier": {
      "kind": "arbitrary",
      "value": "var(--my-opacity)"
    },
    "variants": [
      {
        "kind": "static",
        "root": "hover"
      },
      {
        "kind": "functional",
        "root": "data",
        "value": {
          "kind": "arbitrary",
          "value": "state=pending"
        },
        "modifier": null
      },
      {
        "kind": "arbitrary",
        "selector": "@media(pointer:fine)",
        "relative": false
      }
    ],
    "important": false,
    "raw": "[@media(pointer:fine)]:data-[state=pending]:hover:text-red-500/(--my-opacity)"
  }
]
```

</details>

We have a lot of information here, and we gave these patterns a name internally. You'll see names like `functional`, `static`, `arbitrary`, `modifier`, `variant`, `compound`, ...
Some of these patterns will be important for the new candidate extractor as well:

| Name | Example | Description |
| --- | --- | --- |
| Static utility (named) | `flex` | A simple utility with no inputs whatsoever |
| Functional utility (named) | `bg-red-500` | A utility `bg` with an input that is named `red-500` |
| Arbitrary value | `bg-[#0088cc]` | A utility `bg` with an input that is arbitrary, denoted by `[…]` |
| Arbitrary variable | `bg-(--my-color)` | A utility `bg` with an input that is arbitrary and has a CSS variable shorthand, denoted by `(--…)` |
| Arbitrary property | `[color:red]` | A utility that sets a property to a value on the fly |

A similar structure exists for modifiers, where each modifier must start with `/`:

| Name | Example | Description |
| --- | --- | --- |
| Named modifier | bg-red-500`/20` | A named modifier |
| Arbitrary value | bg-red-500`/[20%]` | An arbitrary value, denoted by `/[…]` |
| Arbitrary variable | bg-red-500`/(--my-opacity)` | An arbitrary variable, denoted by `/(…)` |

Last but not least, we have variants. They have a very similar pattern, but they _must_ end in a `:`.
| Name | Example | Description |
| --- | --- | --- |
| Named variant | `hover:` | A named variant |
| Arbitrary value | `data-[state=pending]:` | An arbitrary value, denoted by `[…]` |
| Arbitrary variable | `supports-(--my-variable):` | An arbitrary variable, denoted by `(…)` |
| Arbitrary variant | `[@media(pointer:fine)]:` | Similar to arbitrary properties, this will generate a variant on the fly |

The goal with the new extractor is to encode these separate patterns in dedicated pieces of code. We called them "machines" because they are mostly state machine based (and because I've been watching Person of Interest, but I digress).

This will allow us to focus on each pattern separately. If there is a bug, or some new syntax we want to support, we can add it to those machines.

One nice benefit of this is that we can encode the rules and handle validation as we go. The moment we know that some pattern is invalid, we can bail out early.

At the time of writing this, there are a bunch of machines:

<details>
<summary>Overview of the machines</summary>

- `ArbitraryPropertyMachine`

  Extracts candidates such as `[color:red]`. Some of the rules are:

  1. There must be a property name
  2. There must be a `:`
  3. There must be a value

  There cannot be any spaces and the brackets are included. If the property is a CSS variable, it must be a valid CSS variable (uses the `CssVariableMachine`).

  ```
  [color:red]
  ^^^^^^^^^^^

  [--my-color:red]
  ^^^^^^^^^^^^^^^^
  ```

  Depends on the `StringMachine` and `CssVariableMachine`.

- `ArbitraryValueMachine`

  Extracts arbitrary values for utilities and modifiers, including the brackets:

  ```
  bg-[#0088cc]
     ^^^^^^^^^

  bg-red-500/[20%]
             ^^^^^
  ```

  Depends on the `StringMachine`.

- `ArbitraryVariableMachine`

  Extracts arbitrary variables, including the parentheses. The first argument must be a valid CSS variable; the other arguments are optional fallback arguments.

  ```
  (--my-value)
  ^^^^^^^^^^^^

  bg-red-500/(--my-opacity)
             ^^^^^^^^^^^^^^
  ```

  Depends on the `StringMachine` and `CssVariableMachine`.

- `CandidateMachine`

  Uses the variant machine and utility machine. It will make sure that 0 or more variants are directly touching and followed by a utility.

  ```
  hover:focus:flex
  ^^^^^^^^^^^^^^^^

  aria-invalid:bg-red-500/(--my-opacity)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ```

  Depends on the `VariantMachine` and `UtilityMachine`.

- `CssVariableMachine`

  Extracts CSS variables. They must start with `--`, must contain at least one alphanumeric character, `-`, or `_`, and can contain any escaped character (except for whitespace).

  ```
  bg-(--my-color)
      ^^^^^^^^^^

  bg-red-500/(--my-opacity)
              ^^^^^^^^^^^^

  bg-(--my-color)/(--my-opacity)
      ^^^^^^^^^^   ^^^^^^^^^^^^
  ```

- `ModifierMachine`

  Extracts modifiers, including the `/`:

  - `/[` will delegate to the `ArbitraryValueMachine`
  - `/(` will delegate to the `ArbitraryVariableMachine`

  ```
  bg-red-500/20
            ^^^

  bg-red-500/[20%]
            ^^^^^^

  bg-red-500/(--my-opacity)
            ^^^^^^^^^^^^^^^
  ```

  Depends on the `ArbitraryValueMachine` and `ArbitraryVariableMachine`.

- `NamedUtilityMachine`

  Extracts named utilities, regardless of whether they are functional or static.

  ```
  flex
  ^^^^

  px-2.5
  ^^^^^^
  ```

  This includes rules like: a `.` must be surrounded by digits.

  Depends on the `ArbitraryValueMachine` and `ArbitraryVariableMachine`.

- `NamedVariantMachine`

  Extracts named variants, regardless of whether they are functional or static. This is very similar to the `NamedUtilityMachine` but with different rules. We could combine them, but splitting them up makes it easier to reason about.

  Another rule is that the `:` must be included.

  ```
  hover:flex
  ^^^^^^

  data-[state=pending]:flex
  ^^^^^^^^^^^^^^^^^^^^^

  supports-(--my-variable):flex
  ^^^^^^^^^^^^^^^^^^^^^^^^^
  ```

  Depends on the `ArbitraryVariableMachine`, `ArbitraryValueMachine`, and `ModifierMachine`.

- `StringMachine`

  This is a low-level machine that is used by various other machines. Its only job is to extract strings that start with double quotes, single quotes, or backticks.

  We have this because once you are in a string, we don't have to make sure that brackets, parens, and curlies are properly balanced. Balancing brackets has to be handled properly in the other machines.

  ```
  content-["Hello_World!"]
           ^^^^^^^^^^^^^^

  bg-[url("https://example.com")]
          ^^^^^^^^^^^^^^^^^^^^^
  ```

- `UtilityMachine`

  Extracts utilities. It uses the lower level `NamedUtilityMachine`, `ArbitraryPropertyMachine`, and `ModifierMachine` to extract the utility.

  It also handles important markers (including the legacy important marker).

  ```
  flex
  ^^^^

  bg-red-500/20
  ^^^^^^^^^^^^^

  !bg-red-500/20      Legacy important marker
  ^^^^^^^^^^^^^^

  bg-red-500/20!      New important marker
  ^^^^^^^^^^^^^^

  !bg-red-500/20!     Both, but this is considered invalid
  ^^^^^^^^^^^^^^^
  ```

  Depends on the `ArbitraryPropertyMachine`, `NamedUtilityMachine`, and `ModifierMachine`.

- `VariantMachine`

  Extracts variants. It uses the lower level `NamedVariantMachine` and `ArbitraryValueMachine` to extract the variant.

  ```
  hover:focus:flex
  ^^^^^^
        ^^^^^^
  ```

  Depends on the `NamedVariantMachine` and `ArbitraryValueMachine`.

</details>

One important thing to know here is that each machine runs to completion. They all implement a `Machine` trait that has a `next(cursor)` method and returns a `MachineState`.

The `MachineState` looks like this:

```rs
enum MachineState {
    Idle,
    Done(Span),
}
```

Where a `Span` is just the location in the input where the candidate was found.
```rs
struct Span {
    pub start: usize,
    pub end: usize,
}
```

#### Complexities

**Boundary characters:**

When running these machines to completion, they don't typically check for boundary characters; the wrapping `CandidateMachine` checks for boundary characters.

A boundary character is a character that touches the candidate but that we know will not be part of the candidate.

```html
<div class="flex"></div>
<!--       ^    ^ -->
```

The quotes are touching the candidate `flex`, but they are not part of the candidate itself, so this is considered a valid candidate.

**What to pick?**

Let's imagine you are parsing this input:

```html
<div class="hover:flex"></div>
```

The `UtilityMachine` will find `hover` and `flex`. The `VariantMachine` will find `hover:`. This means that at a certain point in the `CandidateMachine` you will see something like this:

```rs
let variant_machine_state = variant_machine.next(cursor);
//  MachineState::Done(Span { start: 12, end: 17 })  // `hover:`

let utility_machine_state = utility_machine.next(cursor);
//  MachineState::Done(Span { start: 12, end: 16 })  // `hover`
```

They are both done, but which one do we pick? In this scenario we always pick the variant, because its range will always be 1 character longer than the utility.

Of course there is an exception to this rule, and it has to do with the fact that Tailwind CSS can be used in different languages and frameworks. A lot of people use `clsx` for dynamically applying classes to their React components. E.g.:

```tsx
<div
  class={clsx({
    underline: someCondition(),
  })}
></div>
```

In this scenario, we will see `underline:` as a variant, and `underline` as a utility. We pick the utility here because the next character is whitespace, so the variant could never become part of a valid candidate (variants and utilities must be touching). Another reason this is valid is that there was no variant present prior to this candidate.
E.g.:

```tsx
<div
  class={clsx({
    hover:underline: someCondition(),
  })}
></div>
```

This will be considered invalid. If you do want this, you should use quotes. E.g.:

```tsx
<div
  class={clsx({
    'hover:underline': someCondition(),
  })}
></div>
```

**Overlapping/covered spans:**

Another complexity is that the extracted spans for candidates can and will overlap. Let's take a look at this C# example:

```csharp
public enum StackSpacing
{
    [CssClass("gap-y-4")]
    Small,

    [CssClass("gap-y-6")]
    Medium,

    [CssClass("gap-y-8")]
    Large
}
```

In this scenario, `[CssClass("gap-y-4")]` starts with a `[`, so we have a few options here:

1. It is an arbitrary property, e.g.: `[color:red]`
2. It is an arbitrary variant, e.g.: `[@media(pointer:fine)]:`

When running the parsers, both the `VariantMachine` and the `UtilityMachine` will run to completion but end up in a `MachineState::Idle` state.

- It is not a valid variant because it doesn't end with a `:`.
- It is also not a valid arbitrary property because it doesn't include a `:` to separate the property from the value.

Looking at the code as a human, it's very clear what this is supposed to be, but not from the individual machines' perspective.

Obviously we want to extract the `gap-y-*` classes here.

To solve this problem, we run over an additional slice of the input, starting at the position where the machines started parsing and ending at the position where the machines stopped parsing.

That slice will be this one: `[CssClass("gap-y-6")]` (we already skipped over the whitespace). Now, for every `[` character we see, we start a new `CandidateMachine` right after the `[`'s position and run the machines over that slice. This will eventually extract the `gap-y-6` class.

The next question is: what if there was a `:` (e.g.: `[CssClass("gap-y-6")]:`)? Then the `VariantMachine` would complete, but the `UtilityMachine` would not, because nothing exists after it. We apply the same idea in this case.
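The rescan step described above can be sketched as a small helper that shifts results from a sub-slice back into positions in the full input. This is a minimal sketch and not the oxide implementation. `rescan_after_brackets` and `quoted_contents` are hypothetical names. The `extract` callback stands in for a fresh `CandidateMachine` run; here it is a toy that only reports double-quoted string contents. Spans use inclusive `end` offsets, as in the `Span` examples in this PR.

```rust
/// Inclusive byte range, matching the `Span { start, end }` examples in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Hypothetical helper: for every `[` in `input[start..=end]`, run `extract`
/// over the bytes that follow it (up to and including `end`) and shift the
/// returned spans back into coordinates of the full `input`.
pub fn rescan_after_brackets<F>(input: &[u8], start: usize, end: usize, mut extract: F) -> Vec<Span>
where
    F: FnMut(&[u8]) -> Vec<Span>,
{
    let mut found = Vec::new();
    if input.is_empty() || start >= input.len() {
        return found;
    }
    let end = end.min(input.len() - 1);

    // A `[` sitting exactly at `end` has nothing after it, so stop one short.
    for i in start..end {
        if input[i] != b'[' {
            continue;
        }
        let offset = i + 1;
        for span in extract(&input[offset..=end]) {
            found.push(Span {
                start: span.start + offset,
                end: span.end + offset,
            });
        }
    }
    found
}

/// Toy stand-in for a fresh `CandidateMachine` (hypothetical): reports the
/// contents of every non-empty double-quoted string in `slice`, quotes excluded.
pub fn quoted_contents(slice: &[u8]) -> Vec<Span> {
    let mut spans = Vec::new();
    let mut open: Option<usize> = None;
    for (i, &byte) in slice.iter().enumerate() {
        if byte != b'"' {
            continue;
        }
        match open.take() {
            Some(quote) if i > quote + 1 => spans.push(Span { start: quote + 1, end: i - 1 }),
            Some(_) => {} // empty string, nothing to report
            None => open = Some(i),
        }
    }
    spans
}
```

For `[CssClass("gap-y-6")]`, the single restart happens right after the `[` at position 0. The toy extractor finds `gap-y-6` at offsets 10..=16 of the sub-slice, which become 11..=17 in the full input. A trailing `:` does not change the result.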
Another issue is what happens when we _do_ have actual overlapping ranges. E.g.: `let classes = ['[color:red]'];`. This will extract both the `[color:red]` and `color:red` classes. You have to use your imagination, but the last one has the exact same structure as `hover:flex` (variant + utility).

In this case we make sure to drop spans that are covered by other spans. The extracted `Span`s are valid candidates, so if the outermost candidate is valid, we can throw away the inner candidate.

```
Position: 11112222222
          67890123456
          ↓↓↓↓↓↓↓↓↓↓↓
Span { start: 17, end: 25 } // color:red
Span { start: 16, end: 26 } // [color:red]
```

#### Exceptions

**JavaScript keys as candidates:**

We already talked about the `clsx` scenario, but there are a few more exceptions, and they have to do with different syntaxes.

**CSS class shorthand in certain templating languages:**

In Pug and Slim, you can have a syntax like this:

```pug
.flex.underline
  div Hello World
```

<details>
<summary>Generated HTML</summary>

```html
<div class="flex underline">
  <div>Hello World</div>
</div>
```

</details>

We have to make sure that in these scenarios the `.` is a valid boundary character. For this, we introduce a pre-processing step that massages the input a little bit to improve the extraction. We have to make sure we don't make the input shorter or longer, otherwise the positions would be off.

In this scenario, we could simply replace the `.` with a space. But of course, there are scenarios in these languages where it's not safe to do that.

If you want to use `px-2.5` with this syntax, then you'd write:

```pug
.flex.px-2.5
  div Hello World
```

But that's invalid, because it technically means `flex`, `px-2`, and `5` as classes.
You can use this syntax to get around that:

```pug
div(class="px-2.5")
  div Hello World
```

<details>
<summary>Generated HTML</summary>

```html
<div class="px-2.5">
  <div>Hello World</div>
</div>
```

</details>

This means that we can't simply replace every `.` with a space; we have to parse the input. Luckily we only care about strings (and we have a `StringMachine` for that), so we skip replacing `.` inside of strings.

**Ruby's weird string syntax:**

```ruby
%w[flex underline]
```

This is valid syntax and is shorthand for:

```ruby
["flex", "underline"]
```

Luckily this problem is solved by running the sub-machines after each `[` character.

### Performance

**Testing:**

Each machine has a `test_…_performance` test (ignored by default) that allows you to test the throughput of that machine. If you want to run them, you can use the following command:

```sh
cargo test test_variant_machine_performance --release -- --ignored
```

This runs the test in release mode and includes the ignored test.

> [!CAUTION]
> This test **_will_** fail, but it will print some output. E.g.:

```
tailwindcss_oxide::extractor::variant_machine::VariantMachine: Throughput: 737.75 MB/s over 0.02s
tailwindcss_oxide::extractor::variant_machine::VariantMachine: Duration: 500ns
```

**Readability:**

One thing to note when looking at the code is that it's not always written in the cleanest way; we had to make some sacrifices for performance reasons.

The `input` is of type `&[u8]`, so we are already dealing with bytes. Luckily, Rust has some nice ergonomics that let us write `b'['` instead of `0x5b`.

A concrete example where we had to sacrifice readability is in the state machines, where we check the `previous`, `current`, and `next` characters to make decisions. For a named utility, one of the rules is that a `.` must be preceded by and followed by a digit.
This can be written as:

```rs
match (cursor.prev, cursor.curr, cursor.next) {
    (b'0'..=b'9', b'.', b'0'..=b'9') => { /* … */ }
    _ => { /* … */ }
}
```

But this is not very fast, because Rust can't optimize the match statement very well, especially because we are dealing with tuples containing 3 values where each value is a `u8`.

To solve this we use some nesting: only once we reach `b'.'` do we check the previous and next characters. We also return early in most places. If the previous character is not a digit, there is no need to check the next character.

**Classification and jump tables:**

Another optimization we did is to classify the characters into a much smaller `enum`, so that Rust _can_ optimize all `match` arms and create some jump tables behind the scenes.

E.g.:

```rs
#[derive(Debug, Clone, Copy, PartialEq)]
enum Class {
    /// ', ", or `
    Quote,

    /// \
    Escape,

    /// Whitespace characters
    Whitespace,

    Other,
}

const CLASS_TABLE: [Class; 256] = {
    let mut table = [Class::Other; 256];

    macro_rules! set {
        ($class:expr, $($byte:expr),+ $(,)?) => {
            $(table[$byte as usize] = $class;)+
        };
    }

    set!(Class::Quote, b'"', b'\'', b'`');
    set!(Class::Escape, b'\\');
    set!(Class::Whitespace, b' ', b'\t', b'\n', b'\r', b'\x0C');

    table
};
```

There are only 4 values in this enum, so Rust can optimize this very well. The `CLASS_TABLE` is generated at compile time and must be exactly 256 elements long to fit all `u8` values.

**Inlining:**

Last but not least, we sometimes use functions to abstract some logic. Luckily Rust will optimize and inline most functions automatically. In some scenarios, explicitly adding `#[inline(always)]` improves performance; sometimes it doesn't improve it at all.

You might notice that the annotation is added to some functions and not to others. Every state machine was tested on its own, and whenever the performance was better with the annotation, it was added.

### Test Plan

1. Each machine has a dedicated set of tests that try to extract the relevant part for that machine. Most machines don't even check boundary characters or try to extract nested candidates, so keep that in mind when adding new tests. Extracting inside of nested `[…]` is only handled by the outermost `extractor/mod.rs`.
2. The main `extractor/mod.rs` has dedicated tests for recent bug reports related to missing candidates.
3. You can test each machine's performance if you want to.

There is a chance that this new parser is missing candidates, even though a lot of tests were added and existing tests have been ported.

To double check, we ran the new extractor on our own projects to make sure we didn't miss anything obvious.

#### Tailwind UI

On Tailwind UI the diff looks like this:

<details>
<summary>diff</summary>

```diff
diff --git a/./main.css b/./pr.css
index d83b0a506..b3dd94a1d 100644
--- a/./main.css
+++ b/./pr.css
@@ -5576,9 +5576,6 @@ @layer utilities {
     --tw-saturate: saturate(0%);
     filter: var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,);
   }
-  .\!filter {
-    filter: var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,) !important;
-  }
   .filter {
     filter: var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,);
   }
```

</details>

`!filter` is gone because it was only used like this:

```js
getProducts.js
23:  if (!filter) return true
```

Right now, `(` and `)` are not considered valid boundary characters for a candidate.
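The boundary rule behind this diff can be sketched as a check on the bytes immediately around a finished span. This is a minimal sketch under a stated assumption: the boundary set used here (ASCII whitespace, the three quote characters, and the edges of the input) is taken from the examples in this PR and is not the full list oxide uses. `is_boundary` and `has_boundaries` are hypothetical names. The point the sketch demonstrates is that `(` and `)` are not in the set.

```rust
/// Assumed boundary set, for illustration only: ASCII whitespace and the three
/// quote characters. `(` and `)` are intentionally absent, matching the rule
/// that made `!filter` disappear from the Tailwind UI output.
pub fn is_boundary(byte: u8) -> bool {
    matches!(
        byte,
        b' ' | b'\t' | b'\n' | b'\r' | b'\x0C' | b'"' | b'\'' | b'`'
    )
}

/// True when the inclusive span `start..=end` touches a boundary character
/// (or the start/end of `input`) on both sides.
pub fn has_boundaries(input: &[u8], start: usize, end: usize) -> bool {
    let before = start == 0 || is_boundary(input[start - 1]);
    let after = end + 1 >= input.len() || is_boundary(input[end + 1]);
    before && after
}
```

In `<div class="flex"></div>`, `flex` occupies 12..=15 and is wrapped in quotes, so it passes. In `if (!filter) return true`, `!filter` occupies 4..=10 and is wrapped in `(` and `)`, so it is rejected.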
#### Catalyst

On Catalyst, the diff looks like this:

<details>
<summary>diff</summary>

```diff
diff --git a/./main.css b/./pr.css
index 9f8ed129..4aec992e 100644
--- a/./main.css
+++ b/./pr.css
@@ -2105,9 +2105,6 @@
   .outline-transparent {
     outline-color: transparent;
   }
-  .filter {
-    filter: var(--tw-blur,) var(--tw-brightness,) var(--tw-contrast,) var(--tw-grayscale,) var(--tw-hue-rotate,) var(--tw-invert,) var(--tw-saturate,) var(--tw-sepia,) var(--tw-drop-shadow,);
-  }
   .backdrop-blur-\[6px\] {
     --tw-backdrop-blur: blur(6px);
     -webkit-backdrop-filter: var(--tw-backdrop-blur,) var(--tw-backdrop-brightness,) var(--tw-backdrop-contrast,) var(--tw-backdrop-grayscale,) var(--tw-backdrop-hue-rotate,) var(--tw-backdrop-invert,) var(--tw-backdrop-opacity,) var(--tw-backdrop-saturate,) var(--tw-backdrop-sepia,);
@@ -7141,46 +7138,6 @@
   inherits: false;
   initial-value: solid;
 }
-@property --tw-blur {
-  syntax: "*";
-  inherits: false;
-}
-@property --tw-brightness {
-  syntax: "*";
-  inherits: false;
-}
-@property --tw-contrast {
-  syntax: "*";
-  inherits: false;
-}
-@property --tw-grayscale {
-  syntax: "*";
-  inherits: false;
-}
-@property --tw-hue-rotate {
-  syntax: "*";
-  inherits: false;
-}
-@property --tw-invert {
-  syntax: "*";
-  inherits: false;
-}
-@property --tw-opacity {
-  syntax: "*";
-  inherits: false;
-}
-@property --tw-saturate {
-  syntax: "*";
-  inherits: false;
-}
-@property --tw-sepia {
-  syntax: "*";
-  inherits: false;
-}
-@property --tw-drop-shadow {
-  syntax: "*";
-  inherits: false;
-}
 @property --tw-backdrop-blur {
   syntax: "*";
   inherits: false;
```

</details>

The reason for this is that `filter` was only used as a function call:

```tsx
src/app/docs/Code.tsx
31:  .filter((x) => x !== null)
```

This was tested on all templates, and each of them drops a very small number of classes that aren't actually used.
The script to test this looks like this:

```sh
bun --bun ~/github.com/tailwindlabs/tailwindcss/packages/@tailwindcss-cli/src/index.t -- -i ./src/styles/tailwind.css -o pr.css
bun --bun ~/github.com/tailwindlabs/tailwindcss--main/packages/@tailwindcss-cli/src/index.t -- -i ./src/styles/tailwind.css -o main.css
git diff --no-index --patch ./{main,pr}.css
```

This uses git worktrees, so the `pr` branch lives in a `tailwindcss` folder and the `main` branch lives in a `tailwindcss--main` folder.

---

### Fixes:

- Fixes: #15616
- Fixes: #16750
- Fixes: #16790
- Fixes: #16801
- Fixes: #16880 (due to validating the arbitrary property)

---

### Ideas for in the future

1. Right now each machine takes in a `Cursor` object. One potential improvement is to rely on the `input` on its own instead of going via the wrapping `Cursor` object.
2. If you take a look at the AST, you'll notice that utilities and variants have a "root"; these are basically prefixes of each available utility and/or variant. We can use this information to filter out candidates and bail out early if we know that a certain candidate will never produce a valid class.
3. Pass through the `prefix` information. Everything that doesn't start with `tw:` can be skipped.

### Design decisions that didn't make it

Once you reach this part, you can stop reading if you want to. This is more of a brain dump of the things we tried that didn't work out. We wanted to include them as a reference in case we want to look back at this issue and know _why_ certain things are implemented the way they are.

#### One character at a time

In an earlier implementation, the state machines were pure state machines where the `next()` function was called on every single character of the input. This had a lot of overhead, because for every character we had to:

1. Ask the `CandidateMachine` which state it was in.
2. Check the `cursor.curr` (and potentially the `cursor.prev` and `cursor.next`) character.
3. If we were in a state where a nested state machine was running, check its current state as well, and so on.
4. Only once all of that was done could we go to the next character.

In this approach, the `MachineState` looked like this instead:

```rs
enum MachineState {
    Idle,
    Parsing,
    Done(Span),
}
```

This had its own set of problems, because now it's very hard to know whether we are done or not.

```html
<div class="hover:flex"></div>
<!--             ^ -->
```

Look at the current position in the example above. At this point, it's both a valid variant and a valid utility, so there was a lot of additional state we had to track to know whether we were done or not.

#### `Span` stitching

Another approach we tried was to just collect all valid variants and utilities and throw them in a big `Vec<Span>`. This reduced the amount of additional state to track, and we could record a span the moment we saw a `MachineState::Done(span)`.

The next thing we had to do was to make sure that:

1. Covered spans were removed. We still do this part in the current implementation.
2. All touching variant spans were combined (where `span_a.end + 1 == span_b.start`).
3. For every combined variant span, a corresponding utility span was found.
   - If there is no utility span, the candidate is invalid.
   - If there are multiple candidate spans: in theory this is not possible, because we dropped covered spans.
   - If there is a candidate _but_ it is attached to another set of spans, then the candidate is invalid. E.g.: `flex!block`
4. All left-over utility spans were treated as candidates without variants.

This approach was slow, and still a bit hard to reason about.

#### Matching on tuples

Matching against the `prev`, `curr`, and `next` characters was very readable and easy to reason about, but it was not very fast. Unfortunately, we had to abandon this approach in favor of a more optimized one.

In a perfect world, we would still write it this way, but have some compile-time macro that optimizes it for us.
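For comparison with the tuple `match` in the **Readability** section, here is a minimal sketch of the nested, early-return shape. It checks the nested version against the tuple version over a range of inputs. The function names are hypothetical, and only the "`.` must be surrounded by digits" rule is modeled. Whether the nested form is actually faster depends on the compiler, so confirm that with the ignored `test_…_performance` benchmarks rather than assume it.

```rust
/// Readable version: match on the whole (prev, curr, next) window at once.
/// Returns true when `curr` is a `.` that has a digit on both sides.
pub fn dot_between_digits_tuple(prev: u8, curr: u8, next: u8) -> bool {
    matches!((prev, curr, next), (b'0'..=b'9', b'.', b'0'..=b'9'))
}

/// Nested version: only look at the neighbours once `curr` is known to be a
/// `.`, and bail out as soon as one side fails.
#[inline(always)]
pub fn dot_between_digits_nested(prev: u8, curr: u8, next: u8) -> bool {
    if curr != b'.' {
        return false;
    }

    // No need to inspect `next` if `prev` already fails.
    if !prev.is_ascii_digit() {
        return false;
    }

    next.is_ascii_digit()
}
```

Both functions accept the `2.5` in `px-2.5` and reject a `.` with a non-digit neighbour. Only the evaluation order differs.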
#### Matching against `b'…'` instead of classification and jump tables

Similar to the previous point: while this is better for readability, it's not fast enough. The jump tables are much faster.

Luckily for us, each machine has its own set of rules and context, so it's much easier to reason about a single problem and optimize a single machine.

[^candidate]: A candidate is what a potential Tailwind CSS class _could_ be. It's a candidate because at this stage we don't know whether it will actually produce something, but it looks like it could be a valid class. E.g.: `hover:bg-red-500` is a candidate, but it will only produce something if `--color-red-500` is defined in your theme.

---------

Co-authored-by: Jordan Pittman <jordan@cryptica.me>
Co-authored-by: Philipp Spiess <hello@philippspiess.com>
1 parent 781fb73 commit b3c2556

33 files changed

+5685 -1861 lines changed

CHANGELOG.md

+1
```diff
@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - _Experimental_: Add `user-valid` and `user-invalid` variants ([#12370](https://github.com/tailwindlabs/tailwindcss/pull/12370))
 - _Experimental_: Add `wrap-anywhere`, `wrap-break-word`, and `wrap-normal` utilities ([#12128](https://github.com/tailwindlabs/tailwindcss/pull/12128))
 - Add `col-<number>` and `row-<number>` utilities for `grid-column` and `grid-row` ([#15183](https://github.com/tailwindlabs/tailwindcss/pull/15183))
+- Add new candidate extractor ([#16306](https://github.com/tailwindlabs/tailwindcss/pull/16306))
 
 ### Fixed
 
```

Cargo.lock

+13-13

crates/node/Cargo.toml

+4-4
```diff
@@ -8,10 +8,10 @@ crate-type = ["cdylib"]
 
 [dependencies]
 # Default enable napi4 feature, see https://nodejs.org/api/n-api.html#node-api-version-matrix
-napi = { version = "2.16.11", default-features = false, features = ["napi4"] }
-napi-derive = "2.16.12"
+napi = { version = "2.16.16", default-features = false, features = ["napi4"] }
+napi-derive = "2.16.13"
 tailwindcss-oxide = { path = "../oxide" }
-rayon = "1.5.3"
+rayon = "1.10.0"
 
 [build-dependencies]
-napi-build = "2.0.1"
+napi-build = "2.1.4"
```

crates/node/src/lib.rs

+15-4
```diff
@@ -28,12 +28,23 @@ pub struct GlobEntry {
   pub pattern: String,
 }
 
-impl From<ChangedContent> for tailwindcss_oxide::ChangedContent {
+impl From<ChangedContent> for tailwindcss_oxide::ChangedContent<'_> {
   fn from(changed_content: ChangedContent) -> Self {
-    Self {
-      file: changed_content.file.map(Into::into),
-      content: changed_content.content,
+    if let Some(file) = changed_content.file {
+      return tailwindcss_oxide::ChangedContent::File(
+        file.into(),
+        changed_content.extension.into(),
+      );
+    }
+
+    if let Some(contents) = changed_content.content {
+      return tailwindcss_oxide::ChangedContent::Content(
+        contents,
+        changed_content.extension.into(),
+      );
     }
+
+    unreachable!()
   }
 }
 
```

crates/oxide/Cargo.toml

+3-2
```diff
@@ -4,11 +4,11 @@ version = "0.1.0"
 edition = "2021"
 
 [dependencies]
-bstr = "1.10.0"
+bstr = "1.11.3"
 globwalk = "0.9.1"
 log = "0.4.22"
 rayon = "1.10.0"
-fxhash = { package = "rustc-hash", version = "2.0.0" }
+fxhash = { package = "rustc-hash", version = "2.1.1" }
 crossbeam = "0.8.4"
 tracing = { version = "0.1.40", features = [] }
 tracing-subscriber = { version = "0.3.18", features = ["env-filter"] }
@@ -20,3 +20,4 @@ fast-glob = "0.4.3"
 
 [dev-dependencies]
 tempfile = "3.13.0"
+
```

crates/oxide/src/cursor.rs

+27-27
```diff
@@ -41,14 +41,34 @@ impl<'a> Cursor<'a> {
     cursor
   }
 
-  pub fn rewind_by(&mut self, amount: usize) {
-    self.move_to(self.pos.saturating_sub(amount));
-  }
-
   pub fn advance_by(&mut self, amount: usize) {
     self.move_to(self.pos.saturating_add(amount));
   }
 
+  #[inline(always)]
+  pub fn advance(&mut self) {
+    self.pos += 1;
+
+    self.prev = self.curr;
+    self.curr = self.next;
+    self.next = *self
+      .input
+      .get(self.pos.saturating_add(1))
+      .unwrap_or(&0x00u8);
+  }
+
+  #[inline(always)]
+  pub fn advance_twice(&mut self) {
+    self.pos += 2;
+
+    self.prev = self.next;
+    self.curr = *self.input.get(self.pos).unwrap_or(&0x00u8);
+    self.next = *self
+      .input
+      .get(self.pos.saturating_add(1))
+      .unwrap_or(&0x00u8);
+  }
+
   pub fn move_to(&mut self, pos: usize) {
     let len = self.input.len();
     let pos = pos.clamp(0, len);
@@ -57,13 +77,9 @@ impl<'a> Cursor<'a> {
     self.at_start = pos == 0;
     self.at_end = pos + 1 >= len;
 
-    self.prev = if pos > 0 { self.input[pos - 1] } else { 0x00 };
-    self.curr = if pos < len { self.input[pos] } else { 0x00 };
-    self.next = if pos + 1 < len {
-      self.input[pos + 1]
-    } else {
-      0x00
-    };
+    self.prev = *self.input.get(pos.wrapping_sub(1)).unwrap_or(&0x00u8);
+    self.curr = *self.input.get(pos).unwrap_or(&0x00u8);
+    self.next = *self.input.get(pos.saturating_add(1)).unwrap_or(&0x00u8);
   }
 }
 
@@ -139,21 +155,5 @@ mod test {
     assert_eq!(cursor.prev, b'd');
     assert_eq!(cursor.curr, 0x00);
     assert_eq!(cursor.next, 0x00);
-
-    cursor.rewind_by(1);
-    assert_eq!(cursor.pos, 10);
-    assert!(!cursor.at_start);
-    assert!(cursor.at_end);
-    assert_eq!(cursor.prev, b'l');
-    assert_eq!(cursor.curr, b'd');
-    assert_eq!(cursor.next, 0x00);
-
-    cursor.rewind_by(10);
-    assert_eq!(cursor.pos, 0);
-    assert!(cursor.at_start);
-    assert!(!cursor.at_end);
-    assert_eq!(cursor.prev, 0x00);
-    assert_eq!(cursor.curr, b'h');
-    assert_eq!(cursor.next, b'e');
   }
 }
```
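The new `advance` and `advance_twice` methods shift the `prev`/`curr`/`next` window while reading only the bytes that enter it. `move_to` now uses `get(...)` with a `0x00` fallback instead of explicit bounds checks. The following is a trimmed, standalone sketch built only from the fields visible in this diff (`input`, `pos`, `prev`, `curr`, `next`). `at_start`/`at_end` and the rest of the real `Cursor` are omitted, and the `new` constructor is an assumption. Treat it as an illustration of the invariant (stepping with `advance` lands on the same window as `move_to`), not as the oxide type.

```rust
/// Trimmed stand-in for oxide's `Cursor`, keeping only the sliding window.
pub struct Cursor<'a> {
    pub input: &'a [u8],
    pub pos: usize,
    pub prev: u8,
    pub curr: u8,
    pub next: u8,
}

impl<'a> Cursor<'a> {
    /// Assumed constructor (not part of the diff): place the window at 0.
    pub fn new(input: &'a [u8]) -> Self {
        let mut cursor = Cursor { input, pos: 0, prev: 0, curr: 0, next: 0 };
        cursor.move_to(0);
        cursor
    }

    /// Same approach as the new `move_to`: `get` returns `None` out of bounds,
    /// and `wrapping_sub(1)` turns position 0 into `usize::MAX`, which is also
    /// out of bounds, so `prev` falls back to `0x00` without a `pos > 0` check.
    pub fn move_to(&mut self, pos: usize) {
        let pos = pos.min(self.input.len());
        self.pos = pos;
        self.prev = *self.input.get(pos.wrapping_sub(1)).unwrap_or(&0x00);
        self.curr = *self.input.get(pos).unwrap_or(&0x00);
        self.next = *self.input.get(pos.saturating_add(1)).unwrap_or(&0x00);
    }

    /// Shift the window by one byte; only the new `next` is read from `input`.
    pub fn advance(&mut self) {
        self.pos += 1;
        self.prev = self.curr;
        self.curr = self.next;
        self.next = *self.input.get(self.pos.saturating_add(1)).unwrap_or(&0x00);
    }

    /// Shift the window by two bytes: the old `next` becomes the new `prev`.
    pub fn advance_twice(&mut self) {
        self.pos += 2;
        self.prev = self.next;
        self.curr = *self.input.get(self.pos).unwrap_or(&0x00);
        self.next = *self.input.get(self.pos.saturating_add(1)).unwrap_or(&0x00);
    }
}
```

At the end of `"hello world"`, the window matches the assertions kept in the test above: `prev == b'd'`, with `curr` and `next` both `0x00`.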
