Drastic performance increases in the CSS parser #68

slusarz · 2013-09-23T05:32:26Z

Recently we were alerted of extremely slow performance of the CSS parser:

http://lists.horde.org/archives/imp/Week-of-Mon-20130916/055480.html

I couldn't reproduce 40 second runtimes, but there was a definite lag. Running a xdebug() profiler produced an almost 1 GB output, with certain internal CSS parsing functions being called 1,000,000 times(!). And slow functions like substr() and strtolower() were being run 400,000 times each.

Making a quick pass through the parser, the low-hanging fruit was the code dealing with comparisons on a single byte. In almost every case, consume(), peek() and comes() is absolute overkill when scanning for a single byte that is known NOT to be a multibyte token.

Just changing these calls to a much simpler currentByte() (just grabbing the string index at the current position) and consumeByte() (just incrementing the current pointer) slashed the runtime by more than 50%. Rough example: the xdebug trace file dropped from about 968 MB to 436 MB.

The parser should really be rewritten into a stream parser/tokenizer to achieve maximum speed, but this at least is a good start for very simple changes.

sabberworm · 2013-09-24T06:21:41Z

Thanks for this. It’s a change I’ve thought about making as well. Initially, I’d wanted the parser to work with charsets that don’t have the ASCII characters on a single byte (like UCS-2) but since this would need some additional work either way (and no-one, in the three years that this project has existed, has requested support for this), I’m willing to drop that plan and go with your solution.

I know this parser is no race horse and but I’ve never had any applications that needed to be this performant and I’ve always cached the output of expensive parsing operations.

slusarz · 2013-09-24T06:39:51Z

Well, if this ever becomes an issue (i.e. UCS-2), the easiest solution will be just to convert the charset of the CSS string to UTF-8 before beginning parsing. That would eliminate any issue with ASCII parsing characters.

But as you point out, as a practical matter is anyone really using anything other than UTF-8 (or maybe iso-8859-1) for a style sheet. So optimizing for practical uses rather than theoretical makes much more sense.

sabberworm · 2013-09-24T06:48:14Z

lib/Sabberworm/CSS/Parser.php

 			$this->consumeWhiteSpace();
 			$this->parseList($oResult);
 			return $oResult;
 		} else if ($sIdentifier === 'namespace') {
 			$sPrefix = null;
 			$mUrl = $this->parsePrimitiveValue();
-			if (!$this->comes(';')) {
+			if (!$this->currentByte() == ';') {


!$this->currentByte() is evaluated before == ';' so this is NOT the same thing as !$this->comes(';')

sabberworm · 2013-09-24T07:08:09Z

I’ve added a few notes but even when I do change these things, the unit tests fail for files that contain utf-8 encoded characters inside string literals or comments (the offending files are unicode.css and functions.css in the tests/files/ directory. So now, somewhere a single byte gets consumed when it should be a multibyte character. It’s probably my fault as I am using the byte-based $this->iCurrentPosition directly for mb_substr and mb_strpos (instead of truncating the string first using byte-level functions and then passing them to the multibyte functions), but I don’t have the time right now to investigate further so I’ll have to put this on hold for now (unless you want to do some digging).

slusarz · 2013-09-25T07:36:37Z

Closing. Starting over with a lesser scope in a different branch.

Drastic performance increases in the CSS parser

c5fecaa

sabberworm reviewed Sep 24, 2013
View reviewed changes

slusarz closed this Sep 25, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Drastic performance increases in the CSS parser #68

Drastic performance increases in the CSS parser #68

Uh oh!

slusarz commented Sep 23, 2013

Uh oh!

sabberworm commented Sep 24, 2013

Uh oh!

slusarz commented Sep 24, 2013

Uh oh!

sabberworm Sep 24, 2013

Uh oh!

sabberworm commented Sep 24, 2013

Uh oh!

slusarz commented Sep 25, 2013

Uh oh!

Uh oh!

Drastic performance increases in the CSS parser #68

Drastic performance increases in the CSS parser #68

Uh oh!

Conversation

slusarz commented Sep 23, 2013

Uh oh!

sabberworm commented Sep 24, 2013

Uh oh!

slusarz commented Sep 24, 2013

Uh oh!

sabberworm Sep 24, 2013

Choose a reason for hiding this comment

Uh oh!

sabberworm commented Sep 24, 2013

Uh oh!

slusarz commented Sep 25, 2013

Uh oh!

Uh oh!