Optimize tokenizer string scanning with pre-computed escape-end positions #68

@alanzabihi

Description

Hypothesis

When the tokenizer encounters a quoted string, it scans forward character by character, checking for the closing quote and handling backslash escapes. Handling an escape means inspecting the next character and potentially skipping a hex sequence via the RE_HEX_ESCAPE regex.

For typical CSS, most strings are short (class names, font names, URLs) and contain no escapes. Add a fast path that uses indexOf to find the closing quote first. If no backslash exists between the opening and closing quote, accept the string immediately without character-by-character scanning:

let closeQuote = css.indexOf(quote === SINGLE_QUOTE ? "'" : '"', pos + 1)
if (closeQuote !== -1) {
  // Note: this search is not bounded by closeQuote. When the next backslash is
  // far away (or absent), it scans the rest of the input for every string.
  let backslash = css.indexOf('\\', pos + 1)
  if (backslash === -1 || backslash > closeQuote) {
    // No escape in this string: take the fast path
    currentToken = ['string', css.slice(pos, closeQuote + 1), pos, closeQuote]
    pos = closeQuote + 1
    return
  }
}
// Otherwise fall through to character-by-character scanning

This avoids per-character charCodeAt checks for the common no-escape case.
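
To make the proposal easier to test in isolation, here is a minimal standalone sketch of the fast path with a simplified fallback. The function name `scanString`, its `{ token, next }` return shape, and the `null` result for unclosed strings are assumptions for illustration; they are not the actual structure of lib/tokenize.js, and the fallback only skips the character after each backslash rather than matching hex sequences with RE_HEX_ESCAPE.

```javascript
// Illustrative sketch only: scanString and its return shape are hypothetical,
// not the tokenizer's real API.
const SINGLE_QUOTE = 39 // "'"
const BACKSLASH = 92 // "\"

function scanString(css, pos) {
  const quoteCode = css.charCodeAt(pos)
  const quote = quoteCode === SINGLE_QUOTE ? "'" : '"'

  const closeQuote = css.indexOf(quote, pos + 1)
  if (closeQuote === -1) return null // unclosed string; the caller decides what to do

  // Fast path: no backslash between the opening quote and the first matching quote
  const backslash = css.indexOf('\\', pos + 1)
  if (backslash === -1 || backslash > closeQuote) {
    return {
      token: ['string', css.slice(pos, closeQuote + 1), pos, closeQuote],
      next: closeQuote + 1
    }
  }

  // Slow path: walk the string, skipping the character that follows each backslash
  let i = pos + 1
  while (i < css.length) {
    const code = css.charCodeAt(i)
    if (code === BACKSLASH) {
      i += 2
      continue
    }
    if (code === quoteCode) {
      return { token: ['string', css.slice(pos, i + 1), pos, i], next: i + 1 }
    }
    i += 1
  }
  return null // ran off the end while inside the string
}
```

Calling `scanString('a{content:"abc"}', 10)` takes the fast path, while an input such as `"a\"b"` has a backslash before the first matching quote and goes through the fallback loop.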

Editable surface

  • lib/tokenize.js — add fast-path string scanning for no-escape strings

What's different from prior work

Expected impact

METRIC_A: improvement of 1-3 ms (large file with string values). METRIC_B: improvement of 1-2 ms. Combined: 2-3 ms.
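
One way to sanity-check these estimates before touching the tokenizer is a small micro-benchmark of the two scanning strategies. The sketch below is hypothetical: it is not the harness behind METRIC_A or METRIC_B, the helper names and the synthetic input are made up, and it assumes a runtime with a global `performance` (a browser or a recent Node.js). Because the backslash search in the fast path is not limited to the current string, the result may depend heavily on how far away the next backslash is, so it is worth trying inputs with and without escapes.

```javascript
// Hypothetical micro-benchmark; helper names and input are illustrative only.

// Baseline: find the closing quote by walking character codes
function findCloseByChar(css, pos) {
  const quoteCode = css.charCodeAt(pos)
  for (let i = pos + 1; i < css.length; i++) {
    const code = css.charCodeAt(i)
    if (code === 92) { i++; continue } // backslash: skip the escaped character
    if (code === quoteCode) return i
  }
  return -1
}

// Proposed: indexOf fast path, falling back to the baseline when an escape is found
function findCloseFast(css, pos) {
  const closeQuote = css.indexOf(css[pos], pos + 1)
  if (closeQuote === -1) return -1
  const backslash = css.indexOf('\\', pos + 1)
  if (backslash === -1 || backslash > closeQuote) return closeQuote
  return findCloseByChar(css, pos)
}

// Run one strategy over every string start; the checksum guards against mismatches
function timeScan(find, css, starts, rounds) {
  const begin = performance.now()
  let checksum = 0
  for (let r = 0; r < rounds; r++) {
    for (const pos of starts) checksum += find(css, pos)
  }
  return { ms: performance.now() - begin, checksum }
}

// Synthetic input: many short escape-free strings, then one string with an escape
const rule = '.a{font-family:"Helvetica Neue";content:\'x\'}\n'
const css = rule.repeat(500) + '.b{content:"\\""}'

// Collect the offset of every opening quote
const starts = []
for (let i = 0; i < css.length; i++) {
  const code = css.charCodeAt(i)
  if (code === 34 || code === 39) {
    starts.push(i)
    i = findCloseByChar(css, i) // jump to the closing quote; i++ then moves past it
  }
}

console.log('per-char:', timeScan(findCloseByChar, css, starts, 20))
console.log('indexOf :', timeScan(findCloseFast, css, starts, 20))
```

The two checksums should match; the timings are only indicative and will vary with the engine, input size, and escape density.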
