Open
Labels: enhancement (New feature or request)
Description
Goal
Add RE2 as the preferred regex engine when available, falling back to the existing engines. Engine priority:
- RE2 (when available and pattern-compatible)
- Boost.Regex (when available)
- std::regex (final fallback)
Expected benefits of RE2
- Performance – Typically 5–10× faster than std::regex for complex patterns and long text (DFA/NFA hybrid, no backtracking).
- Predictable complexity – Linear time O(n) in input size; no exponential blow-up on pathological patterns.
- ReDoS mitigation – Avoids catastrophic backtracking, reducing risk of regex denial-of-service from user or API-supplied patterns.
- Thread safety – A compiled RE2 object can be shared across threads for concurrent matching, so the regex layer needs no extra locking around matches.
- Bounded memory – Configurable limits; no unbounded memory growth during matching.
- Production use – Widely used in production; BSD-3-Clause license.
Patterns that need lookahead/lookbehind or backreferences will continue to use Boost or std::regex via the fallback path.
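
The fallback rule above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: RegexEngine, NeedsBacktrackingEngine, and ChooseEngine are hypothetical names, and the scan is a conservative heuristic rather than a full regex parser. In a real build, RE2 itself reports unsupported syntax when a pattern is compiled (RE2::ok() returns false), which remains the authoritative check; a pre-scan like this only avoids the failed attempt.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical names for illustration; not taken from the project.
enum class RegexEngine { kRe2, kBoost, kStd };

// Heuristic scan for constructs RE2 does not support:
// lookahead (?= (?!, lookbehind (?<= (?<!, and backreferences \1..\9, \k<name>.
// Escaped characters and the contents of [...] classes are skipped.
bool NeedsBacktrackingEngine(std::string_view p) {
  bool in_class = false;  // true while inside a [...] character class
  for (std::size_t i = 0; i < p.size(); ++i) {
    const char c = p[i];
    if (c == '\\') {
      if (!in_class && i + 1 < p.size()) {
        const char next = p[i + 1];
        if ((next >= '1' && next <= '9') || next == 'k') return true;
      }
      ++i;  // skip the escaped character
      continue;
    }
    if (in_class) {
      if (c == ']') in_class = false;
      continue;
    }
    if (c == '[') {
      in_class = true;
      continue;
    }
    if (c == '(') {
      const std::string_view head3 = p.substr(i, 3);
      const std::string_view head4 = p.substr(i, 4);
      if (head3 == "(?=" || head3 == "(?!") return true;    // lookahead
      if (head4 == "(?<=" || head4 == "(?<!") return true;  // lookbehind
    }
  }
  return false;
}

// Applies the priority RE2 -> Boost -> std, given which engines were built in.
RegexEngine ChooseEngine(std::string_view pattern, bool have_re2,
                         bool have_boost) {
  if (have_re2 && !NeedsBacktrackingEngine(pattern)) return RegexEngine::kRe2;
  if (have_boost) return RegexEngine::kBoost;
  return RegexEngine::kStd;
}
```

With both optional engines available, a pattern such as foo.*bar would go to RE2, while (a)\1 or foo(?=bar) would be routed to Boost. The availability flags are passed as parameters here to keep the sketch testable; in the build they would presumably come from the CMake-defined macros.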
Approach (Option B – package managers)
- macOS: Homebrew (brew install re2)
- Linux: vcpkg (vcpkg install re2) alongside existing Boost
- Windows: initially no RE2; optional vcpkg later for parity
Implementation order
Benchmark strengthening (before RE2)
- Regex microbenchmark – New regex_benchmark executable to measure compile + match time for literal, simple, and complex patterns on filename/content corpora. Establish baseline with current std/Boost implementation.
- SearchBenchmark extensions – Add --regex-engine=auto|re2|boost|std and regex-heavy reference configs to measure end-to-end impact.
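
As a rough illustration of the two benchmark items above, the sketch below parses the proposed --regex-engine flag and times compile and match separately for the std::regex baseline. EngineChoice, ParseRegexEngineFlag, RegexTiming, and TimeStdRegex are hypothetical names, and the real regex_benchmark may be structured quite differently (for example with warm-up runs and per-pattern statistics).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <optional>
#include <regex>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical names for illustration; not taken from the project.
enum class EngineChoice { kAuto, kRe2, kBoost, kStd };

// Parses "--regex-engine=auto|re2|boost|std"; nullopt for anything else.
std::optional<EngineChoice> ParseRegexEngineFlag(std::string_view arg) {
  constexpr std::string_view kPrefix = "--regex-engine=";
  if (arg.substr(0, kPrefix.size()) != kPrefix) return std::nullopt;
  const std::string_view value = arg.substr(kPrefix.size());
  if (value == "auto") return EngineChoice::kAuto;
  if (value == "re2") return EngineChoice::kRe2;
  if (value == "boost") return EngineChoice::kBoost;
  if (value == "std") return EngineChoice::kStd;
  return std::nullopt;
}

struct RegexTiming {
  std::chrono::nanoseconds compile{0};  // time to construct the regex
  std::chrono::nanoseconds match{0};    // total time across all searches
  std::size_t hits = 0;                 // number of successful searches
};

// Times one pattern against a corpus using std::regex (the baseline engine).
// Throws std::regex_error if the pattern is invalid.
RegexTiming TimeStdRegex(const std::string& pattern,
                         const std::vector<std::string>& corpus,
                         int iterations) {
  assert(iterations > 0);
  using Clock = std::chrono::steady_clock;
  using std::chrono::duration_cast;
  using std::chrono::nanoseconds;

  RegexTiming timing;
  const auto compile_start = Clock::now();
  const std::regex re(pattern, std::regex::ECMAScript | std::regex::optimize);
  timing.compile = duration_cast<nanoseconds>(Clock::now() - compile_start);

  const auto match_start = Clock::now();
  for (int i = 0; i < iterations; ++i) {
    for (const std::string& line : corpus) {
      if (std::regex_search(line, re)) ++timing.hits;
    }
  }
  timing.match = duration_cast<nanoseconds>(Clock::now() - match_start);
  return timing;
}
```

Counting hits serves two purposes: it gives a correctness check that can be compared across engines, and it keeps the compiler from discarding the match loop. An RE2 or Boost variant of the timing function would need to return the same structure so results stay comparable.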
RE2 integration
- Build plumbing – CMake detection (find_package(re2 CONFIG QUIET)), HAVE_RE2 / RE2_REGEX_AVAILABLE, CI steps to install RE2 on macOS and Linux. No runtime behavior change.
- Unified regex wrapper – Single API with engine priority (RE2 → Boost → std), pattern checks for RE2-unsupported features (lookahead/lookbehind, backrefs), caching.
- Tests & benchmarks – Unit tests for selection/fallback; re-run regex_benchmark and search_benchmark to quantify improvement.
- Windows RE2 (optional) – vcpkg on Windows, PGO rules for RE2 target.
- Documentation – Engine order, CMake options, pattern limitations, how to run benchmarks.
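
The caching part of the unified wrapper might look something like the sketch below. It is only an assumption about the design: RegexCache is a hypothetical name, std::regex stands in for whichever engine the wrapper selects, and the cache is unbounded, which a production version would likely replace with an eviction policy.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <regex>
#include <string>
#include <unordered_map>

// Hypothetical compiled-pattern cache; std::regex is a stand-in for the
// engine-specific compiled object the real wrapper would store.
class RegexCache {
 public:
  // Returns the compiled regex for `pattern`, compiling it on first use.
  // Throws std::regex_error for an invalid pattern; nothing is cached then.
  std::shared_ptr<const std::regex> Get(const std::string& pattern) {
    std::lock_guard<std::mutex> lock(mu_);
    const auto it = cache_.find(pattern);
    if (it != cache_.end()) return it->second;
    // Compiling under the lock keeps the sketch simple, at the cost of
    // serializing concurrent first-time compilations.
    auto compiled = std::make_shared<const std::regex>(
        pattern, std::regex::ECMAScript | std::regex::optimize);
    cache_.emplace(pattern, compiled);
    return compiled;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mu_);
    return cache_.size();
  }

 private:
  mutable std::mutex mu_;
  std::unordered_map<std::string, std::shared_ptr<const std::regex>> cache_;
};
```

Handing out shared_ptr<const std::regex> lets callers keep matching against a compiled pattern without holding the cache lock. Matching against a const std::regex from several threads is safe as long as each thread uses its own match results object.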
References
- RE2 feasibility and pattern fallbacks: internal-docs/archive/RE2_FEASIBILITY_STUDY.md (in main repo)
- Implementation phases and benchmark plan: internal-docs/plans/2026-03-15_RE2_AND_BENCHMARK_PHASES.md (in main repo)
This issue is a placeholder for tracking the above work; no code changes required until implementation starts.