RE#

A high-performance, automata-based regex engine with first-class support for intersection and complement operations.

RE# compiles patterns into deterministic automata. All matching is non-backtracking with guaranteed linear-time execution. RE# extends standard regex syntax with intersection (&), complement (~), and a universal wildcard (_), enabling patterns that are impossible or impractical to express with standard regex.

paper | blog post | syntax docs | dotnet version and web playground

Usage

let re = resharp::Regex::new(r".*cat.*&.*dog.*&.{8,15}").unwrap();

let matches = re.find_all(b"the cat and the dog");
let found = re.is_match(b"the cat and the dog");

Syntax extensions

RE# supports standard regex syntax plus three extensions: _ (universal wildcard), & (intersection), and ~(...) (complement). _ matches any character including newlines, so _* means "any string".

_*              any string
a_*             any string that starts with 'a'
_*a             any string that ends with 'a'
_*a_*           any string that contains 'a'
~(_*a_*)        any string that does NOT contain 'a'
(_*a_*)&~(_*b_*)  contains 'a' AND does not contain 'b'
(?<=b)_*&_*(?=a)  preceded by 'b' AND followed by 'a'

You combine all of these with & to get more complex patterns. RE# also supports lookarounds ((?=...), (?<=...), (?!...), (?<!...)), compiled directly into the automaton with no backtracking. See the full syntax reference for details.

Differences from `resharp-dotnet` RE# and rust `regex`

Rust resharp is written from scratch, there are a number of differences from the original. For starters this works on byte slices (&[u8]) and UTF-8 rather than UTF-16. The parser uses the regex-syntax crate as a base with 3 extensions described above. The API is also different, there's different internal representation for the characters and algebra.. etc

While this version is more mature, given it's the ~~second~~ fifth reimplementation, I haven't had that much time to optimize the match loop for specific patterns or use cases. It will still be fast given it's a deterministic automata based engine, but it will be closer to 1GB/s rather than 15GB/s that you can get with SIMD literal optimizations and other pattern-specific optimizations. We do use memrchr for string literals, so literals will actually do 10+GB/s, but don't expect that for more complex patterns.

I have plans to add these optimizations, but it will take some time and effort to properly implement them. If you want to help me out with that I would be grateful, I need some SIMD subroutines that would work across x86-64 and arm64, bidirectionally, and ideally also support no_std environments.. perhaps one more constraint that i want them to be reasonably concise and correct, if I wanted LLM-generated code I would just ask in the first place

In its current state, when you should use this as opposed to the default regex in rust is:

you want to match patterns that require intersection or complement or lookarounds
you want to match large regexes with high performance, at the expense of memory usage
you want leftmost longest matches rather than leftmost first matches

For tuning, EngineOptions controls precompilation threshold, capacity, and lookahead context:

let opts = resharp::EngineOptions {
    dfa_threshold: 100,           // eagerly compile up to N states
    max_dfa_capacity: 65535,       // max automata states (default: u16::MAX)
    lookahead_context_max: 800,    // max lookahead context distance (default: 800)
};
let re = resharp::Regex::with_options(r"pattern", opts).unwrap();

RE# matching API is slightly different from regex, matches will return a Result<Vec<Match>, Error>, where the Error can be a capacity overflow or a lookahead context overflow. RE# will either give you fast matching or fail outright. You can catch these errors and rebuild / adjust your pattern or options accordingly.

Benchmarks

Throughput comparison with regex and fancy-regex on an AMD Ryzen 7 5800X, compiled with --release. Compile time is excluded; only matching is measured. Run with cargo bench -- 'readme/'.

Benchmark	resharp	regex	fancy-regex
dictionary 2663 words (900KB, ~15 matches)	503 MiB/s	547 MiB/s	542 MiB/s
dictionary 2663 words (944KB, ~2678 matches)	424 MiB/s	57 MiB/s	20 MiB/s
dictionary `(?i)` 2663 words (900KB)	505 MiB/s	0.03 MiB/s	0.03 MiB/s
lookaround `(?<=\s)[A-Z][a-z]+(?=\s)` (900KB)	265 MiB/s	--	25 MiB/s
literal alternation (900KB)	490 MiB/s	11.3 GiB/s	10.0 GiB/s
literal `"Sherlock Holmes"` (900KB)	13.8 GiB/s	39.0 GiB/s	32.7 GiB/s

Notes on the results:

The first dictionary row is roughly tied - the prose haystack only contains ~15 matches, so the lazy DFA barely explores any states. RE#'s advantage is that its full DFA is smaller, but this isn't visible when most states are never materialized. On longer inputs or denser matches, the other engines will degrade - take lazy-dfa benchmarks with a grain of salt, you will not be matching the exact same string over and over in the real world.
The (?i) row shows what happens when the pattern forces regex to fall back from its DFA to an NFA: throughput drops from 547 MiB/s to 0.03 MiB/s. RE# handles case folding in the DFA and maintains full speed. You can increase regex's DFA threshold to avoid this fallback, but only up to a point.
RE# compiles lookarounds directly into the automaton - no back-and-forth between forward and backward passes. regex doesn't support lookarounds except for anchors; fancy-regex handles them via backtracking, which is occasionally much slower.
Literal patterns and <20 literal alternations are a clear RE# weakness - regex uses Teddy for these, which RE# does not have. On pure literal matching, regex / fancy-regex is 20-80x faster.

Crate structure

Crate	Description
`resharp`	engine and public API `(resharp-engine)`
`resharp-algebra`	algebraic regex tree, constraint solver, nullability analysis
`resharp-parser`	pattern string to AST, extends `regex-syntax` with RE# operators

And most importantly, have fun! :)

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
data		data
docs		docs
resharp-algebra		resharp-algebra
resharp-engine		resharp-engine
resharp-parser		resharp-parser
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RE#

Usage

Syntax extensions

Differences from `resharp-dotnet` RE# and rust `regex`

Benchmarks

Crate structure

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

RE#

Usage

Syntax extensions

Differences from resharp-dotnet RE# and rust regex

Benchmarks

Crate structure

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Differences from `resharp-dotnet` RE# and rust `regex`

Packages