Fix string offset parameters to use character positions instead of byte offsets by Claude · Pull Request #31 · crispthinking/IronRe2

Claude · 2026-04-01T10:01:32Z

The Find(string, int) and Captures(string, int) methods were treating the offset parameter as a byte position in the UTF-8 encoded string, causing incorrect behavior with multi-byte characters (emojis, CJK characters, etc.).

Changes

Modified Find(string haystack, int offset) and Captures(string haystack, int offset): Convert character offset to byte offset using Encoding.UTF8.GetByteCount(haystack.AsSpan(0, offset)) before passing to the underlying byte-based methods
Updated XML documentation: Clarified that string-based methods accept character offsets while byte-based methods accept byte offsets
Added test coverage: 6 new tests covering multi-byte UTF-8 scenarios (4-byte emojis, 3-byte CJK characters)
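The conversion described in the first item can be sketched in isolation as follows. This is a minimal illustration, not the code from this pull request: the class `OffsetSketch` and the helper `CharToByteOffset` are hypothetical names, and the bounds checks are an assumption about how invalid offsets might be handled. Only `Encoding.UTF8.GetByteCount` and `AsSpan` come from the description above.

```csharp
using System;
using System.Text;

public static class OffsetSketch
{
    // Hypothetical helper (not part of IronRe2): maps a char offset in a
    // .NET string to the matching byte offset in its UTF-8 encoding.
    public static int CharToByteOffset(string haystack, int charOffset)
    {
        if (haystack is null)
            throw new ArgumentNullException(nameof(haystack));
        if (charOffset < 0 || charOffset > haystack.Length)
            throw new ArgumentOutOfRangeException(nameof(charOffset));

        // Count the UTF-8 bytes needed for the first charOffset chars.
        return Encoding.UTF8.GetByteCount(haystack.AsSpan(0, charOffset));
    }

    public static void Main()
    {
        // Each of the three CJK characters takes 3 bytes in UTF-8.
        Console.WriteLine(CharToByteOffset("日本語 text", 3)); // prints 9
    }
}
```

Counting the bytes of the prefix avoids encoding the whole haystack a second time just to find the offset.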

Example

Regex re = new("world");
string haystack = "Hello 🌍 world";  // 🌍 is 4 bytes in UTF-8 and 2 chars (a surrogate pair) in a .NET string

// Char (UTF-16) positions: H(0) e(1) l(2) l(3) o(4) (5) 🌍(6-7) (8) w(9)...
// Byte (UTF-8) positions:  H(0) e(1) l(2) l(3) o(4) (5) 🌍(6-9) (10) w(11)...

// Now correctly interprets offset as a character position
var match = re.Find(haystack, 8);  // Starts at the space before 'w' (byte 10), not mid-emoji
Assert.Equal("world", match.ExtractedText);
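The positions in this example can be checked without IronRe2, using only the .NET base library. The sketch below is an illustration under stated assumptions: `EmojiOffsetSketch` and `Locate` are hypothetical names, and "character position" is taken to mean a .NET `char` index (a UTF-16 code unit), so the emoji occupies two positions. An offset that falls between the two halves of a surrogate pair has no exact UTF-8 equivalent, which is worth keeping in mind when choosing offsets.

```csharp
using System;
using System.Text;

public static class EmojiOffsetSketch
{
    // Hypothetical helper: returns the char index of needle in haystack and
    // the equivalent UTF-8 byte index, or (-1, -1) if needle is absent.
    public static (int CharIndex, int ByteIndex) Locate(string haystack, string needle)
    {
        int charIndex = haystack.IndexOf(needle, StringComparison.Ordinal);
        if (charIndex < 0)
            return (-1, -1);

        int byteIndex = Encoding.UTF8.GetByteCount(haystack.AsSpan(0, charIndex));
        return (charIndex, byteIndex);
    }

    public static void Main()
    {
        // \U0001F30D is the globe emoji, written as an escape for portability.
        string haystack = "Hello \U0001F30D world";

        Console.WriteLine(haystack.Length);                      // 14
        Console.WriteLine(Encoding.UTF8.GetByteCount(haystack)); // 16
        Console.WriteLine(char.IsHighSurrogate(haystack[6]));    // True
        Console.WriteLine(Locate(haystack, "world"));            // (9, 11)
    }
}
```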

- Modified Find(string, int) to convert character offset to byte offset using Encoding.UTF8.GetByteCount
- Modified Captures(string, int) to convert character offset to byte offset
- Updated XML documentation to clarify that string methods use character offsets, while byte methods use byte offsets
- Added comprehensive tests for multi-byte UTF-8 characters (emojis and Asian characters)
- All 71 tests passing (6 new tests added)

Agent-Logs-Url: https://github.com/crispthinking/IronRe2/sessions/2d26941d-972d-4c97-a3ec-ad34e99c1ff3
Co-authored-by: corpo-iwillspeak <265613520+corpo-iwillspeak@users.noreply.github.com>

Initial plan

dd1884c

Claude AI assigned Claude and corpo-iwillspeak Apr 1, 2026

Claude started work on behalf of corpo-iwillspeak April 1, 2026 10:01

Claude AI linked an issue Apr 1, 2026 that may be closed by this pull request

Its not clear that offsets in string haystack refer to byte offsets #13

Open

Claude AI changed the title ~~[WIP] Fix offset interpretation in string haystack~~ Fix string offset parameters to use character positions instead of byte offsets Apr 1, 2026

Claude finished work on behalf of corpo-iwillspeak April 1, 2026 10:08

Claude AI requested a review from corpo-iwillspeak April 1, 2026 10:08

Claude wants to merge 2 commits into main from claude/fix-byte-offset-issue