fix: validate charset flag conflicts in (?...) groups #45

Draft
toddr-bot wants to merge 1 commit into cpan-authors:main from toddr-bot:koan.toddr.bot/fix-charset-flag-conflicts

Conversation

@toddr-bot (Collaborator) commented Apr 14, 2026

What

Reject mutually exclusive charset flags (a/d/l/u) in (?...) flag groups, matching Perl's behavior since 5.14.

Why

Perl treats (?al:...), (?du:...), (?dd:...), and (?-a:...) as errors. The parser silently accepted all of these, producing nodes with contradictory charset semantics. Any tool using the parser to validate regexes would miss these errors.

How

Adds three new error codes to Diagnostics.pm and validation to the flag-parsing loops of the (? handler:

  • On-flags: track which charset flag was seen; reject doubled d/l/u (RPe_DUPLCH) and any cross-charset combination (RPe_EXCLCH). aa (strict ASCII) is the sole valid doubling.
  • Off-flags: reject all charset flags after - (RPe_NEGCHR), matching Perl's "may not appear after the '-'" error.
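
The two rules above can be sketched as a small standalone function. This is an illustration in Python, not the Perl code from this PR: the function name `check_charset_flags` and its string-in, error-code-out interface are hypothetical, and only the three error code names come from the description. The handling of a third `a` is also an assumption, since the description does not say which code applies there.

```python
CHARSET_FLAGS = frozenset("adlu")


def check_charset_flags(flags):
    """Check the flag text found between '(?' and ':' or ')'.

    Returns one of the error code names from the PR description,
    or None when the charset flags are consistent.
    Hypothetical helper; the real parser works inside its (? handler.
    """
    # Everything before the first '-' turns flags on, everything after turns them off.
    on_flags, _, off_flags = flags.partition("-")

    seen = None   # the charset flag seen so far, if any
    count = 0     # how many times it has appeared
    for ch in on_flags:
        if ch not in CHARSET_FLAGS:
            continue  # non-charset flags (and a leading '^') are not checked here
        if seen is None:
            seen, count = ch, 1
        elif ch != seen:
            return "RPe_EXCLCH"   # e.g. "al", "du"
        elif ch == "a" and count == 1:
            count = 2             # "aa" is the sole valid doubling
        else:
            # Doubled d/l/u. Assumption: a third 'a' is reported the same way.
            return "RPe_DUPLCH"

    # No charset flag may appear after '-'.
    for ch in off_flags:
        if ch in CHARSET_FLAGS:
            return "RPe_NEGCHR"   # e.g. "-a"
    return None
```

For example, `check_charset_flags("al")` yields `"RPe_EXCLCH"`, `check_charset_flags("-a")` yields `"RPe_NEGCHR"`, and `check_charset_flags("aa")` yields `None`. Tracking a single "seen" flag plus a count is enough because at most one charset flag may be active at a time.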

Testing

  • 18 rejection tests covering all conflict types (doubled, exclusive, negated) including caret syntax
  • 18 acceptance tests confirming valid usage (single flags, aa, caret forms, mixed with non-charset flags)
  • Full suite: 1207 tests across 20 files, all pass

🤖 Generated with Claude Code


Quality Report

Changes: 5 files changed, 86 insertions(+)

Code scan: clean

Tests: passed (OK)

Branch hygiene: clean

Generated by Kōan post-mission quality pipeline

Perl 5.14+ treats charset flags (a/d/l/u) as mutually exclusive:
only one may be active, with 'aa' as the sole valid doubling.
The parser accepted all combinations silently, producing nodes
with conflicting charset semantics.

Three new error codes matching Perl's native errors:
- RPe_DUPLCH: doubled d/l/u (e.g. (?dd:...))
- RPe_EXCLCH: conflicting charset flags (e.g. (?al:...), (?du:...))
- RPe_NEGCHR: charset flags after - (e.g. (?-a:...))

Validated in both grouped (?xx:...) and toggle (?xx) forms,
with and without caret reset syntax.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
