
Validate markdown anchors in docs smoke #873

Open
yyswhsccc wants to merge 2 commits into ramimbo:main from yyswhsccc:druid/probe-846-doc-anchor-smoke

Conversation


@yyswhsccc yyswhsccc commented Jun 4, 2026

Summary

Refs #846.

  • extends scripts/docs_smoke.py so local Markdown links with #heading fragments verify the target heading exists;
  • keeps existing file-existence behavior for external links and non-Markdown local files;
  • adds focused tests for a valid heading anchor, a missing anchor, and inline-code heading slugging.
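
The behavior in the bullets above can be sketched roughly as follows. This is an illustration only, not the code in scripts/docs_smoke.py: the name local_target_exists, its parameters, and the anchors_for callback are hypothetical, and the sketch assumes only .md files get fragment checks while external links are left to whatever check already handles them.

```python
from pathlib import Path
from typing import Callable

# Assumed set of prefixes treated as external; the real script may differ.
EXTERNAL_PREFIXES = ("http://", "https://", "mailto:")


def local_target_exists(
    source: Path,
    target: str,
    anchors_for: Callable[[Path], set[str]],
) -> bool:
    """Return True when a link target found in `source` looks valid.

    `anchors_for` is a hypothetical callback that returns the heading
    anchors of a Markdown file.
    """
    if target.startswith(EXTERNAL_PREFIXES):
        return True  # external links are not resolved on disk

    raw_path, _, fragment = target.partition("#")
    # An empty path means a same-document link such as "#install".
    destination = (source.parent / raw_path).resolve() if raw_path else source

    if not destination.exists():
        return False  # existing behavior: the file must exist

    if fragment and destination.suffix.lower() == ".md":
        return fragment in anchors_for(destination)

    return True  # non-Markdown files: existence is enough
```

A link such as guide.md#install would pass only if guide.md exists and anchors_for reports an install anchor, while logo.png#anything would pass on file existence alone.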

Validation

  • uv run --extra dev python -m pytest tests/test_docs_public_urls.py -q -> 37 passed
  • uv run --extra dev python scripts/docs_smoke.py -> docs smoke ok
  • uv run --extra dev python -m ruff check scripts/docs_smoke.py tests/test_docs_public_urls.py -> passed
  • git diff --check -> passed

Scope

Docs-smoke maintainability only. No ledger, wallet, treasury, payout, admin-token, exchange, bridge, off-ramp, price, liquidity, or production data behavior is changed.

Summary by CodeRabbit

  • Chores

    • Stricter documentation link validation: verifies local Markdown fragment targets exist, recognizes duplicate heading anchors (with suffixed numbering), ignores headings inside fenced code blocks, and normalizes inline/backticked text when resolving anchors.
  • Tests

    • Added tests covering heading-anchor validation, duplicate-anchor behavior, fenced-code exclusions, and inline-code normalization.


coderabbitai Bot commented Jun 4, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 6c4c4740-c92f-4de8-8ea9-5a2807b70296

📥 Commits

Reviewing files that changed from the base of the PR and between a98caed and d05a7ac.

📒 Files selected for processing (2)
  • scripts/docs_smoke.py
  • tests/test_docs_public_urls.py

📝 Walkthrough


Adds heading-anchor extraction and uses it to validate local Markdown fragment links: new regexes and helpers compute normalized anchors (ignoring fenced code and suffixing duplicates), and _local_target_exists now verifies fragments against those anchors. Tests cover fragment presence, duplicates, fenced-code exclusion, and inline-code normalization.
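
For readers unfamiliar with anchor normalization, here is a minimal sketch of how a heading with inline code might be turned into a fragment. It approximates GitHub-style slugging under simple assumptions (lowercase, drop backticks and punctuation, turn spaces into hyphens); heading_slug is a hypothetical name and is not taken from the PR's _markdown_heading_anchor().

```python
import re


def heading_slug(heading: str) -> str:
    """Approximate a GitHub-style anchor for a heading's text."""
    text = heading.strip().lower()
    # Inline code keeps its text but loses the backticks.
    text = text.replace("`", "")
    # Drop everything except word characters, hyphens, and spaces.
    text = re.sub(r"[^\w\- ]", "", text)
    # Each space becomes one hyphen.
    return text.replace(" ", "-")


print(heading_slug("Run `docs_smoke.py` locally"))  # run-docs_smokepy-locally
```

Note that the dot in docs_smoke.py disappears while the underscore survives, so a link written as #run-docs_smoke.py-locally would not match under this approximation.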

Changes

Markdown heading anchor validation

| Layer / File(s) | Summary |
| --- | --- |
| Heading regex and fenced-code detection (scripts/docs_smoke.py) | Adds HEADING_RE and FENCE_RE constants for identifying Markdown headings and fenced code blocks. |
| Anchor normalization and extraction (scripts/docs_smoke.py) | Adds _markdown_heading_anchor() to normalize headings into anchors and _markdown_anchors() to scan a Markdown file (skipping fenced code) and return all computed anchors with duplicate suffixing. |
| Fragment validation in local target check (scripts/docs_smoke.py) | Updates _local_target_exists() to resolve local paths, require destination existence, and when a fragment is present for Markdown files, verify the fragment matches the computed anchors; leaves http(s)/mailto: links unchanged. |
| Test coverage for heading anchor validation (tests/test_docs_public_urls.py) | Reformats imports and adds four tests validating fragment detection, duplicate-heading suffix behavior, fenced-code exclusion from anchors, and normalization of headings with inline/backticked code. |

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title "Validate markdown anchors in docs smoke" accurately describes the main change: adding markdown anchor validation to the docs smoke script. |
| Description check | ✅ Passed | The description covers the core changes, validation evidence, and scope clearly, though it uses a non-standard format that deviates from the template structure. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Mergework Public Artifact Hygiene | ✅ Passed | PR contains only technical code changes to documentation link validation scripts. No public artifacts modified, no prohibited claims about investment, price, cash-out, payouts, or security details. |
| Bounty Pr Focus | ✅ Passed | PR references issue #846; changes match stated files; includes four new tests; provides evidence of passing tests and checks; scope limited to docs-smoke validation only. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: b969b667-c8ce-412e-b902-d089e5938fc5

📥 Commits

Reviewing files that changed from the base of the PR and between d4d0e48 and a98caed.

📒 Files selected for processing (2)
  • scripts/docs_smoke.py
  • tests/test_docs_public_urls.py

Comment thread scripts/docs_smoke.py Outdated
Comment thread tests/test_docs_public_urls.py
Contributor

@caozhengming caozhengming left a comment


Thanks for the focused docs-smoke improvement. I reviewed current head a98caed4a8ff54428f1dff381a63cbb8c1da1d45 and the basic validation passes, but I think this needs one more round before merge.

Blocker: the new Markdown anchor validator does not match GitHub-style heading anchors in two common cases:

  • repeated headings lose the generated suffix because _markdown_anchors() stores anchors in a set, so # Same followed by ## Same only produces same and a valid GitHub fragment like #same-1 is reported missing;
  • headings inside fenced code blocks are counted as real anchors because HEADING_RE scans the whole Markdown file, so a fragment targeting a pseudo-heading inside a code block is incorrectly accepted.

Repro on this head:

anchors= ['fake', 'real', 'same']
duplicate_same_1= False
fenced_fake= True
real= True

The repro used a temp Markdown file containing:

# Same
## Same
```md
# Fake
```
# Real

Expected behavior for GitHub-compatible docs smoke is the opposite for the first two checks: docs.md#same-1 should be valid, while docs.md#fake should be invalid because it is inside a fenced block. Please add duplicate-heading suffix handling and ignore fenced code blocks before extracting heading anchors, with regression tests for both boundaries.
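
To make the expected behavior concrete, here is a minimal sketch of an anchor collector that skips fenced blocks and suffixes repeated slugs. It is an assumption-laden illustration, not the PR's implementation: collect_anchors, slug, HEADING, and FENCE are hypothetical names, the slug rule is a simplified approximation of GitHub's, and only ATX (#) headings are handled.

```python
import re
from collections import Counter

# Hypothetical patterns; the PR's HEADING_RE and FENCE_RE may differ.
HEADING = re.compile(r"^ {0,3}#{1,6}[ \t]+(.+?)[ \t]*$")
FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def slug(heading: str) -> str:
    """Simplified GitHub-style slug: lowercase, strip punctuation, hyphenate."""
    text = heading.strip().lower().replace("`", "")
    return re.sub(r"[^\w\- ]", "", text).replace(" ", "-")


def collect_anchors(markdown: str) -> set[str]:
    anchors: set[str] = set()
    seen: Counter[str] = Counter()
    open_fence = None  # marker of the fence we are inside, if any

    for line in markdown.splitlines():
        fence = FENCE.match(line)
        if fence:
            marker = fence.group(1)
            if open_fence is None:
                open_fence = marker  # opening fence (may carry an info string)
            elif (
                marker[0] == open_fence[0]
                and len(marker) >= len(open_fence)
                and line.strip() == marker
            ):
                open_fence = None  # closing fence: same character, no info string
            continue
        if open_fence is not None:
            continue  # headings inside fenced code are not real anchors

        heading = HEADING.match(line)
        if heading:
            base = slug(heading.group(1))
            count = seen[base]
            # First occurrence keeps the bare slug; repeats get -1, -2, ...
            anchors.add(base if count == 0 else f"{base}-{count}")
            seen[base] += 1
    return anchors


fence_mark = "`" * 3  # built at runtime to avoid a literal fence in this example
sample = "\n".join(
    ["# Same", "## Same", fence_mark + "md", "# Fake", fence_mark, "# Real"]
)
print(sorted(collect_anchors(sample)))  # ['real', 'same', 'same-1']
```

With this approach the repro file yields same, same-1, and real, and fake is absent because its line sits inside the fenced block.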

Validation I ran:

  • uv run --extra dev python -m pytest tests/test_docs_public_urls.py -q -> 37 passed, with only the existing pytest temp cleanup PermissionError after success on Windows.
  • uv run --extra dev python scripts/docs_smoke.py -> docs smoke ok.
  • uv run --extra dev python -m ruff check scripts/docs_smoke.py tests/test_docs_public_urls.py -> passed.
  • uv run --extra dev python -m ruff format --check scripts/docs_smoke.py tests/test_docs_public_urls.py -> 2 files already formatted.
  • git diff --check origin/main...HEAD -> clean.
  • git merge-tree --write-tree origin/main HEAD -> clean tree 39c126843f6e5935f646adc1714e18990f0fb6ce.

Scope checked: docs-smoke validation only. I did not touch ledger, wallet, treasury, payout, admin-token, exchange, bridge, off-ramp, MRWK price, private data, or secrets.

@yyswhsccc
Author

Maintenance update for @caozhengming review:

  • Added GitHub-style duplicate heading suffix handling, so repeated headings now expose anchor, anchor-1, etc.
  • Ignored headings inside fenced Markdown/code blocks before anchor extraction, so pseudo-headings no longer satisfy fragment links.
  • Added regression tests for duplicate heading fragments and fenced-code pseudo-headings.

Validation run locally:

  • .venv-bounty-validation/bin/python -m pytest tests/test_docs_public_urls.py -q -> 39 passed
  • .venv-bounty-validation/bin/python scripts/docs_smoke.py -> docs smoke ok
  • .venv-bounty-validation/bin/python -m ruff check scripts/docs_smoke.py tests/test_docs_public_urls.py -> passed
  • .venv-bounty-validation/bin/python -m ruff format --check scripts/docs_smoke.py tests/test_docs_public_urls.py -> already formatted
  • git diff --check -> clean

Scope remains docs-smoke validation only; no ledger, wallet, treasury, payout, admin-token, exchange, bridge, off-ramp, MRWK price, private data, or secrets touched.

@caozhengming could you re-review when convenient?


@xiefuzheng713-alt xiefuzheng713-alt left a comment


Approved current head d05a7ac45203a346c5ccbe15a969926e8f97e520.

Evidence checked:

  • inspected scripts/docs_smoke.py and tests/test_docs_public_urls.py on the current head;
  • confirmed the follow-up commit addresses the earlier duplicate-anchor blocker by counting repeated generated slugs and accepting the GitHub-style -1 suffix;
  • confirmed fenced Markdown/code blocks are skipped before heading extraction, so pseudo-headings inside triple-backtick and tilde fences are not accepted as real anchors;
  • confirmed _local_target_exists() keeps the existing file-existence behavior while only applying fragment checks to Markdown targets;
  • rechecked GitHub state before review: mergeStateStatus=CLEAN, hosted Quality/readiness/docs/image check is successful, and CodeRabbit is successful.

Validation on this exact head:

  • uv run --extra dev python -m pytest tests/test_docs_public_urls.py -q -> 39 passed.
  • uv run --extra dev python scripts/docs_smoke.py -> docs smoke ok.
  • uv run --extra dev ruff check scripts/docs_smoke.py tests/test_docs_public_urls.py -> passed.
  • uv run --extra dev ruff format --check scripts/docs_smoke.py tests/test_docs_public_urls.py -> 2 files already formatted.
  • uv run --extra dev mypy scripts/docs_smoke.py -> success.
  • git diff --check origin/main...HEAD -> clean.
  • git merge-tree --write-tree origin/main HEAD -> clean tree daf4215635eaadc757cc868712be348883a700ca.

No blocker found on the current head. Scope remains limited to docs-smoke Markdown anchor validation and focused tests; no ledger, wallet, treasury, payout, proposal execution, admin-token, exchange, bridge, cash-out, MRWK price, private data, or secrets behavior is changed.


3 participants