feat(slugs): normalize accented slugs and locale-prefixed link resolution#170

Open
luandro wants to merge 15 commits into main from issue-164

Conversation

luandro (Contributor) commented Mar 19, 2026

Summary

Closes #164

  • ASCII-safe slugs: createSafeSlug() uses NFD decomposition to strip diacritics (á→a, é→e, ñ→n, ã→a, ç→c) from filenames, heading IDs, and link fragments
  • Locale-prefixed links: normalizeInternalDocLinks() rewrites /docs/Guía Rápida to /es/docs/guia-rapida (or /pt/, etc.) based on the content's locale
  • Stable heading anchors: injectExplicitHeadingIds() appends {#id} markers to headings before writing, with -1/-2 deduplication, skipping fenced code blocks
  • Consolidates three previously duplicated inline slugify implementations into the shared createSafeSlug
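
The NFD-based stripping described in the first bullet can be sketched as follows. This is a minimal illustration, not the PR's actual `createSafeSlug()`: the function name `createSafeSlugSketch` and the exact hyphenation rules are assumptions.

```typescript
// Hypothetical sketch; the real createSafeSlug() in this PR may differ.
function createSafeSlugSketch(input: string): string {
  return input
    .normalize("NFD")                 // split "é" into "e" + a combining accent
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining diacritical marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")      // collapse every other run of characters into one hyphen
    .replace(/^-+|-+$/g, "");         // trim leading and trailing hyphens
}
```

Under these assumptions, "Guía Rápida" becomes "guia-rapida", which matches the link example in the second bullet.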

Test plan

  • bunx vitest run scripts/notion-fetch/slugUtils.test.ts — 12 cases covering Latin, Portuguese, Spanish, CJK edge cases
  • bunx vitest run scripts/notion-fetch/linkNormalizer.test.ts — 10 cases covering locale prefixing, fragments, external/relative/image link exclusion
  • bunx vitest run scripts/notion-fetch/contentSanitizer.test.ts — includes 2 new heading ID injection tests
  • bunx vitest run scripts/notion-fetch/generateBlocks.test.ts — includes 3 new integration tests for slug filenames, link normalization, and heading injection pipeline

…solution

Closes #164

- Add createSafeSlug() using NFD decomposition to strip diacritics
  (á→a, é→e, ñ→n, ã→a, ç→c, etc.) from filenames and anchor IDs
- Add normalizeInternalDocLinks() to rewrite /docs/ links with the
  correct locale prefix (/es/docs/..., /pt/docs/...) and slugify
  path segments and fragments
- Add injectExplicitHeadingIds() to append stable {#id} anchors to
  headings, deduplicated with -1/-2 suffixes, skipping code fences
- Replace three inline slugify implementations with createSafeSlug
- Fix code fence regex to be line-anchored, so heading IDs are no
  longer injected inside fenced blocks
- Wrap decodeURIComponent with safeDecode to avoid URIError on
  percent signs in page titles (e.g. "100% complete")
- Add unit tests for slugUtils (12 cases) and linkNormalizer (10 cases)
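
The `safeDecode` wrapper mentioned above guards against `decodeURIComponent` throwing a `URIError` on a stray percent sign. A minimal sketch, assuming the wrapper simply falls back to the raw input (the name `safeDecodeSketch` and the fallback choice are assumptions, not taken from the PR):

```typescript
// Hypothetical sketch of a safeDecode-style wrapper.
function safeDecodeSketch(value: string): string {
  try {
    return decodeURIComponent(value);
  } catch {
    // Malformed escape sequence, e.g. the "% c" in "100% complete":
    // keep the original text instead of failing the whole run.
    return value;
  }
}
```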

github-actions bot (Contributor) commented Mar 19, 2026

🐳 Docker Image Published

Your Docker image has been built and pushed for this PR.

Image Reference: docker.io/communityfirst/comapeo-docs-api:pr-170

Platforms: linux/amd64, linux/arm64

Testing

To test this image:

docker pull docker.io/communityfirst/comapeo-docs-api:pr-170
docker run -p 3001:3001 docker.io/communityfirst/comapeo-docs-api:pr-170

Built with commit f585ecf

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6df31b944

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

github-actions bot (Contributor) commented Mar 19, 2026

🚀 Preview Deployment

Your documentation preview is ready!

Preview URL: https://pr-170.comapeo-docs.pages.dev

🔄 Content: Regenerated 5 pages from Notion (script changes detected)

💡 Tip: Add label fetch-all-pages to test with full content, or fetch-10-pages for broader coverage.

This preview will update automatically when you push new commits to this PR.


Built with commit f585ecf

luandro (Contributor, Author) commented Mar 19, 2026

@codex review

- normalizeDocPathname now uses only the last path segment, matching
  the flat slug shape that buildFrontmatter() generates (slug: /${safeSlug}).
  Multi-segment paths like /docs/Category/Page previously resolved to
  /docs/category/page which does not exist, causing 404s.
- normalizeInternalDocLinks now masks fenced code blocks and inline
  code before rewriting links, so Markdown link examples inside code
  fences are no longer altered.
- Update test for nested path to expect flat slug output.
- Add tests for code-fence and inline-code protection.
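
The flat-slug resolution in the first bullet can be illustrated with a short sketch. The names `toFlatSlug` and `flatDocPathSketch`, and passing the locale prefix as a plain string, are assumptions for illustration; the PR's `normalizeDocPathname` may take different arguments.

```typescript
// Hypothetical slug helper, standing in for the shared createSafeSlug.
const toFlatSlug = (s: string): string =>
  s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase()
    .replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");

// Keep only the last path segment, because generated pages use a flat slug.
function flatDocPathSketch(pathname: string, localePrefix: string): string {
  const segments = pathname.split("/").filter((s) => s.length > 0);
  // Nothing after "docs": point at the docs root.
  if (segments.length <= 1) return localePrefix + "/docs";
  const last = segments[segments.length - 1] ?? "";
  return localePrefix + "/docs/" + toFlatSlug(last);
}
```

With this shape, /docs/Category/Page resolves to /docs/page (plus any locale prefix) rather than the non-existent /docs/category/page.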
luandro (Contributor, Author) commented Mar 19, 2026

@codex review

- normalizeDocPathname now uses only the last path segment, matching
  the flat slug shape buildFrontmatter() generates (slug: /${safeSlug}).
  Multi-segment paths like /docs/Category/Page previously resolved to
  /docs/category/page which does not exist, causing 404s.
- normalizeInternalDocLinks now masks fenced code blocks and inline
  code before rewriting links, so Markdown link examples inside code
  fences are no longer altered.
- Refactor mask/restore logic into dedicated maskCode/restoreCode helpers.
- Update test for nested path to expect flat slug output.
- Add tests for code-fence and inline-code protection.
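
The mask-then-restore approach behind `maskCode`/`restoreCode` can be sketched like this. The helper names ending in `Sketch`, the NUL-delimited placeholder format, and the exact regexes are assumptions, not the PR's implementation.

```typescript
// Hypothetical helpers; the PR's maskCode/restoreCode may be shaped differently.
const FENCE = "`".repeat(3); // three backticks, built at runtime so this sample nests safely in Markdown

// Line-anchored fenced block: opening fence at line start through closing fence at line start.
const fencedBlock = new RegExp("^" + FENCE + "[\\s\\S]*?^" + FENCE + "[ \\t]*$", "gm");
const inlineCode = /`[^`\n]+`/g;

function maskCodeSketch(markdown: string): { masked: string; stash: string[] } {
  const stash: string[] = [];
  const hide = (match: string): string => {
    stash.push(match);
    return "\u0000" + String(stash.length - 1) + "\u0000"; // NUL-delimited placeholder
  };
  // Fences first, so inline-code matching never looks inside a fenced block.
  const masked = markdown.replace(fencedBlock, hide).replace(inlineCode, hide);
  return { masked, stash };
}

function restoreCodeSketch(masked: string, stash: string[]): string {
  return masked.replace(/\u0000(\d+)\u0000/g, (_m: string, i: string) => stash[Number(i)] ?? "");
}

// Apply any rewrite to the prose only; code spans come back byte-for-byte.
function rewriteOutsideCodeSketch(markdown: string, rewrite: (text: string) => string): string {
  const { masked, stash } = maskCodeSketch(markdown);
  return restoreCodeSketch(rewrite(masked), stash);
}
```

Because the rewrite callback only ever sees placeholders where code used to be, link examples inside fences and inline code are left untouched.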
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 37a372e611

…eading injection

Fix code-fence masking regex in linkNormalizer and contentSanitizer to
allow leading whitespace (^[ \t]*```) so indented fences (e.g. inside
list items or admonitions) are also protected before link normalization
and heading ID injection.
Add test for indented code fence protection.
luandro (Contributor, Author) commented Mar 19, 2026

@codex review

CoMapeo Content Bot and others added 5 commits March 19, 2026 18:37
Replace ASCII-only regex with Unicode property escapes (\p{L}\p{N})
so CJK and accented characters are retained in slugs instead of stripped.
Update tests to reflect corrected behavior. Extend ESLint config to cover bun-tests/.
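
The difference between an ASCII-only character class and Unicode property escapes can be seen in a small comparison. Both function names are hypothetical, and the sketch shows only the filtering step, not whatever diacritic handling the PR applies around it.

```typescript
// ASCII-only filter: anything outside a-z and 0-9 becomes a hyphen.
const asciiOnlySlug = (s: string): string =>
  s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");

// Unicode-aware filter: \p{L} (letters) and \p{N} (numbers) in any script are kept.
// Built with the RegExp constructor so it also compiles under older TypeScript targets.
const nonWordRun = new RegExp("[^\\p{L}\\p{N}]+", "gu");
const unicodeAwareSlug = (s: string): string =>
  s.toLowerCase().replace(nonWordRun, "-").replace(/^-+|-+$/g, "");
```

With the ASCII-only version a purely CJK title collapses to an empty slug, while the Unicode-aware version keeps the characters and only hyphenates the separators.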
…ding counter for explicit IDs

- Replace `[ \t]*` with ` {0,3}` in code-fence masks across contentSanitizer and linkNormalizer, matching CommonMark's 0–3 space rule for fenced blocks
- Register the text-derived baseId in headingCounts when a heading already carries an explicit {#id}, preventing incorrect -0 suffixes on subsequent duplicate headings
- Suppress security/detect-non-literal-fs-filename ESLint warnings in verifyExportCoverage where the path parameter is already validated
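
The effect of narrowing the fence prefix in the first bullet can be checked in isolation. This is a small sketch of the two patterns as described, not the full masking regex from the PR; the constant names are assumptions.

```typescript
const TICKS = "`".repeat(3); // three backticks, built at runtime so this sample nests safely in Markdown

// Earlier pattern: any amount of leading spaces or tabs before the fence.
const looseFenceStart = new RegExp("^[ \\t]*" + TICKS);

// CommonMark-style pattern: a fence may be indented by at most three spaces;
// at four or more spaces the line belongs to an indented code block instead.
const commonMarkFenceStart = new RegExp("^ {0,3}" + TICKS);
```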
…al slugs

Heading ID generator now skips IDs already claimed by explicit headings
or naturally-occurring slugs, preventing duplicate anchors.
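
A two-pass sketch of heading ID injection that avoids already-claimed IDs is shown below. It is an illustration under assumptions: the names `headingSlug` and `injectHeadingIdsSketch` are hypothetical, only explicit `{#id}` markers are reserved in the first pass, and fences are tracked with a simple toggle.

```typescript
// Hypothetical slug helper, standing in for the shared createSafeSlug.
const headingSlug = (s: string): string =>
  s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase()
    .replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");

function injectHeadingIdsSketch(markdown: string): string {
  const fenceRe = new RegExp("^ {0,3}" + "`".repeat(3)); // opening or closing fence
  const explicitRe = /\{#([^}\s]+)\}\s*$/;               // trailing {#id} marker
  const headingRe = /^#{1,6}\s+\S/;                      // ATX heading with some text
  const lines = markdown.split("\n");

  // Pass 1: reserve every ID that a heading already declares explicitly.
  const used = new Set<string>();
  let inFence = false;
  for (const line of lines) {
    if (fenceRe.test(line)) { inFence = !inFence; continue; }
    const explicit = inFence ? null : explicitRe.exec(line);
    if (explicit && headingRe.test(line)) used.add(explicit[1] ?? "");
  }

  // Pass 2: give each remaining heading the first free ID: base, base-1, base-2, ...
  inFence = false;
  return lines.map((line) => {
    if (fenceRe.test(line)) { inFence = !inFence; return line; }
    if (inFence || !headingRe.test(line) || explicitRe.test(line)) return line;
    const base = headingSlug(line.replace(/^#{1,6}\s+/, ""));
    if (base === "") return line;
    let id = base;
    for (let n = 1; used.has(id); n++) id = base + "-" + String(n);
    used.add(id);
    return line.trimEnd() + " {#" + id + "}";
  }).join("\n");
}
```

Reserving explicit IDs before generating any new ones is what keeps a generated suffix from colliding with an ID declared further down the page.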
Co-authored-by: Junie <junie@jetbrains.com>
luandro added 2 commits March 26, 2026 07:59
…inkNormalizer

- Replace consuming (^|[^!]) group with (?<![!]) lookbehind so adjacent
  markdown links are both matched instead of the second being skipped
- Extend /docs guard to also pass through bare /docs and /docs#fragment
  targets, which normalizeDocPathname() already handled correctly
- Add regression tests for adjacent links and exact /docs paths
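
The adjacent-link bug from the first bullet can be reproduced with two small patterns. These regexes are simplified stand-ins for the ones in linkNormalizer, and `collectTargetsSketch` is a hypothetical helper used only to compare them.

```typescript
// Old style: the group consumes one character before "[", so a link that
// starts immediately after the previous match has no character left to consume.
const consumingLink = /(^|[^!])\[([^\]]*)\]\(([^)]+)\)/;

// New style: a zero-width lookbehind rejects images ("![") without consuming anything.
const lookbehindLink = /(?<![!])\[([^\]]*)\]\(([^)]+)\)/;

// Collect the link targets found by a pattern; targetGroup is the capture index of the URL.
function collectTargetsSketch(text: string, pattern: RegExp, targetGroup: number): string[] {
  const targets: string[] = [];
  const re = new RegExp(pattern.source, "g");
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) targets.push(m[targetGroup] ?? "");
  return targets;
}
```

On text such as two back-to-back links followed by an image, the consuming pattern finds only the first link, while the lookbehind pattern finds both links and still skips the image.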
…ility

Both contentSanitizer and linkNormalizer implemented nearly identical
logic for masking and restoring fenced code blocks and inline code spans.
Extract into a shared markdownUtils.ts to eliminate duplication and
prevent future drift.
Development

Successfully merging this pull request may close these issues.

Links: Multilanguage slugs

1 participant