cloudflare-md-router

A tiny Cloudflare Worker that serves the .md twin of any static page when the request is from a known LLM crawler or explicitly asks for text/markdown. Falls back to the HTML response when the .md twin doesn't exist.

If you're building a docs site that already emits a per-page raw-markdown twin (e.g. /foo/bar and /foo/bar.md), this lets every page do content negotiation transparently — Claude, ChatGPT, Perplexity, etc. fetch the model-friendly version automatically; humans keep getting the styled HTML page.

Behavior

| Request | Worker serves |
| --- | --- |
| Anything with a file extension (.css, .png, .md, …) | Pass-through to ASSETS |
| Non-GET | Pass-through to ASSETS |
| Accept: text/markdown | <path>.md (HTML fallback on 404) |
| User-Agent matches a known LLM bot | <path>.md (HTML fallback on 404) |
| Everything else | Pass-through (HTML) |

The included bot list covers the common ones: GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-Web, anthropic-ai, PerplexityBot, CCBot, Applebot-Extended, Google-Extended, cohere-ai, Bytespider, Diffbot. See src/bots.ts.
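The routing rules above can be sketched as a small pure function. This is an illustration only, not the package's source: `RequestSummary`, `decide`, `hasFileExtension`, and `acceptsMarkdown` are hypothetical names, the bot pattern is a subset of the list above, and the way the `Accept` header is parsed (media types split on commas, parameters such as `q=` ignored) is an assumption.

```typescript
// Hypothetical sketch of the Behavior table; names and parsing rules are assumptions.
const BOT_UA = /GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|Claude-Web|anthropic-ai|PerplexityBot|CCBot/i;

type Decision = "pass-through" | "markdown";

interface RequestSummary {
  method: string;
  pathname: string;
  accept: string | null;
  userAgent: string | null;
}

// Checks only the last path segment: "/a/b.css" has an extension, "/v1.2/page" does not.
function hasFileExtension(pathname: string): boolean {
  const lastSegment = pathname.slice(pathname.lastIndexOf("/") + 1);
  return /\.[A-Za-z0-9]+$/.test(lastSegment);
}

// True when text/markdown appears among the listed media types (parameters ignored).
function acceptsMarkdown(accept: string | null): boolean {
  if (!accept) return false;
  return accept
    .split(",")
    .map((part) => (part.split(";")[0] ?? "").trim().toLowerCase())
    .includes("text/markdown");
}

function decide(req: RequestSummary): Decision {
  if (req.method !== "GET") return "pass-through";
  if (hasFileExtension(req.pathname)) return "pass-through";
  if (acceptsMarkdown(req.accept)) return "markdown";
  if (req.userAgent !== null && BOT_UA.test(req.userAgent)) return "markdown";
  return "pass-through";
}

// A crawler asking for an HTML URL is routed to the markdown twin.
console.log(
  decide({
    method: "GET",
    pathname: "/docs/intro",
    accept: "text/html",
    userAgent: "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
  }),
);
```

A "markdown" decision means the worker tries `<path>.md` first. Whether the real implementation checks the rows in this order is not stated here, but the first two rows are independent, so their order does not change the result.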

Install

pnpm add github:Wave-RF/cloudflare-md-router
# or pin to a tag / commit:
# pnpm add github:Wave-RF/cloudflare-md-router#v0.1.0

Use

The simplest setup — re-export the default handler from your worker entrypoint:

// worker/index.ts
export { default } from "cloudflare-md-router/worker";

Configure your wrangler.jsonc with an ASSETS binding pointing at your built static site:

{
  "name": "my-docs",
  "main": "worker/index.ts",
  "compatibility_date": "2025-01-01",
  "assets": {
    "directory": "./dist",
    "binding": "ASSETS",
    "not_found_handling": "404-page",
    "html_handling": "drop-trailing-slash",
    "run_worker_first": true
  }
}

run_worker_first is required so that the worker sees each request before Cloudflare's static-asset matcher does. Without it, the worker runs only for requests that match no static asset.
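Once the worker runs first, it can query the ASSETS binding itself for either representation. The following is a minimal sketch of the "`<path>.md`, HTML fallback on 404" behavior, using an in-memory stand-in for the binding. `AssetResult`, `AssetLookup`, `serveMarkdownWithFallback`, and the sample site are hypothetical and exist only for illustration.

```typescript
// Hypothetical stand-in for the ASSETS binding: maps a pathname to a response.
interface AssetResult {
  status: number;
  body: string;
}
type AssetLookup = (pathname: string) => Promise<AssetResult>;

// Try the markdown twin first; fall back to the original path on a 404.
async function serveMarkdownWithFallback(
  pathname: string,
  mdPath: string,
  assets: AssetLookup,
): Promise<AssetResult> {
  const md = await assets(mdPath);
  if (md.status !== 404) return md;
  return assets(pathname);
}

// In-memory "site" for demonstration only.
const site = new Map<string, string>([
  ["/guide", "<h1>Guide</h1>"],
  ["/guide.md", "# Guide"],
  ["/about", "<h1>About</h1>"], // no .md twin
]);

const lookup: AssetLookup = async (pathname) => {
  const body = site.get(pathname);
  return body === undefined
    ? { status: 404, body: "Not found" }
    : { status: 200, body };
};

// "/guide" has a twin and yields markdown; "/about" does not and yields HTML.
serveMarkdownWithFallback("/guide", "/guide.md", lookup).then((r) => console.log(r.body));
serveMarkdownWithFallback("/about", "/about.md", lookup).then((r) => console.log(r.body));
```

In a deployed Worker the lookup would presumably go through `env.ASSETS.fetch(...)` with a rewritten URL rather than a Map. The sketch only shows the order of the two attempts.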

Customizing

Use createMdRouter() if you need to extend the bot list, change the .md path mapping, or add other Accept tokens:

// worker/index.ts
import { createMdRouter, LLM_BOT_UA } from "cloudflare-md-router";

export default createMdRouter({
  // Add your own bots:
  botUserAgents: new RegExp(LLM_BOT_UA.source + "|mybot", "i"),

  // Treat `Accept: text/x-markdown` as markdown too:
  acceptMarkdown: ["text/x-markdown"],

  // Custom .md path strategy. Default: `/foo/` → `/foo.md`, `/` → `/index.md`.
  mdPathFor: (pathname) => `/markdown${pathname.replace(/\/$/, "")}.md`,
});
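To see what these options produce, the path mappings and the extended bot pattern can be tried in isolation. In the sketch below, `defaultMdPathFor` is a hypothetical re-implementation of the default described in the comment above (`/foo/` → `/foo.md`, `/` → `/index.md`), `customMdPathFor` copies the example's `mdPathFor`, and `BASE_UA` is a stand-in for `LLM_BOT_UA`, whose real contents live in src/bots.ts.

```typescript
// Hypothetical re-implementation of the documented default mapping.
function defaultMdPathFor(pathname: string): string {
  const trimmed = pathname.replace(/\/+$/, "");
  return trimmed === "" ? "/index.md" : `${trimmed}.md`;
}

// Same expression as the mdPathFor option in the example above.
function customMdPathFor(pathname: string): string {
  return `/markdown${pathname.replace(/\/$/, "")}.md`;
}

// Stand-in for LLM_BOT_UA (assumption); extended the same way as in the example.
const BASE_UA = /GPTBot|ClaudeBot/i;
const EXTENDED_UA = new RegExp(BASE_UA.source + "|mybot", "i");

console.log(defaultMdPathFor("/foo/")); // "/foo.md"
console.log(defaultMdPathFor("/")); // "/index.md"
console.log(customMdPathFor("/foo/bar/")); // "/markdown/foo/bar.md"
console.log(customMdPathFor("/")); // "/markdown.md"
console.log(EXTENDED_UA.test("Mozilla/5.0 (compatible; MyBot/1.0)")); // true
```

With the custom strategy as written, the site root maps to `/markdown.md` rather than `/markdown/index.md`, so a custom `mdPathFor` may need to special-case `/` if the build emits an index file.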

Why content-negotiate?

Most LLMs do better with raw markdown than with rendered HTML — less DOM noise, no Starlight nav chrome, no script tags. Serving the same content at one URL with two representations means:

  • One canonical URL per page (good for citations and link-sharing).
  • Crawlers and human readers stay aligned automatically.
  • Your llms.txt can advertise <page>.md for explicit fetches; the worker covers the case where the LLM hits the HTML URL anyway.

License

MIT — see LICENSE.
