The original core build plan is implemented and compiling with zero warnings. The project has also moved past that baseline: push, clone, fetch, the read API, diff engine, FTS5 search, issues/PRs, agent-readable markdown/plain page views, the optional GitHub OAuth auth worker, and the new root test harness all work.
Tested with:
- 235-commit repo — 5.3x compression ratio via xpatch
- cloudflare/agents — 13,464 objects, 11.4 MiB pack, pushes in one shot
- git/git — 80K commits, pushed incrementally to fp 14,000 (checkpoint pushes)
- `src/`
  - `lib.rs`: Worker entry point, routing, owner profile page, content negotiation dispatch, ref advertisement, admin endpoints
  - `presentation.rs`: `?format` + `Accept` negotiation, markdown/plain helpers, action rendering, shared text-mode hints, `Vary: Accept`
  - `schema.rs`: 11 tables + 3 FTS5 virtual tables + indexes, including `refs`, `commits`, `commit_parents`, `commit_graph`, `trees`, `blob_groups`, `blobs`, `blob_chunks`, `raw_objects`, `config`, `fts_head`, `fts_commits`
  - `pack.rs`: Streaming pack parser with `build_index` (decompress-to-sink), `resolve_type` (OFS_DELTA chain following), `resolve_entry` (on-demand decompression, `Arc`-based `ResolveCache` with a byte budget, `ResolveCtx` bundle), and the pack generator for fetch. Holds the `MAX_PACK_BYTES` (50 MB) and `CACHE_BUDGET_BYTES` (20 MB) limits.
  - `store.rs`: Storage layer covering commit/tree/blob parsing, xpatch delta compression with zlib-compressed keyframes + `blob_chunks`, batched SQL INSERTs, a binary-lifting commit graph, blob reconstruction, config helpers, incremental FTS rebuild, search (FTS5 + `INSTR`), and lossy UTF-8 decoding
  - `git.rs`: Git smart HTTP protocol, with receive-pack (pack body size gate, streaming pack processing, two-phase push handling, dynamic default branch, FTS trigger) and upload-pack (fetch)
  - `api.rs`: Read API for refs, log, commit, tree, blob, file-at-ref, search (code + commits, `@prefix:` column filter syntax), and stats (from the `stored_size` column, no full table scan)
  - `diff.rs`: Diff engine with recursive tree comparison, line-level diffs (via `similar`), commit diff, and two-commit compare
  - `issues.rs`: Issues/PR storage, comments, merge-base search, three-way merge, merge commit creation, form parsing utilities
  - `web.rs`: Shared HTML shell/CSS, markdown rendering, owner profile, repo README helpers, diff rendering, raw/blob helpers
  - `web/`: Repo home, log, tree/blob, search, settings, and commit/diff HTML + markdown renderers
  - `issues_web.rs`: Shared issues/PR web helpers and re-exports
  - `issues_web/`: Issues/pulls list, detail, and form HTML + markdown renderers
- `examples/github-oauth/`
  - `src/index.ts`: GitHub OAuth front worker, browser sessions, agent tokens, trusted header forwarding, text-mode landing/settings pages
  - `README.md`: Setup, deploy, bindings/secrets, and text-mode docs
- `tests/`
  - `helpers/mf.mjs`: Miniflare test server factory for the core worker
  - `helpers/git.mjs`: Temp repo + git CLI helpers for fixture-based e2e tests
  - `worker-smoke.spec.mjs`: Negotiated representation and auth smoke tests
  - `git-e2e.spec.mjs`: Real-world push/clone/fetch/force-push coverage
  - `fixtures/`: Pinned offline git fixture bundles and refresh notes
All items below have been implemented and verified.
- All matches with line numbers: after FTS5 identifies matching files, their content is scanned line by line and every match is returned with its line number. The web UI links to `/blob/:ref/:path#L47`.
- Literal / exact substring search: symbol-heavy queries (containing `.`, `_`, `()`, or `::`) are auto-detected and fall back to `INSTR(content, ?)`. This is a full table scan, but it is bounded by repo size. A `lit:` prefix also forces literal mode.
- Scope filters / `@prefix:` syntax: the inline query prefixes `@path:src/`, `@ext:rs`, `@author:`, `@message:`, and `@content:` replace separate form fields. `api::parse_search_query` strips the `@` and maps each prefix to an FTS5 column filter or a SQL `LIKE` predicate. The scope (code vs commits) is routed automatically from the prefix used. Works in both FTS5 and `INSTR` modes.
- Commit message search: the `fts_commits` FTS5 table indexes hash, message, and author. It is populated during push and exposed via `?scope=commits` and a tab on the web UI search page.
- Incremental FTS rebuild: uses `diff::diff_trees` to compare the old and new HEAD trees and only inserts, deletes, or updates changed files in `fts_head`. The last indexed commit hash is stored in the `config` table. Cost is O(changed files) per push.
- Default branch detection: the first branch pushed becomes the default and is stored in the `config` table. The FTS rebuild triggers on any push to the default branch (not hardcoded to `main`).
- Branch selector: a dropdown on the home, tree, blob, and log pages lists all `refs/heads/*` entries and shows the current branch prominently. Selecting a branch navigates to the same page on that branch.
- Agent-readable representations: page routes negotiate `text/html`, `text/markdown`, and `text/plain`. `?format=` overrides `Accept`, and responses chosen from `Accept` add `Vary: Accept`.
- Markdown/plain page coverage: owner profile, repo home, commits, tree, blob, commit, diff, search, settings, issues, pulls, issue/PR detail, and the new issue/new PR forms all have explicit text renderers.
- Shared presentation layer: `src/presentation.rs` centralizes negotiation, markdown/plain response helpers, action descriptions, section rendering, and the shared navigation hint for agents.
- Web module split: repo pages live in `src/web/*`, and issue/PR pages live in `src/issues_web/*`. `src/web.rs` and `src/issues_web.rs` remain as shared shells/helpers.
- Issues and pull requests: SQLite-backed issues/PRs with list/detail pages, new issue/new PR forms, comments, open/close/reopen actions, and merge by the repo owner.
- PR merge flow: merge-base search, then a fast-forward or three-way tree merge inside the DO. The merge commit is stored and the target ref is updated.
- Markdown rendering: replaced the hand-rolled renderer with `pulldown-cmark`. Supports tables, footnotes, strikethrough, task lists, and smart punctuation. Raw HTML is escaped and unsafe URLs are neutralized.
- Repo-aware README links: relative README links and images on the repo home page are rewritten against the current ref so that in-repo navigation works (`/blob`, `/tree`, `/raw`).
- Syntax highlighting: highlight.js (loaded from a CDN) with the line-numbers plugin. `#L` anchors support deep linking to specific lines.
- Persistent nav search with live results: a search bar in the nav on every page fetches `/search?q=...` on each keystroke (200 ms debounce) and shows a dropdown of file paths + first matching line (code) or commit hash and message (commits). Enter navigates to the full search page. The scope (code vs commits) is detected client-side from the `@author:`/`@message:` prefixes.
- Repo bar layout fix: the global nav stays full width, while the repo's secondary bar uses a full-width wrapper with centered inner contents.
- HEAD symbolic ref: `advertise_refs` includes `HEAD` pointing to the default branch via the `symref=HEAD:refs/heads/:name` capability. This fixes `git clone` for repos whose default branch isn't `main`.
- Clone with non-main branch: a consequence of the HEAD symbolic ref fix. `git clone` now checks out the correct branch automatically.
- Two-phase push handling: when `git push` sends a payload larger than `http.postBuffer` (1 MiB default), git first sends a 4-byte probe (`0000`), then the full payload with chunked encoding. Fixed by returning 200 OK for empty command sets.
- Streaming pack parser: replaced the all-in-memory `parse()` with a two-pass approach, an index pass (decompress-to-sink, ~100 bytes/entry) followed by process-by-type (decompress on demand from the pack bytes). Peak memory dropped from >128 MiB (OOM) to ~15 MiB for a 13K-object pack.
- Resolve cache: a bounded 1024-entry cache for resolved pack entries. It caches delta chain bases and intermediates to avoid re-decompressing shared bases. This is critical for git packs with depth-50 chains and reduces decompressions by 5-10x for packs where many objects share base chains.
- Keyframe compression: keyframes (full blob snapshots, every 50 versions) are zlib-compressed before storage; a 5 MB source file compresses to ~500 KB. Deltas are left as-is because xpatch uses zstd internally. There is no extra cost in the common case, where every file fits in a single row after compression.
- Blob chunking: the `blob_chunks` overflow table holds compressed keyframes that still exceed the DO's 2 MB row limit. It is transparent to all read paths, because `reconstruct_blob` reassembles chunks automatically. It only activates for large binary files.
- Batched SQL INSERTs: tree entries are batched 25 per statement (4 params each, under the DO's 100 bound-parameter limit) and commit parents 33 per statement. This cuts total SQL operations by ~6x for large pushes.
- Fast existence checks: replaced `SELECT COUNT(*) AS n` with `SELECT 1 LIMIT 1` for dedup checks in `store_commit`, `store_tree`, and `store_blob`. This is an indexed PK lookup with an instant return.
- `stored_size` column: tracks compressed blob size at INSERT time. The stats endpoint uses `SUM(stored_size)` over an integer column instead of `SUM(LENGTH(data))`, which would scan every data page. Stats are instant regardless of repo size.
- Lossy UTF-8: `String::from_utf8_lossy` is used for commit parsing, so old repos with Latin-1 or other non-UTF-8 author names are handled gracefully. Raw bytes are preserved in `raw_objects` for byte-identical fetch.
- Admin endpoints: `DELETE /repo/:name/` wipes all tables. `PUT /repo/:name/admin/set-ref` manually sets a ref to recover from partial push timeouts.
- Arc-based zero-copy resolve cache: `ResolveCache` stores `Arc<[u8]>` instead of `Vec<u8>`, so cache hits return `Arc::clone` (a refcount increment, no data copy). `ExternalObjects` also uses `Arc<[u8]>`. `resolve_entry` returns `Arc<[u8]>`, so each decompressed object is allocated exactly once and shared between the cache and the caller. During a processing loop the refcount is 2 (cache + caller). The caller drops its reference at the end of each iteration, leaving refcount 1 in the cache, and `cache.clear()` drops the last reference.
- Budget enforcement: `MAX_PACK_BYTES = 50 MB` is a hard gate in `handle_receive_pack`. Packs above it get a proper `ng` pkt-line response before any object is parsed. `CACHE_BUDGET_BYTES = 20 MB` is enforced inside `ResolveCache::try_cache`. The cache silently stops growing once the byte budget is exhausted, and processing continues via re-decompression. The peak memory ceiling for a 50 MB push is ~85 MB, roughly 40 MB below the 128 MB wall. `ResolveCtx` bundles the cache and external objects for `resolve_entry`.
- Auth worker text mode: the `examples/github-oauth` landing page and `/settings` also negotiate markdown/plain views for curl and agents.
- Docs refresh: `README.md` documents text-mode navigation and curl examples. `examples/github-oauth/README.md` covers setup, deploy, bindings/secrets, and text-mode behavior.
- Root test harness: added a root `package.json` + `vitest`/`miniflare` setup for the core ripgit worker only. Tests boot the built worker with the real KV + SQLite DO bindings under Miniflare.
- Rust protocol/unit coverage: added unit tests for representation negotiation, search query parsing, URL query decoding, and upload-pack negotiation helpers.
- Real-world git fixture e2e: added a pinned offline `tests/fixtures/workers-rs-main.bundle` fixture and a git CLI e2e suite covering push, clone, fast-forward push, force-push, search refresh, and fetch from an existing clone.
- Fetch after force-push: fixed `git fetch` for existing clones after a non-fast-forward rewrite. Upload-pack capabilities are now advertised separately from receive-pack's, and the expected ACK/NAK negotiation is implemented before the pack is streamed.
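The `@prefix:` syntax and the symbol-heavy auto-detection described in the list above can be sketched as a small tokenizer. This is a hypothetical sketch, not the actual `api::parse_search_query`. Several details are assumptions read off the bullet points: the `SearchQuery` and `Scope` types, the rule that `@author:`/`@message:` route a query to commits, and treating any term containing `.`, `_`, `()`, or `::` as literal.

```rust
/// Which index a query runs against (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Scope {
    Code,
    Commits,
}

/// A parsed search query (hypothetical shape).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct SearchQuery {
    pub terms: Vec<String>,
    pub path: Option<String>,
    pub ext: Option<String>,
    pub author: Option<String>,
    pub message: Option<String>,
    pub content: Option<String>,
    pub literal: bool,
}

impl SearchQuery {
    /// Assumed rule: commit-only prefixes route the query to commits.
    pub fn scope(&self) -> Scope {
        if self.author.is_some() || self.message.is_some() {
            Scope::Commits
        } else {
            Scope::Code
        }
    }
}

/// Symbol-heavy terms tokenize poorly in FTS5, so they are searched
/// as literal substrings (INSTR) instead.
fn looks_symbolic(term: &str) -> bool {
    term.contains("::") || term.contains("()") || term.contains('.') || term.contains('_')
}

pub fn parse_search_query(input: &str) -> SearchQuery {
    let mut q = SearchQuery::default();
    let mut input = input.trim();
    // An explicit `lit:` prefix forces literal mode.
    if let Some(rest) = input.strip_prefix("lit:") {
        q.literal = true;
        input = rest.trim_start();
    }
    for token in input.split_whitespace() {
        // `@name:value` becomes a filter; anything else is a search term.
        let filter = token.strip_prefix('@').and_then(|t| t.split_once(':'));
        match filter {
            Some(("path", v)) => q.path = Some(v.to_string()),
            Some(("ext", v)) => q.ext = Some(v.to_string()),
            Some(("author", v)) => q.author = Some(v.to_string()),
            Some(("message", v)) => q.message = Some(v.to_string()),
            Some(("content", v)) => q.content = Some(v.to_string()),
            _ => q.terms.push(token.to_string()),
        }
    }
    if q.terms.iter().any(|t| looks_symbolic(t)) {
        q.literal = true;
    }
    q
}
```

Turning the parsed fields into FTS5 column filters or `LIKE` predicates would be a separate step, omitted here.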
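At its core, the incremental FTS rebuild turns the difference between the previously indexed tree and the new HEAD tree into index operations. The sketch below is minimal and assumes both trees have already been flattened into `path -> blob hash` maps. The `FtsOp` enum and `fts_ops` function are hypothetical. A flat-map comparison like this one touches every path. A recursive tree diff such as `diff::diff_trees` can skip subtrees whose hashes match, which is what keeps the cost close to O(changed files).

```rust
use std::collections::BTreeMap;

/// One change to apply to the `fts_head` index (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum FtsOp {
    Insert(String),
    Update(String),
    Delete(String),
}

/// Compare the indexed tree (`old`) with the new HEAD tree (`new`).
pub fn fts_ops(old: &BTreeMap<String, String>, new: &BTreeMap<String, String>) -> Vec<FtsOp> {
    let mut ops = Vec::new();
    for (path, hash) in new {
        match old.get(path) {
            None => ops.push(FtsOp::Insert(path.clone())),
            Some(old_hash) if old_hash != hash => ops.push(FtsOp::Update(path.clone())),
            Some(_) => {} // same blob: nothing to reindex
        }
    }
    for path in old.keys() {
        if !new.contains_key(path) {
            ops.push(FtsOp::Delete(path.clone()));
        }
    }
    ops
}
```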
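The page-route negotiation rule works like this: `?format=` wins, otherwise `Accept` decides, and header-driven responses get `Vary: Accept`. It can be sketched as a pure function. This is not the code in `src/presentation.rs`. The following are all assumptions: the `Repr`/`Negotiated` types, the accepted `?format=` values, the HTML fallback, and sending `Vary: Accept` for that fallback as well. `q=` weights are handled only minimally.

```rust
/// Representations a page route can produce (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Repr {
    Html,
    Markdown,
    Plain,
}

/// The chosen representation and whether to send `Vary: Accept`.
#[derive(Debug, PartialEq, Eq)]
pub struct Negotiated {
    pub repr: Repr,
    pub vary_accept: bool,
}

/// Assumed `?format=` values.
fn repr_for_format(value: &str) -> Option<Repr> {
    match value {
        "html" => Some(Repr::Html),
        "markdown" | "md" => Some(Repr::Markdown),
        "plain" | "text" => Some(Repr::Plain),
        _ => None,
    }
}

fn repr_for_media_type(media: &str) -> Option<Repr> {
    match media {
        "text/html" => Some(Repr::Html),
        "text/markdown" => Some(Repr::Markdown),
        "text/plain" => Some(Repr::Plain),
        _ => None,
    }
}

pub fn negotiate(format: Option<&str>, accept: Option<&str>) -> Negotiated {
    // An explicit override does not depend on Accept, so no Vary.
    if let Some(repr) = format.and_then(repr_for_format) {
        return Negotiated { repr, vary_accept: false };
    }
    // Otherwise take the supported media type with the highest q value;
    // on ties the earliest entry wins.
    let mut best: Option<(Repr, f32)> = None;
    if let Some(header) = accept {
        for part in header.split(',') {
            let mut pieces = part.split(';');
            let media = pieces.next().unwrap_or("").trim().to_ascii_lowercase();
            let q = pieces
                .filter_map(|p| p.trim().strip_prefix("q="))
                .filter_map(|v| v.parse::<f32>().ok())
                .next()
                .unwrap_or(1.0);
            if let Some(repr) = repr_for_media_type(&media) {
                if q > 0.0 && best.map_or(true, |(_, best_q)| q > best_q) {
                    best = Some((repr, q));
                }
            }
        }
    }
    // The result depended on Accept (even the HTML fallback), so vary on it.
    Negotiated {
        repr: best.map_or(Repr::Html, |(r, _)| r),
        vary_accept: true,
    }
}
```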
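Several protocol fixes in the list above are pkt-line framing details: the `symref` advertisement, the flush-only probe, and the `ng` rejection. The std-only sketch below shows that framing. The helper names are hypothetical, and a real advertisement carries more capabilities and ref lines than shown here.

```rust
/// Encode one pkt-line: 4 lowercase hex digits giving the total length
/// (including those 4 bytes), then the payload.
pub fn pkt_line(payload: &[u8]) -> Vec<u8> {
    let mut out = format!("{:04x}", payload.len() + 4).into_bytes();
    out.extend_from_slice(payload);
    out
}

/// The flush-pkt that ends a section.
pub const FLUSH: &[u8] = b"0000";

/// First advertised ref: HEAD, followed by NUL and the capability list.
/// The `symref` capability tells `git clone` which branch to check out.
pub fn head_advertisement(head_oid: &str, default_branch: &str, caps: &[&str]) -> Vec<u8> {
    let mut line = format!("{head_oid} HEAD\0");
    let mut all_caps: Vec<String> = caps.iter().map(|c| c.to_string()).collect();
    all_caps.push(format!("symref=HEAD:refs/heads/{default_branch}"));
    line.push_str(&all_caps.join(" "));
    line.push('\n');
    pkt_line(line.as_bytes())
}

/// A receive-pack body that is only a flush-pkt carries no ref-update
/// commands (git's probe request), so it should simply get 200 OK.
pub fn is_empty_command_set(body: &[u8]) -> bool {
    body == FLUSH
}

/// A report-status rejection for one ref, e.g. when the pack is too large.
pub fn ng_line(refname: &str, reason: &str) -> Vec<u8> {
    pkt_line(format!("ng {refname} {reason}\n").as_bytes())
}
```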
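The batch sizes follow from the bound-parameter limit quoted above: 100 / 4 = 25 tree-entry rows per statement. The figure of 33 commit-parent rows is consistent with 3 parameters per row (33 × 3 = 99), though that column count is an assumption. The sketch below builds the multi-row statement text. The helper names are hypothetical.

```rust
/// Bound-parameter limit per statement, as quoted above.
pub const MAX_PARAMS: usize = 100;

/// How many rows of `columns` parameters fit in one statement.
pub fn rows_per_statement(columns: usize) -> usize {
    MAX_PARAMS / columns
}

/// Build `INSERT INTO t (a, b) VALUES (?, ?), (?, ?), ...` for `rows` rows.
pub fn multi_row_insert(table: &str, columns: &[&str], rows: usize) -> String {
    let placeholders = format!("({})", vec!["?"; columns.len()].join(", "));
    let values = vec![placeholders; rows].join(", ");
    format!("INSERT INTO {table} ({}) VALUES {values}", columns.join(", "))
}

/// Split `total` rows into statement-sized batches (returns row counts).
pub fn batch_sizes(total: usize, columns: usize) -> Vec<usize> {
    let per = rows_per_statement(columns);
    (0..total).step_by(per).map(|start| per.min(total - start)).collect()
}
```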
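The `ResolveCache` behavior described above fits in a short sketch. It covers three points: an entry cap, a byte budget that makes `try_cache` silently decline, and `Arc<[u8]>` hits that only bump a refcount. This is not the real `pack.rs` type. Keying entries by pack offset and the exact method signatures are assumptions.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Decompressed pack entries keyed by pack offset (assumed key).
pub struct ResolveCache {
    entries: HashMap<u64, Arc<[u8]>>,
    bytes: usize,
    max_entries: usize,
    budget_bytes: usize,
}

impl ResolveCache {
    pub fn new(max_entries: usize, budget_bytes: usize) -> Self {
        Self { entries: HashMap::new(), bytes: 0, max_entries, budget_bytes }
    }

    /// A hit is a refcount increment; the bytes are not copied.
    pub fn get(&self, offset: u64) -> Option<Arc<[u8]>> {
        self.entries.get(&offset).map(Arc::clone)
    }

    /// Cache `data` if both limits allow it. When either limit is hit the
    /// cache declines without error; callers re-decompress on a later miss.
    pub fn try_cache(&mut self, offset: u64, data: &Arc<[u8]>) -> bool {
        if self.entries.contains_key(&offset) {
            return true;
        }
        if self.entries.len() >= self.max_entries || self.bytes + data.len() > self.budget_bytes {
            return false;
        }
        self.bytes += data.len();
        self.entries.insert(offset, Arc::clone(data));
        true
    }

    /// Drop the cache's references; callers' clones remain valid.
    pub fn clear(&mut self) {
        self.entries.clear();
        self.bytes = 0;
    }
}
```

A declined insert is not an error, so the caller keeps its own `Arc` and carries on. That is the behavior the budget rule relies on.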
These are documented, accepted trade-offs, not bugs.
- Auth is upstream: ripgit expects trusted `X-Ripgit-Actor-*` headers from an auth worker or other front proxy. Reads are public; writes return 401 without a trusted actor.
- DO storage timeout: pushes with many objects (more than ~10K per incremental push) can exceed the DO's ~30-second storage operation timeout. Each `sql.exec()` auto-commits individually, because there is no request-level transaction. Cloudflare's `transactionSync()` API would provide atomicity but is not exposed in workers-rs 0.7.5. Use the `admin/set-ref` endpoint to recover from a partial push.
- 50 MB pack body limit (server-enforced): `MAX_PACK_BYTES` in `pack.rs` rejects packs above 50 MB with a clean `ng` response before any object is parsed. The hard Workers platform limit is 100 MB, but ripgit gates lower to keep peak DO memory well under the 128 MB ceiling. Larger repos must be pushed incrementally via the push script's checkpoint mechanism.
- Force pushes are always allowed: non-fast-forward updates work, but there is no repo setting or policy hook to reject them when a repo wants branch-protection semantics.
- No annotated tag objects: these are silently dropped during push. Lightweight tags (refs) work fine.
- Timezone lost in parsed `commits` table: the table stores Unix timestamps without the timezone offset. `raw_objects` preserves the original bytes for fetch, but the `/log` API shows times without a timezone.
- README fragment-only heading links: relative README file/dir/image links are rewritten on the repo home page, but bare `#heading` fragments are not yet translated to GitHub-style generated heading IDs.
- side-band-64k: removed from the advertised capabilities to avoid wrapping the report-status response in sideband bytes.
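Both the lossy UTF-8 handling and the timezone trade-off come down to how a commit's `author`/`committer` value is parsed. Git stores it as `Name <email> <unix-seconds> <+hhmm|-hhmm>`. The sketch below decodes the name and email lossily and keeps the offset that the `commits` table currently drops. The `Signature` struct and `parse_signature` function are hypothetical, not ripgit's parser.

```rust
/// A parsed author/committer value (hypothetical shape).
#[derive(Debug, PartialEq, Eq)]
pub struct Signature {
    pub name: String,
    pub email: String,
    pub time: i64,
    /// Offset from UTC in minutes, e.g. `+0530` -> 330.
    pub tz_offset_minutes: i32,
}

/// Parse `Name <email> 1700000000 +0100` from raw bytes. Invalid UTF-8 in
/// the name or email becomes U+FFFD instead of failing the whole commit.
pub fn parse_signature(raw: &[u8]) -> Option<Signature> {
    let lt = raw.iter().position(|&b| b == b'<')?;
    let gt = raw.iter().rposition(|&b| b == b'>')?;
    if gt < lt {
        return None;
    }
    let name = String::from_utf8_lossy(&raw[..lt]).trim().to_string();
    let email = String::from_utf8_lossy(&raw[lt + 1..gt]).into_owned();
    let rest = std::str::from_utf8(&raw[gt + 1..]).ok()?;
    let mut fields = rest.split_whitespace();
    let time = fields.next()?.parse::<i64>().ok()?;
    let tz = fields.next()?;
    let (sign, digits) = match *tz.as_bytes().first()? {
        b'+' => (1, &tz[1..]),
        b'-' => (-1, &tz[1..]),
        _ => return None,
    };
    if digits.len() != 4 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let hours: i32 = digits[..2].parse().ok()?;
    let minutes: i32 = digits[2..].parse().ok()?;
    Some(Signature { name, email, time, tz_offset_minutes: sign * (hours * 60 + minutes) })
}
```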
Show all commits that touched a specific file: walk the first-parent commit chain, resolve the file path in each commit's tree, and emit a result whenever the blob hash changes. This is O(commits × path_depth), which is fast for DO-sized repos.
- API: `GET /history?ref=main&path=src/lib.rs` returns the list of commits that modified the file, with timestamps, authors, and messages.
- Web UI: linked from the blob viewer, showing a paginated commit list scoped to one file.
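The first-parent walk can be sketched over an in-memory commit map. Some parts are assumptions and hypothetical: the `Commit` struct, the `limit` page size, and the `blob_at` closure, which stands in for "resolve this path in this commit's tree". A real version would read `commits`/`trees` from SQLite. Each commit's path lookup is reused as the next iteration's value, so the path is resolved once per commit.

```rust
use std::collections::HashMap;

/// Minimal commit record (hypothetical): just its first parent.
pub struct Commit {
    pub first_parent: Option<String>,
}

/// Walk the first-parent chain from `tip`, newest first, returning commits
/// whose blob at `path` differs from the first parent's (None = absent).
pub fn file_history<F>(
    commits: &HashMap<String, Commit>,
    tip: &str,
    path: &str,
    limit: usize,
    blob_at: F,
) -> Vec<String>
where
    F: Fn(&str, &str) -> Option<String>,
{
    let mut out = Vec::new();
    let mut current = Some(tip.to_string());
    let mut here = blob_at(tip, path);
    while let Some(hash) = current {
        if out.len() >= limit {
            break;
        }
        let Some(commit) = commits.get(&hash) else { break };
        let parent = commit.first_parent.clone();
        let before = parent.as_deref().and_then(|p| blob_at(p, path));
        if here != before {
            out.push(hash);
        }
        here = before;
        current = parent;
    }
    out
}
```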
Attribute each line of a file to the commit that last modified it, leveraging `blob_groups` (all versions of a file by path) and the diff engine (line-level diffs).
- API: `GET /blame?ref=main&path=src/lib.rs` returns each line with its commit hash, author, and timestamp.
- Web UI: a blame view linked from the blob viewer, with line numbers, a commit info column, and the file content.
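For a linear history, blame can be computed by replaying file versions from oldest to newest and carrying attributions across a line-level diff. The sketch below is self-contained. It uses a plain LCS table in place of the `similar` crate, which makes it O(n·m) in memory per step. It also takes the versions as an already-ordered list. That is an assumption: a real implementation would obtain them from `blob_groups` and the commit walk. `BlameLine` is hypothetical.

```rust
/// One blamed line: the commit that last changed it, and its text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BlameLine {
    pub commit: String,
    pub text: String,
}

/// Matching (old_idx, new_idx) pairs along a longest common subsequence.
fn lcs_matches(old: &[&str], new: &[&str]) -> Vec<(usize, usize)> {
    let (n, m) = (old.len(), new.len());
    // dp[i][j] = LCS length of old[i..] and new[j..].
    let mut dp = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            dp[i][j] = if old[i] == new[j] {
                dp[i + 1][j + 1] + 1
            } else {
                dp[i + 1][j].max(dp[i][j + 1])
            };
        }
    }
    let (mut i, mut j, mut out) = (0, 0, Vec::new());
    while i < n && j < m {
        if old[i] == new[j] {
            out.push((i, j));
            i += 1;
            j += 1;
        } else if dp[i + 1][j] >= dp[i][j + 1] {
            i += 1;
        } else {
            j += 1;
        }
    }
    out
}

/// `versions` holds (commit, file contents), ordered oldest to newest.
pub fn blame(versions: &[(&str, &str)]) -> Vec<BlameLine> {
    let mut current: Vec<BlameLine> = Vec::new();
    for &(commit, content) in versions {
        let new_lines: Vec<&str> = content.lines().collect();
        let matches = {
            let old_lines: Vec<&str> = current.iter().map(|l| l.text.as_str()).collect();
            lcs_matches(&old_lines, &new_lines)
        };
        // New or changed lines belong to this commit...
        let mut next: Vec<BlameLine> = new_lines
            .iter()
            .map(|t| BlameLine { commit: commit.to_string(), text: t.to_string() })
            .collect();
        // ...while lines that survive unchanged keep their earlier commit.
        for (oi, ni) in matches {
            next[ni].commit = current[oi].commit.clone();
        }
        current = next;
    }
    current
}
```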
Browse lightweight tags in the web UI. They are already stored in the `refs` table as `refs/tags/*`, so this is a quick win: just a new page listing tags with their target commit info.
- Web UI: `/repo/:name/tags` lists tags with commit hash, author, date, and message, each linking to the commit detail page.
- `transactionSync` binding: add a custom `wasm_bindgen` binding for `ctx.storage.transactionSync()` to get atomic push semantics and prevent partial state on DO timeout. The JS API exists; workers-rs just doesn't expose it yet.
- Repository index: a KV side-index written on push, plus a landing page listing all repos with stats. Needs a KV binding in `wrangler.toml`.
- Annotated tags: parse and store tag objects (separate from lightweight tag refs). Requires a `tag_objects` table + pack parser changes.
- Alternative auth frontends: bearer-token-only or Cloudflare Access integration beyond the GitHub OAuth example.
- Force-push policy: add optional rejection or branch-protection rules for non-fast-forward updates instead of always allowing them.
- Streaming zlib compression: `blob_zlib_compress` currently buffers the entire compressed output (2x blob size in memory). Switching to `flate2::write::ZlibEncoder` with incremental chunk writes would eliminate the compressed copy, reducing peak memory from `raw + compressed` to `raw + ~256 KB`. The compression ratio is identical, since it is a single continuous zlib stream. Main blocker: it interleaves compression with the `blob_chunks` INSERT logic, changing the `store_blob` flow. Worth doing for medium-sized blobs (10-50 MB), where the compressed copy is significant.
- side-band-64k: re-add with proper sideband wrapping for progress reporting.
- Selective page JSON: consider page-model JSON for page-only routes such as owner profile, repo home, settings, and auth worker pages if agents need it; keep the resource JSON API canonical.
- DO RPC instead of `fetch`: stop routing worker-to-DO calls through `fetch`. Expose RPC methods on the DO and let the worker call the right one.
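The streaming-compression idea needs a sink that turns one continuous byte stream into row-sized pieces as bytes arrive. Below is a std-only sketch of such a sink. It implements `std::io::Write`, so an encoder such as `flate2::write::ZlibEncoder` could wrap it. That wiring is not shown and is an assumption about the eventual design. `ChunkSink` is hypothetical. Its buffer holds at most `chunk_size` bytes, and the `on_chunk` callback stands in for the `blob_chunks` INSERT.

```rust
use std::io::{self, Write};

/// Buffers up to `chunk_size` bytes and hands each full chunk, with its
/// sequence number, to `on_chunk` (e.g. an INSERT into `blob_chunks`).
pub struct ChunkSink<F: FnMut(usize, &[u8]) -> io::Result<()>> {
    buf: Vec<u8>,
    chunk_size: usize,
    seq: usize,
    on_chunk: F,
}

impl<F: FnMut(usize, &[u8]) -> io::Result<()>> ChunkSink<F> {
    pub fn new(chunk_size: usize, on_chunk: F) -> Self {
        Self { buf: Vec::with_capacity(chunk_size), chunk_size, seq: 0, on_chunk }
    }

    fn emit(&mut self) -> io::Result<()> {
        (self.on_chunk)(self.seq, &self.buf)?;
        self.seq += 1;
        self.buf.clear();
        Ok(())
    }

    /// Emit the trailing partial chunk; returns how many chunks were written.
    pub fn finish(mut self) -> io::Result<usize> {
        if !self.buf.is_empty() {
            self.emit()?;
        }
        Ok(self.seq)
    }
}

impl<F: FnMut(usize, &[u8]) -> io::Result<()>> Write for ChunkSink<F> {
    /// Accepts at most the room left in the current chunk; `write_all`
    /// loops until everything is consumed.
    fn write(&mut self, data: &[u8]) -> io::Result<usize> {
        let room = self.chunk_size - self.buf.len();
        let n = room.min(data.len());
        self.buf.extend_from_slice(&data[..n]);
        if self.buf.len() == self.chunk_size {
            self.emit()?;
        }
        Ok(n)
    }

    /// Partial chunks are held until `finish`, so flushing is a no-op.
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```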