3 changes: 3 additions & 0 deletions docs/progress.md
@@ -15,3 +15,6 @@
- 2026-03-19: Deployed production successfully and verified the live domain with a real smoke test: claim -> publish -> HTML read -> raw read -> list -> delete on `https://bul.sh`.
- 2026-03-19: Investigated true custom-domain external rewrites on Vercel. Redirects propagate to `bul.sh`, but rewrite routes did not behave as required on the custom domain.
- 2026-03-19: Adopted the pragmatic Vercel production read path: serve pre-rendered HTML through Hono with aggressive edge-cache headers so subsequent reads are CDN hits while content remains stored in Blob.
- 2026-03-20: Added a concrete cost-control and anti-abuse implementation plan to the project plan so hosted usage can be hardened later without redesigning the product.
- 2026-03-20: Implemented first-pass hosted abuse controls in the service layer: reserved namespaces, markdown size caps, claim and publish rate limits, and lazy reclaim of empty stale namespaces.
- 2026-03-20: Added automated coverage for the abuse controls through integration tests on the HTTP app.
99 changes: 98 additions & 1 deletion docs/project-plan.md
@@ -132,6 +132,100 @@ Local .pub mapping:
- Revisions: `rev:{page_id}:v{n} → { blob_key, published_at }` + `current_version` field
- Graduate to Postgres when KV queries become painful (listing, search, etc.)

## Cost Model & Abuse Control

The biggest risk on Vercel is not steady-state storage. It is abuse:
- too many namespace claims
- too many write operations
- too many large pages
- too many cache misses from spam content

Storage itself should stay cheap for a long time. The app is mostly text, pages are small, and read traffic is edge-cached. The practical cost center to control is **writes and churn**, not simply page count.

### Design Principle

Keep the hosted version easy to use for legitimate humans and AI agents, but make abuse expensive or slow.

### Phase 1 Controls (implement first)

**1. Claim rate limiting**
- Limit namespace claims per IP
- Suggested starting point:
- 3 claims per hour per IP
- 10 claims per day per IP
- Goal: stop namespace-squatting scripts and low-effort spam

**2. Publish rate limiting**
- Limit publishes by both IP and namespace
- Suggested starting point:
- 30 publishes per 10 minutes per namespace
- 100 publishes per hour per IP
- Goal: stop automated flooding while allowing normal iterative editing
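Both limits fit the same fixed-window counter that the blob store's `RateLimitRecord` (`{ count, windowStartedAt }`) can back. A minimal sketch — the function name and wiring here are illustrative, not part of the implemented service layer:

```typescript
interface RateLimitRecord {
  count: number;
  windowStartedAt: string; // ISO timestamp
}

interface RateLimitResult {
  allowed: boolean;
  record: RateLimitRecord;
}

// Fixed-window check: start a fresh window when the old one has elapsed
// (or no record exists yet), otherwise count the request against the cap.
function checkFixedWindow(
  current: RateLimitRecord | null,
  limit: number,
  windowMs: number,
  now: Date = new Date(),
): RateLimitResult {
  const windowStart = current
    ? Date.parse(current.windowStartedAt)
    : Number.NaN;

  if (
    current === null ||
    Number.isNaN(windowStart) ||
    now.getTime() - windowStart >= windowMs
  ) {
    // New window: this request is the first in it.
    return {
      allowed: true,
      record: { count: 1, windowStartedAt: now.toISOString() },
    };
  }

  if (current.count >= limit) {
    // Over the cap: leave the record alone so the window does not slide.
    return { allowed: false, record: current };
  }

  return { allowed: true, record: { ...current, count: current.count + 1 } };
}
```

The caller would load the current record with `getRateLimitRecord(bucket)`, apply the check, and persist the returned record with `setRateLimitRecord` when allowed. Note that this read-modify-write is not atomic on Blob storage, so concurrent requests can slightly exceed the cap.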

**3. Markdown size limits**
- Hard cap on request body / markdown size
- Suggested starting point:
- 256 KB per page for v1
- Goal: prevent Blob from becoming arbitrary cheap object storage
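The cap should be enforced on encoded bytes rather than JavaScript string length, since multi-byte characters are stored at their UTF-8 size. A sketch of the check (the constant mirrors the suggested v1 cap):

```typescript
const MAX_MARKDOWN_BYTES = 256 * 1024; // suggested v1 cap

// Measure the UTF-8 encoded size, not the JS string length, so emoji
// and CJK text count at their real stored size.
function markdownWithinLimit(markdown: string): boolean {
  return new TextEncoder().encode(markdown).length <= MAX_MARKDOWN_BYTES;
}
```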

**4. Reserved namespaces**
- Block obvious or sensitive names
- Initial reserved set:
- `admin`
- `api`
- `www`
- `support`
- `help`
- `install`
- `bul`
- `pubmd`
- `root`
- Goal: avoid confusion, collisions, and support burden
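The check itself is a set lookup; comparing lowercased names keeps `Admin` from slipping past the reservation. A sketch using the initial set above:

```typescript
const RESERVED_NAMESPACES = new Set([
  "admin", "api", "www", "support", "help",
  "install", "bul", "pubmd", "root",
]);

// Reserved-name check; lowercase first so casing tricks do not bypass it.
function isReservedNamespace(namespace: string): boolean {
  return RESERVED_NAMESPACES.has(namespace.toLowerCase());
}
```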

**5. Empty-namespace reclaim policy**
- If a namespace is claimed but no page is published within 7 days, reclaim it
- Goal: reduce squatting without adding a full identity system
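Because reclaim is lazy, the check can run at claim time instead of on a scheduler: a namespace is reclaimable when it has no pages and its claim is older than the grace window. A sketch, assuming the namespace record's `createdAt` timestamp is available:

```typescript
const RECLAIM_AFTER_MS = 7 * 24 * 60 * 60 * 1000; // 7-day grace window

// A claimed namespace is reclaimable when it holds no pages and the
// claim predates the grace window.
function isReclaimable(
  createdAt: string,
  pageCount: number,
  now: Date = new Date(),
): boolean {
  return (
    pageCount === 0 &&
    now.getTime() - Date.parse(createdAt) >= RECLAIM_AFTER_MS
  );
}
```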

### Phase 2 Controls (only if needed)

**6. Token rotation**
- Add `pubmd token rotate`
- Invalidate old namespace token on rotation
- Useful if a token leaks or a namespace is shared accidentally

**7. Lightweight audit visibility**
- Track:
- last claim time
- last publish time
- publish count over recent windows
- Goal: make abuse visible before building a moderation dashboard

**8. Optional friction for suspicious traffic**
- Only if needed later:
- proof-of-work
- challenge pages
- manual review queue
- Not a v1 priority

### Implementation Notes

- Enforcement should happen in the service layer, not just at the CDN edge
- Limits should be configurable via environment variables
- The hosted instance and self-hosted instances should be able to use different defaults
- Abuse controls should fail with clear machine-readable errors so AI agents can recover gracefully
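For the last point, a small error envelope with a stable code field is enough for agents to branch on. One possible shape — the field names are illustrative, not an implemented contract:

```typescript
interface AbuseControlError {
  error: string; // stable machine-readable code, e.g. "rate_limited"
  message: string; // human-readable explanation
  retryAfterSeconds?: number; // present on rate-limit errors
}

// Build the body for a 429 response; agents branch on `error`,
// humans read `message`.
function rateLimitedBody(retryAfterSeconds: number): AbuseControlError {
  return {
    error: "rate_limited",
    message: `Too many requests; retry in ${retryAfterSeconds}s.`,
    retryAfterSeconds,
  };
}
```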

### Metrics To Watch

- namespaces claimed / day
- namespaces reclaimed without publish
- publishes / namespace / day
- median markdown size
- 95th percentile markdown size
- cache hit ratio on page reads
- total Blob writes vs. reads

If those numbers stay low, keep the system simple. If they climb unnaturally, harden the hosted instance before scaling usage.

## Milestones

### M0: Spike (1 day)
@@ -163,6 +257,9 @@ Local .pub mapping:
- [ ] Math/KaTeX + Mermaid rendering (add when requested)
- [ ] Page versioning (keep history, show diffs) — data model already supports this
- [ ] Page renames with redirects — data model already supports this
- [x] Lightweight anti-abuse controls (claim/publish rate limits, reserved namespaces, max page size)
- [x] Namespace reclaim policy for empty claims
- [ ] Token rotation
- [ ] View count analytics
- [ ] Page collections with auto-generated index
- [ ] Expiring pages (TTL)
@@ -190,7 +287,7 @@ Local .pub mapping:
## Things to Decide

- [ ] **Name**: `pub`? `md.pub`? `mdpost`? `pushmd`? Need a good domain.
- [ ] **Free tier limits**: unlimited pages? Rate limit only? Storage cap?
- [ ] **Hosted free tier**: what claim/publish/size limits are acceptable before introducing stronger friction?
- [ ] **Subdomain vs path**: `namespace.domain` vs `domain/namespace` — start with path, add subdomain later?
- [ ] **Markdown flavor**: strict GFM or also support Obsidian-flavored ([[wikilinks]], ==highlights==, callouts)?
- [ ] **Default visibility**: unlisted (noindex) or public?
135 changes: 72 additions & 63 deletions src/core/blob-store.ts
@@ -1,4 +1,4 @@
import { del, get, put } from "@vercel/blob";
import { del, get, list, put } from "@vercel/blob";
import { z } from "zod";

import {
@@ -11,14 +11,16 @@ import {
type FilePayload,
NamespaceNotFoundError,
type PublishRepository,
type RateLimitRecord,
} from "./repository.js";

const LookupRecordSchema = z.object({
pageId: z.string().uuid(),
});

const NamespacePageIndexSchema = z.object({
pages: z.array(StoredPageSchema),
const RateLimitRecordSchema = z.object({
count: z.number(),
windowStartedAt: z.string(),
});

export function createBlobStore(
@@ -29,13 +31,25 @@
namespace: string,
tokenHash: string,
): Promise<void> {
const record: NamespaceRecord = {
namespace,
tokenHash,
createdAt: new Date().toISOString(),
};
await saveNamespace(
{
namespace,
tokenHash,
createdAt: new Date().toISOString(),
},
false,
);
Comment on lines +37 to +41

Copilot AI Mar 22, 2026


When allowOverwrite is false, put() will fail if the namespace blob already exists (including race conditions between getNamespace and claimNamespace). That failure currently propagates as a storage error and will be mapped to a generic HTTP 400/500. Consider translating overwrite-conflict errors into NamespaceExistsError to keep claim behavior stable under concurrency.
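One way to address this, sketched under the assumption that the overwrite conflict can be recognized from the thrown error (the detection predicate is illustrative; the exact error shape `@vercel/blob` throws should be confirmed):

```typescript
class NamespaceExistsError extends Error {
  constructor(namespace: string) {
    super(`Namespace already exists: ${namespace}`);
    this.name = "NamespaceExistsError";
  }
}

// Wrap the non-overwriting write and translate an "already exists"
// conflict into the domain error the claim flow expects.
async function claimWithConflictTranslation(
  namespace: string,
  write: () => Promise<void>,
): Promise<void> {
  try {
    await write();
  } catch (error) {
    // Illustrative predicate: match on the message, since the concrete
    // error class exposed by the blob SDK may vary between versions.
    if (error instanceof Error && /exist/i.test(error.message)) {
      throw new NamespaceExistsError(namespace);
    }
    throw error;
  }
}
```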

}

await writeJsonBlob(namespacePath(namespace), record, false);
async function saveNamespace(
record: NamespaceRecord,
allowOverwrite = true,
): Promise<void> {
await writeJsonBlob(
namespacePath(record.namespace),
record,
allowOverwrite,
);
}

async function getNamespace(
@@ -54,22 +68,50 @@ export function createBlobStore(
throw new NamespaceNotFoundError(namespace);
}

await writeJsonBlob(namespacePath(namespace), {
await saveNamespace({
...current,
lastPublishAt,
});
}

async function getRateLimitRecord(
bucket: string,
): Promise<RateLimitRecord | null> {
return readJsonBlob(rateLimitPath(bucket), RateLimitRecordSchema);
}

async function setRateLimitRecord(
bucket: string,
record: RateLimitRecord,
): Promise<void> {
await writeJsonBlob(rateLimitPath(bucket), record);
}

async function listPages(namespace: string): Promise<StoredPage[]> {
const index = await readJsonBlob(
namespaceIndexPath(namespace),
NamespacePageIndexSchema,
);
const pages = index?.pages ?? [];
const lookupResults = await list({
limit: 1000,
prefix: `${lookupPrefix(namespace)}/`,
token: metadataToken,
});
Comment on lines 90 to +95

Copilot AI Mar 22, 2026


listPages hard-codes limit: 1000 and ignores pagination (hasMore/cursor). Namespaces with >1000 pages will be silently truncated. Consider iterating through all pages using the list API’s pagination fields so listing remains correct for larger namespaces.
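The fix described here is a cursor loop over the list API's pagination fields (`cursor`/`hasMore`). A sketch with the page fetcher injected, so the loop is independent of the SDK:

```typescript
interface ListPage<T> {
  blobs: T[];
  cursor?: string;
  hasMore: boolean;
}

// Drain every page of a cursor-paginated list API, such as
// @vercel/blob's list({ prefix, cursor, limit }).
async function listAllBlobs<T>(
  fetchPage: (cursor?: string) => Promise<ListPage<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;

  do {
    const page = await fetchPage(cursor);
    all.push(...page.blobs);
    cursor = page.cursor;
    if (!page.hasMore) break;
  } while (cursor !== undefined);

  return all;
}
```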


return pages.sort((left, right) =>
right.updatedAt.localeCompare(left.updatedAt),
const pages = await Promise.all(
lookupResults.blobs.map(async (lookupBlob) => {
const lookup = await readJsonBlob(
lookupBlob.pathname,
LookupRecordSchema,
);

if (lookup === null) {
return null;
}

return findPageById(lookup.pageId);
}),
);

return pages
.filter((page): page is StoredPage => page !== null)
.sort((left, right) => right.updatedAt.localeCompare(left.updatedAt));
Comment on lines +112 to +114

Copilot AI Mar 22, 2026


listPages can return duplicates if multiple lookup blobs point at the same pageId (e.g., if cleanup of an old slug lookup fails after a rename). Since the method maps lookups -> findPageById without deduping, the same page can appear multiple times. Consider deduping by pageId (or slug) after resolving pages.

Suggested change
return pages
.filter((page): page is StoredPage => page !== null)
.sort((left, right) => right.updatedAt.localeCompare(left.updatedAt));
const nonNullPages = pages.filter(
(page): page is StoredPage => page !== null,
);
const uniquePagesById = new Map<string, StoredPage>();
for (const page of nonNullPages) {
// Deduplicate by page identifier to avoid returning the same page multiple times
if (!uniquePagesById.has(page.id)) {
uniquePagesById.set(page.id, page);
}
}
return Array.from(uniquePagesById.values()).sort((left, right) =>
right.updatedAt.localeCompare(left.updatedAt),
);

}

async function findPageById(pageId: string): Promise<StoredPage | null> {
@@ -106,7 +148,6 @@ export function createBlobStore(
writeJsonBlob(lookupPath(page.namespace, page.slug), {
pageId: page.pageId,
}),
writeNamespaceIndex(page.namespace, page),
]);

if (previousPage !== null && previousPage.slug !== page.slug) {
@@ -122,7 +163,6 @@
token: metadataToken,
}),
del([page.markdownBlobKey, page.htmlBlobKey], { token: contentToken }),
removeFromNamespaceIndex(page.namespace, page.pageId),
]);
}

@@ -192,58 +232,20 @@ export function createBlobStore(
return `namespaces/${namespace}.json`;
}

function namespaceIndexPath(namespace: string): string {
return `indexes/${namespace}.json`;
}

function pagePath(pageId: string): string {
return `pages/${pageId}.json`;
}

function lookupPath(namespace: string, slug: string): string {
return `lookups/${namespace}/${slug}.json`;
function lookupPrefix(namespace: string): string {
return `lookups/${namespace}`;
}

async function writeNamespaceIndex(
namespace: string,
page: StoredPage,
): Promise<void> {
const current = await readJsonBlob(
namespaceIndexPath(namespace),
NamespacePageIndexSchema,
);
const nextPages = [...(current?.pages ?? [])];
const existingIndex = nextPages.findIndex(
(currentPage) => currentPage.pageId === page.pageId,
);

if (existingIndex === -1) {
nextPages.push(page);
} else {
nextPages[existingIndex] = page;
}

await writeJsonBlob(namespaceIndexPath(namespace), {
pages: nextPages,
});
function lookupPath(namespace: string, slug: string): string {
return `lookups/${namespace}/${slug}.json`;
}

async function removeFromNamespaceIndex(
namespace: string,
pageId: string,
): Promise<void> {
const current = await readJsonBlob(
namespaceIndexPath(namespace),
NamespacePageIndexSchema,
);

if (current === null) {
return;
}

await writeJsonBlob(namespaceIndexPath(namespace), {
pages: current.pages.filter((page) => page.pageId !== pageId),
});
function rateLimitPath(bucket: string): string {
return `rate-limits/${sanitizeBucket(bucket)}.json`;
}

return {
@@ -252,10 +254,13 @@
findPageById,
findPageBySlug,
getNamespace,
getRateLimitRecord,
listPages,
readHtml,
readMarkdown,
saveNamespace,
savePage,
setRateLimitRecord,
touchNamespace,
};
}
@@ -269,3 +274,7 @@
function stringifyJson(value: unknown): string {
return `${JSON.stringify(value, null, 2)}\n`;
}

function sanitizeBucket(bucket: string): string {
return bucket.replaceAll(/[^a-zA-Z0-9/_-]+/g, "_");
}