AI Prompt Hardening & Improvements by GeorgeBPrice · Pull Request #8 · GeorgeBPrice/AI-MockRG

GeorgeBPrice · 2026-05-05T06:44:48Z

MR: AI prompt security hardening

Branch: Redevelopment--UI-overhual-+-Nextjs-update
Audit reference: Development Docs/Security/AI_PROMPT_SECURITY_AUDIT.md

Summary

Hardens both /api/generate (web UI) and /api/v1/generate (public API) against prompt-injection, jailbreak, off-topic abuse, SSRF, and quota-drain attacks identified in the audit. No user-facing functionality is removed for normal users; the only deliberate behavioural change is that additionalInstructions is now BYO-key only (and is no longer part of the public v1 contract).

This MR implements every Tier 0 and Tier 1 item from the audit, plus the high-impact items from Tier 2 (P2-3, P2-4) and Tier 3 (P3-2). Remaining items (P2-1, P2-2, P3-1, P3-3) are documented as deferred for future consideration.

What changed and why

Tier 0 — Infrastructure-level fixes

ID	Change	Why it matters
P0-1	`overrideBaseUrl` / `overrideHeaders` on `/api/generate` now require a caller-supplied `overrideApiKey`	Closes an SSRF / key-exfiltration primitive — previously a request could redirect outbound traffic with `process.env.OPENAI_API_KEY` attached
P0-2	Anonymous callers' rate-limit identifier is now `anon:<client-ip>` (from `x-forwarded-for` / `x-real-ip`)	Previously the entire internet shared one `'anonymous'` daily bucket — effectively a free LLM proxy until the bucket filled
P0-3	Full zod schema on `/api/generate` (matches v1 + `override*` extras)	Previously zero schema validation on the internal route
P0-4	v1 `maxTokens` ceiling: `100000 → 8000`	Removes the single most expensive abuse vector on the server's OpenAI bill
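
As a rough illustration of P0-2, the identifier could be derived along these lines. This is a minimal sketch, not the code in this MR: the helper name `rateLimitIdentifier`, the `HeaderBag` type, and the `unknown` fallback are assumptions made for the example.

```typescript
// Hypothetical helper: derive the rate-limit identifier for a request.
// Signed-in callers are keyed by email, anonymous callers by client IP.
type HeaderBag = Record<string, string | undefined>;

function rateLimitIdentifier(headers: HeaderBag, userEmail?: string): string {
  if (userEmail) return userEmail;

  // x-forwarded-for may carry a comma-separated chain of addresses; the first
  // entry is conventionally the original client. It is only trustworthy when
  // the hosting platform sets the header itself.
  const forwarded = headers["x-forwarded-for"];
  const firstHop = forwarded?.split(",")[0]?.trim();

  // Fall back to x-real-ip, then to a fixed label (assumption for this sketch).
  const ip = firstHop || headers["x-real-ip"]?.trim() || "unknown";
  return `anon:${ip}`;
}
```

With this shape, two anonymous callers on different IPs no longer share a bucket, which is the property P0-2 is after.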

Tier 1 — Prompt hardening

ID	Change
P1-1	New hardened system prompt: declares the only allowed task, frames tagged content as untrusted data, instructs the model to emit `{"error":"off_topic"}` on anything off-task
P1-2	User-supplied fields are wrapped in `<SCHEMA>`/`<EXAMPLES>`/`<INSTRUCTIONS>` tags; reserved tag names are stripped from the input so a caller can't forge a closing tag
P1-3	Pre-flight regex denylist on `additionalInstructions` and `examples` — rejects jailbreak phrases (`ignore previous`, `disregard`, `pretend`, `developer mode`, `DAN`, `reveal your prompt`, …) and clearly off-topic verbs (`write a poem/essay/story`, `translate`, `summarise`, `recipe`, `give me advice`, `write python code`, …). `schema` is intentionally not shape-checked — informal `"name, age, dob, address"` remains valid
P1-4	Hard length caps in zod: `schema ≤ 8 KB`, `examples ≤ 4 KB`, `additionalInstructions ≤ 1 KB`
P1-5	Output gate: off-topic-sentinel detection returns 422 and does not charge against the daily quota; v1 SQL output containing `DROP`/`UPDATE`/`DELETE`/`ALTER`/`TRUNCATE`/`GRANT`/`REVOKE`/`CREATE USER`/`EXEC` is rejected
P1-6	Model response truncated to 200 KB before validation
P1-7	Sanitiser strips role-injection markers (`<
P1-8	Gemini calls now use the dedicated `system_instruction` field instead of fusing system+user into a single `parts.text`, restoring the role boundary
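
The tagging in P1-2 and the pre-flight check in P1-3 can be sketched roughly as follows. The tag names and the three sample phrases come from the table above; the function names (`stripReservedTags`, `wrapField`, `findDeniedPattern`) and the exact regexes are assumptions for illustration and are not taken from `lib/prompt-security.ts`.

```typescript
// Reserved tag names as listed in P1-2.
const RESERVED_TAGS = ["SCHEMA", "EXAMPLES", "INSTRUCTIONS"] as const;
type ReservedTag = (typeof RESERVED_TAGS)[number];

// Matches an opening or closing reserved tag, case-insensitively.
const RESERVED_TAG_PATTERN = new RegExp(
  `<\\s*/?\\s*(?:${RESERVED_TAGS.join("|")})\\b[^>]*>`,
  "gi",
);

// Strip reserved tags from user text. The loop repeats until nothing changes,
// so a nested forgery such as "</SCH</SCHEMA>EMA>" cannot survive one pass.
function stripReservedTags(text: string): string {
  let previous: string;
  let current = text;
  do {
    previous = current;
    current = current.replace(RESERVED_TAG_PATTERN, "");
  } while (current !== previous);
  return current;
}

// Wrap a user-supplied field in its tag after stripping forged boundaries.
function wrapField(tag: ReservedTag, userText: string): string {
  return `<${tag}>\n${stripReservedTags(userText)}\n</${tag}>`;
}

// A small subset of the P1-3 denylist. In the MR it applies to
// additionalInstructions and examples, not to schema.
const DENYLIST: RegExp[] = [
  /ignore previous/i,
  /developer mode/i,
  /write a (?:poem|essay|story)/i,
];

// Returns the source of the first matching pattern, or null if the text is clean.
function findDeniedPattern(text: string): string | null {
  const hit = DENYLIST.find((pattern) => pattern.test(text));
  return hit ? hit.source : null;
}
```

Returning the matched pattern lets the route name the offending phrase in its error, in line with the clear-error behaviour described under Risk notes.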

Tier 2 / Tier 3 — Selected items

ID	Change
P2-3	Per-identity in-flight concurrency cap (2 simultaneous generations). Redis-backed (`INCR`/`DECR` with 120 s safety TTL), fails open on Redis unavailability. Keyed by `userEmail` / `anon:<ip>` on `/api/generate` and `apikey:<id>` on `/api/v1/generate`. Returns 429 with `reason: "concurrency_limit"` when exceeded
P2-4	Tighter default `maxTokens`: `2000` when no `additionalInstructions`, `4000` otherwise. Public v1 default is fixed at `2000`
P3-2	`additionalInstructions` removed from the public v1 contract entirely (zod, request interface, `generateMockData` call). On `/api/generate` the field is gated to BYO-key callers — sending it without `overrideApiKey` returns `400 byo_key_required`. The Generator UI disables the textarea and shows a context-aware helper note when not BYO-key (different message for anonymous, no-key-saved, and toggle-off states)
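
The INCR/DECR pattern behind P2-3 can be sketched as below. This is an illustration under stated assumptions, not the contents of `lib/concurrency-limit.ts`: the `CounterStore` interface, the synchronous in-memory stand-in, the `inflight:` key prefix, and the names `tryAcquire` / `release` are all hypothetical. A real Redis client would be asynchronous.

```typescript
// Minimal counter interface mirroring the Redis commands named in P2-3.
interface CounterStore {
  incr(key: string): number;
  decr(key: string): number;
  expire(key: string, seconds: number): void;
}

// In-memory stand-in so the sketch runs without Redis (assumption).
class InMemoryCounters implements CounterStore {
  private counts = new Map<string, number>();

  incr(key: string): number {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }

  decr(key: string): number {
    const next = (this.counts.get(key) ?? 0) - 1;
    this.counts.set(key, next);
    return next;
  }

  expire(_key: string, _seconds: number): void {
    // TTL is not simulated here; in Redis it guards against leaked slots.
  }
}

const MAX_INFLIGHT = 2; // cap from P2-3
const SAFETY_TTL_SECONDS = 120; // safety TTL from P2-3

// Returns true if the caller may proceed, false if the cap is exceeded.
function tryAcquire(store: CounterStore, identity: string): boolean {
  const key = `inflight:${identity}`;
  try {
    const inFlight = store.incr(key);
    store.expire(key, SAFETY_TTL_SECONDS);
    if (inFlight > MAX_INFLIGHT) {
      store.decr(key); // undo our own increment before rejecting
      return false;
    }
    return true;
  } catch {
    return true; // fail open when the store is unavailable
  }
}

// Call once the generation finishes, whether it succeeded or failed.
function release(store: CounterStore, identity: string): void {
  try {
    store.decr(`inflight:${identity}`);
  } catch {
    // Nothing to do: the safety TTL eventually clears a leaked slot.
  }
}
```

A route would call `tryAcquire` before invoking the model, return the 429 when it yields `false`, and call `release` in a `finally` block.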

Deferred (documented in the audit)

P2-1 Flag-logging of rejected prompts for audit/tuning.
P2-2 Per-API-key anomaly metrics with auto-disable.
P3-1 Optional second-pass LLM classifier.
P3-3 Provider safety headers (OpenAI user, Anthropic metadata.user_id).

Files added

File	Purpose
`lib/prompt-security.ts`	Shared security helpers: `buildSystemPrompt`, `sanitizeUserText`, `checkUserInput`, `isOffTopicSentinel`, `validateOutputShape`, `capOutput`, `OFF_TOPIC_SENTINEL`, `MAX_OUTPUT_BYTES`
`lib/concurrency-limit.ts`	Redis-backed per-identity in-flight cap: `acquireSlot` / `releaseSlot` / `MAX_INFLIGHT_PER_IDENTITY`
`Development Docs/AI_PROMPT_SECURITY_AUDIT.md`	The audit document this MR delivers against
`Development Docs/Security_Merge_Request.md`	This file

Files modified

File	Change
`lib/openai.ts`	Hardened system prompt, tagged & sanitised user fields, 200 KB output cap, Gemini `system_instruction`
`app/api/generate/route.ts`	zod validation; SSRF guard; BYO-key gate on `additionalInstructions`; per-IP rate-limit identifier; pre-flight intent check; off-topic sentinel handling; concurrency cap; tighter default `maxTokens`
`app/api/v1/generate/route.ts`	Tighter zod (`schema ≤ 8 KB`, `examples ≤ 4 KB`, `maxTokens ≤ 8000`); `additionalInstructions` removed from public contract; pre-flight intent check; off-topic sentinel handling; SQL dangerous-statement rejection; concurrency cap
`components/generator/config-panel.tsx`	Disables `Additional Instructions` textarea unless BYO-key is active; shows a context-aware helper note (anonymous / signed-in-no-key / toggle-off)
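
The off-topic sentinel handling and SQL dangerous-statement rejection listed for the routes might look roughly like this. It is a sketch only: the `gateOutput` name, the `format` parameter, the `dangerous_sql` label, and the regex are assumptions, and the MR's own helpers (`isOffTopicSentinel`, `validateOutputShape`) may work differently.

```typescript
type GateResult =
  | { ok: true }
  | { ok: false; reason: "off_topic" | "dangerous_sql" };

// True when the model emitted the sentinel object {"error":"off_topic"}.
function isOffTopic(output: string): boolean {
  try {
    const parsed: unknown = JSON.parse(output.trim());
    return (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as Record<string, unknown>).error === "off_topic"
    );
  } catch {
    return false; // not JSON, so it cannot be the sentinel
  }
}

// Statement keywords listed in P1-5. No "g" flag, so test() stays stateless.
const DANGEROUS_SQL =
  /\b(?:DROP|UPDATE|DELETE|ALTER|TRUNCATE|GRANT|REVOKE|CREATE\s+USER|EXEC)\b/i;

// Check model output before it is returned to the caller.
function gateOutput(output: string, format: "json" | "sql"): GateResult {
  if (isOffTopic(output)) return { ok: false, reason: "off_topic" };
  if (format === "sql" && DANGEROUS_SQL.test(output)) {
    return { ok: false, reason: "dangerous_sql" };
  }
  return { ok: true };
}
```

A keyword match like this is deliberately blunt: a string value containing the word "update" inside an `INSERT` would also be rejected, which is the same false-positive trade-off noted for the input denylist.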

Behavioural / contract changes

/api/v1/generate — additionalInstructions is no longer an accepted field. Today zod silently ignores it when existing callers still send it; if .strict() is later added to the schema, those callers will receive a 400 (Unrecognized key). Update the Postman collection and docs/EXTERNAL_API.md accordingly.
/api/v1/generate — maxTokens ceiling lowered from 100000 to 8000; default lowered from 4000 to 2000. Callers requesting >8000 will receive a 400.
/api/generate — sending additionalInstructions without overrideApiKey now returns a 400 with reason: "byo_key_required". The web UI prevents this state automatically.
/api/generate — sending overrideBaseUrl or overrideHeaders without overrideApiKey now returns a 400.
Both routes — requests flagged by the pre-flight intent check return a 400 with field and reason fields. Off-topic model refusals return 422 and do not consume the daily quota.
Both routes — third concurrent in-flight request from the same identity returns a 429 with reason: "concurrency_limit".
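
The /api/generate gating rules above, together with the P2-4 defaults, can be summarised in a short sketch. The `GenerateBody` shape, the function names, the order of the checks, and the `override_requires_key` label are assumptions for illustration; only `byo_key_required`, the 400 status, and the 2000 / 4000 defaults come from this MR.

```typescript
// Hypothetical request shape for /api/generate, reduced to the relevant fields.
interface GenerateBody {
  schema: string;
  additionalInstructions?: string;
  overrideApiKey?: string;
  overrideBaseUrl?: string;
  overrideHeaders?: Record<string, string>;
  maxTokens?: number;
}

type Rejection = { status: 400; reason: string };

// Returns a rejection if the request violates a BYO-key rule, otherwise null.
function checkInternalRequest(body: GenerateBody): Rejection | null {
  const byoKey = Boolean(body.overrideApiKey);

  // P0-1: redirecting outbound traffic requires the caller's own key.
  if (!byoKey && (body.overrideBaseUrl || body.overrideHeaders)) {
    return { status: 400, reason: "override_requires_key" }; // label is an assumption
  }

  // P3-2: additionalInstructions is gated to BYO-key callers.
  if (!byoKey && body.additionalInstructions) {
    return { status: 400, reason: "byo_key_required" };
  }

  return null;
}

// P2-4: tighter default when the caller did not set maxTokens.
function defaultMaxTokens(body: GenerateBody): number {
  return body.additionalInstructions ? 4000 : 2000;
}
```

For example, a body with only `schema` passes and gets a 2000-token default, while the same body plus `additionalInstructions` and no `overrideApiKey` is rejected with `byo_key_required`.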

Rollout / rollback notes

Rollout is purely additive at the infrastructure layer — no migrations, no env-var changes. The new files (lib/prompt-security.ts, lib/concurrency-limit.ts) have no external dependencies beyond the existing redis client and zod.
Concurrency cap fails open if Redis is unavailable, so a Redis outage will not break the generator (it will just lose the cap temporarily).
Rollback is a clean revert of the listed files — no irreversible state.
Follow-up work: update docs/EXTERNAL_API.md and the Postman collection to remove additionalInstructions from the v1 examples. Not blocking this MR.

Risk notes

Denylist patterns (P1-3) are bypassable and also blunt, so false positives are possible. If a legitimate user trips one, they get a clear error naming the field and the offending pattern, so they can rephrase. The audit's deferred P2-1 (flag-logging) would let us tune these patterns from real traffic.
x-forwarded-for trust (P0-2) requires the platform to set the header. Vercel does. If self-hosted behind a proxy that doesn't strip inbound x-forwarded-for, callers can spoof their IP — revisit before any non-Vercel deploy.
Concurrency cap key choice uses userEmail for /api/generate (matches the existing daily-rate-limit identifier) and apikey:<id> for v1. This means a single user with both a session and an API key has two independent 2-slot pools — acceptable, since each pool is checked against its own quota.



AI Prompt Hardening & Improvements

8b87e31

GeorgeBPrice added 2 commits May 5, 2026 17:18

Documentation & UI improvements

5c06390

Proxy Hardening - Refactored Tests

f47f31d

GeorgeBPrice merged commit 7acf05e into main May 5, 2026
3 checks passed

GeorgeBPrice merged 3 commits into main from AI-Prompt-Hardening-&-Improvements
