AI Prompt Hardening & Improvements #8

Merged: GeorgeBPrice merged 3 commits into main from AI-Prompt-Hardening-&-Improvements on May 5, 2026
@GeorgeBPrice
Owner

MR: AI prompt security hardening

Branch: Redevelopment--UI-overhual-+-Nextjs-update
Audit reference: Development Docs/Security/AI_PROMPT_SECURITY_AUDIT.md


Summary

Hardens both /api/generate (web UI) and /api/v1/generate (public API) against prompt-injection, jailbreak, off-topic abuse, SSRF, and quota-drain attacks identified in the audit. No user-facing functionality is removed for normal users; the only deliberate behavioural change is that additionalInstructions is now BYO-key only (and is no longer part of the public v1 contract).

This MR implements every Tier 0 and Tier 1 item from the audit, plus the high-impact items from Tier 2 (P2-3, P2-4) and Tier 3 (P3-2). Remaining items (P2-1, P2-2, P3-1, P3-3) are documented as deferred for future consideration.


What changed and why

Tier 0 — Infrastructure-level fixes

| ID | Change | Why it matters |
| --- | --- | --- |
| P0-1 | overrideBaseUrl / overrideHeaders on /api/generate now require a caller-supplied overrideApiKey | Closes an SSRF / key-exfiltration primitive — previously a request could redirect outbound traffic with process.env.OPENAI_API_KEY attached |
| P0-2 | Anonymous callers' rate-limit identifier is now anon:<client-ip> (from x-forwarded-for / x-real-ip) | Previously the entire internet shared one 'anonymous' daily bucket — effectively a free LLM proxy until the bucket filled |
| P0-3 | Full zod schema on /api/generate (matches v1 + override* extras) | Previously zero schema validation on the internal route |
| P0-4 | v1 maxTokens ceiling: 100000 → 8000 | Removes the single most expensive abuse vector on the server's OpenAI bill |
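
As a rough illustration of P0-1 and P0-2, the two checks could look something like the sketch below. This is not the code from the MR: the function names, the `GenerateBody` shape, the `override_requires_api_key` reason string, the `unknown` fallback, and the choice to take the first x-forwarded-for entry are all assumptions made for the example.

```typescript
// Hypothetical request shape; only the override* field names come from the MR text.
interface GenerateBody {
  overrideApiKey?: string;
  overrideBaseUrl?: string;
  overrideHeaders?: Record<string, string>;
}

type OverrideCheck = { ok: true } | { ok: false; status: 400; reason: string };

// P0-1 sketch: never redirect outbound traffic unless the caller supplies their
// own key, so the server's key cannot be sent to a caller-chosen host.
function checkOverrides(body: GenerateBody): OverrideCheck {
  const redirects =
    body.overrideBaseUrl !== undefined || body.overrideHeaders !== undefined;
  if (redirects && !body.overrideApiKey) {
    // The reason string is an assumption; the MR only states that a 400 is returned.
    return { ok: false, status: 400, reason: "override_requires_api_key" };
  }
  return { ok: true };
}

// P0-2 sketch: give each anonymous client its own rate-limit bucket.
// Header names are expected in lower case. Taking the first x-forwarded-for
// entry is an assumption that only holds when the platform sets the header.
function anonIdentifier(headers: Record<string, string | undefined>): string {
  const forwarded = headers["x-forwarded-for"]?.split(",")[0]?.trim();
  const ip = forwarded || headers["x-real-ip"]?.trim() || "unknown";
  return `anon:${ip}`;
}
```

With this shape, a request carrying only overrideBaseUrl is rejected before any outbound call is made, and two anonymous clients behind different IPs no longer share a bucket.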

Tier 1 — Prompt hardening

| ID | Change |
| --- | --- |
| P1-1 | New hardened system prompt: declares the only allowed task, frames tagged content as untrusted data, instructs the model to emit {"error":"off_topic"} on anything off-task |
| P1-2 | User-supplied fields are wrapped in <SCHEMA>/<EXAMPLES>/<INSTRUCTIONS> tags; reserved tag names are stripped from the input so a caller can't forge a closing tag |
| P1-3 | Pre-flight regex denylist on additionalInstructions and examples — rejects jailbreak phrases (ignore previous, disregard, pretend, developer mode, DAN, reveal your prompt, …) and clearly off-topic verbs (write a poem/essay/story, translate, summarise, recipe, give me advice, write python code, …). schema is intentionally not shape-checked — informal "name, age, dob, address" remains valid |
| P1-4 | Hard length caps in zod: schema ≤ 8 KB, examples ≤ 4 KB, additionalInstructions ≤ 1 KB |
| P1-5 | Output gate: off-topic-sentinel detection returns 422 and does not charge against the daily quota; v1 SQL output containing DROP/UPDATE/DELETE/ALTER/TRUNCATE/GRANT/REVOKE/CREATE USER/EXEC is rejected |
| P1-6 | Model response truncated to 200 KB before validation |
| P1-7 | Sanitiser strips role-injection markers (`< |
| P1-8 | Gemini calls now use the dedicated system_instruction field instead of fusing system+user into a single parts.text, restoring the role boundary |
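
To make P1-2 and P1-3 concrete, here is a minimal sketch of tag wrapping and a pre-flight denylist. It is an illustration under assumptions, not the contents of lib/prompt-security.ts: `stripReservedTags`, `wrapField`, and `findDeniedPattern` are hypothetical names, and the four patterns are a small sample of the phrases listed above rather than the real list.

```typescript
const RESERVED_TAGS = ["SCHEMA", "EXAMPLES", "INSTRUCTIONS"] as const;
type Tag = (typeof RESERVED_TAGS)[number];

// Remove opening and closing forms of reserved tags, case-insensitively.
// The loop repeats until nothing changes, so input such as
// "</SCH</SCHEMA>EMA>" cannot reassemble into a tag after one pass.
function stripReservedTags(text: string): string {
  const pattern = new RegExp(
    `<\\s*/?\\s*(?:${RESERVED_TAGS.join("|")})\\b[^>]*>`,
    "gi",
  );
  let previous: string;
  let current = text;
  do {
    previous = current;
    current = current.replace(pattern, "");
  } while (current !== previous);
  return current;
}

// Wrap a user-supplied field so the system prompt can treat it as untrusted data.
function wrapField(tag: Tag, value: string): string {
  return `<${tag}>\n${stripReservedTags(value)}\n</${tag}>`;
}

// Sample denylist only; the MR's actual patterns are not reproduced here.
const DENYLIST: RegExp[] = [
  /ignore\s+previous/i,
  /\bdeveloper\s+mode\b/i,
  /reveal\s+your\s+prompt/i,
  /write\s+a\s+(?:poem|essay|story)/i,
];

// Return the source of the first matching pattern, or null if the text is clean.
function findDeniedPattern(text: string): string | null {
  const hit = DENYLIST.find((re) => re.test(text));
  return hit ? hit.source : null;
}
```

Returning the matched pattern rather than a bare boolean lets the route name the offending pattern in its 400 response, which is the behaviour the risk notes describe.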

Tier 2 / Tier 3 — Selected items

| ID | Change |
| --- | --- |
| P2-3 | Per-identity in-flight concurrency cap (2 simultaneous generations). Redis-backed (INCR/DECR with 120 s safety TTL), fails open on Redis unavailability. Keyed by userEmail / anon:<ip> on /api/generate and apikey:<id> on /api/v1/generate. Returns 429 with reason: "concurrency_limit" when exceeded |
| P2-4 | Tighter default maxTokens: 2000 when no additionalInstructions, 4000 otherwise. Public v1 default is fixed at 2000 |
| P3-2 | additionalInstructions removed from the public v1 contract entirely (zod, request interface, generateMockData call). On /api/generate the field is gated to BYO-key callers — sending it without overrideApiKey returns 400 byo_key_required. The Generator UI disables the textarea and shows a context-aware helper note when not BYO-key (different message for anonymous, no-key-saved, and toggle-off states) |
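
The INCR/DECR pattern behind P2-3 can be sketched roughly as follows. This is an assumption-laden illustration, not lib/concurrency-limit.ts: the `CounterStore` interface stands in for the Redis client, `tryAcquire` / `release` and the `inflight:` key prefix are hypothetical, and `MemoryStore` exists only so the example runs without Redis.

```typescript
// Stand-in for the Redis client; a real one would map these to INCR, DECR, EXPIRE.
interface CounterStore {
  incr(key: string): Promise<number>;
  decr(key: string): Promise<number>;
  expire(key: string, seconds: number): Promise<void>;
}

const MAX_INFLIGHT = 2; // cap stated in the MR
const SAFETY_TTL_SECONDS = 120; // safety TTL stated in the MR

// Give a slot back; errors are swallowed so a store outage never breaks a request.
async function release(store: CounterStore, identity: string): Promise<void> {
  try {
    await store.decr(`inflight:${identity}`);
  } catch {
    // Ignored: the safety TTL eventually clears a stale counter.
  }
}

// Try to take a slot. Returns true if the request may proceed.
async function tryAcquire(store: CounterStore, identity: string): Promise<boolean> {
  const key = `inflight:${identity}`;
  let count: number;
  try {
    count = await store.incr(key);
    await store.expire(key, SAFETY_TTL_SECONDS);
  } catch {
    return true; // fail open when the store is unavailable
  }
  if (count <= MAX_INFLIGHT) return true;
  await release(store, identity); // undo our own increment before rejecting
  return false;
}

// In-memory store for local experiments; the TTL is not modelled.
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();
  async incr(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
  async decr(key: string): Promise<number> {
    const next = (this.counts.get(key) ?? 0) - 1;
    this.counts.set(key, next);
    return next;
  }
  async expire(_key: string, _seconds: number): Promise<void> {}
}
```

A route using this shape would call `tryAcquire` before generation, return 429 with reason: "concurrency_limit" when it yields false, and call `release` in a `finally` block so a thrown error does not leak a slot.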

Deferred (documented in the audit)

  • P2-1 Flag-logging of rejected prompts for audit/tuning.
  • P2-2 Per-API-key anomaly metrics with auto-disable.
  • P3-1 Optional second-pass LLM classifier.
  • P3-3 Provider safety headers (OpenAI user, Anthropic metadata.user_id).

Files added

| File | Purpose |
| --- | --- |
| lib/prompt-security.ts | Shared security helpers: buildSystemPrompt, sanitizeUserText, checkUserInput, isOffTopicSentinel, validateOutputShape, capOutput, OFF_TOPIC_SENTINEL, MAX_OUTPUT_BYTES |
| lib/concurrency-limit.ts | Redis-backed per-identity in-flight cap: acquireSlot / releaseSlot / MAX_INFLIGHT_PER_IDENTITY |
| Development Docs/AI_PROMPT_SECURITY_AUDIT.md | The audit document this MR delivers against |
| Development Docs/Security_Merge_Request.md | This file |

Files modified

| File | Change |
| --- | --- |
| lib/openai.ts | Hardened system prompt, tagged & sanitised user fields, 200 KB output cap, Gemini system_instruction |
| app/api/generate/route.ts | zod validation; SSRF guard; BYO-key gate on additionalInstructions; per-IP rate-limit identifier; pre-flight intent check; off-topic sentinel handling; concurrency cap; tighter default maxTokens |
| app/api/v1/generate/route.ts | Tighter zod (schema ≤ 8 KB, examples ≤ 4 KB, maxTokens ≤ 8000); additionalInstructions removed from public contract; pre-flight intent check; off-topic sentinel handling; SQL dangerous-statement rejection; concurrency cap |
| components/generator/config-panel.tsx | Disables Additional Instructions textarea unless BYO-key is active; shows a context-aware helper note (anonymous / signed-in-no-key / toggle-off) |
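
The output-side checks named in these rows (200 KB cap, off-topic sentinel, SQL dangerous-statement rejection) could be combined along the lines of the sketch below. The names `capOutputSketch`, `isOffTopic`, and `gateOutput`, the `format` parameter, and the reading of 200 KB as 200 × 1024 bytes are assumptions for illustration; the real helpers in lib/prompt-security.ts may be shaped differently.

```typescript
const MAX_OUTPUT_BYTES_SKETCH = 200 * 1024; // assumes 1 KB = 1024 bytes

// Truncate by UTF-8 byte length. A multi-byte character split at the boundary
// decodes to U+FFFD, which is acceptable because the text is validated afterwards.
function capOutputSketch(text: string, maxBytes: number = MAX_OUTPUT_BYTES_SKETCH): string {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) return text;
  return new TextDecoder().decode(bytes.subarray(0, maxBytes));
}

// True when the model answered with the sentinel {"error":"off_topic"}.
function isOffTopic(text: string): boolean {
  try {
    const parsed: unknown = JSON.parse(text.trim());
    return (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as { error?: unknown }).error === "off_topic"
    );
  } catch {
    return false; // not JSON, so not the sentinel
  }
}

// Keyword list taken from the P1-5 row; the word-boundary matching is an assumption.
const DANGEROUS_SQL =
  /\b(?:DROP|UPDATE|DELETE|ALTER|TRUNCATE|GRANT|REVOKE|CREATE\s+USER|EXEC)\b/i;

type GateResult =
  | { ok: true; output: string }
  | { ok: false; reason: "off_topic" | "dangerous_sql" };

// Cap first, then check the sentinel, then (for SQL output only) the keyword list.
function gateOutput(raw: string, format: "json" | "sql"): GateResult {
  const output = capOutputSketch(raw);
  if (isOffTopic(output)) return { ok: false, reason: "off_topic" };
  if (format === "sql" && DANGEROUS_SQL.test(output)) {
    return { ok: false, reason: "dangerous_sql" };
  }
  return { ok: true, output };
}
```

One caveat of keyword matching like this: generated row data that happens to contain a word such as "update" would also be rejected, so a real implementation may need to be more selective than this sketch.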

Behavioural / contract changes

  1. /api/v1/generate — additionalInstructions is no longer an accepted field. Today it is silently ignored if sent, because the zod schema does not use .strict(); if .strict() is later added, callers still sending it will receive a 400 (Unrecognized key). Update the Postman collection and docs/EXTERNAL_API.md accordingly.
  2. /api/v1/generate — maxTokens ceiling lowered from 100000 to 8000; default lowered from 4000 to 2000. Callers requesting >8000 will receive a 400.
  3. /api/generate — sending additionalInstructions without overrideApiKey now returns a 400 with reason: "byo_key_required". The web UI prevents this state automatically.
  4. /api/generate — sending overrideBaseUrl or overrideHeaders without overrideApiKey now returns a 400.
  5. Both routes — requests flagged by the pre-flight intent check return a 400 with field and reason fields. Off-topic model refusals return 422 and do not consume the daily quota.
  6. Both routes — third concurrent in-flight request from the same identity returns a 429 with reason: "concurrency_limit".
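
From a caller's point of view, the status codes in the list above could be handled with something like this sketch. It is a hypothetical client helper, not part of the MR: `adviseOn`, the `Advice` labels, and the `ErrorBody` shape are invented for the example, and only the status codes and the `reason` / `field` keys come from the text.

```typescript
// Hypothetical view of an error response body; only `reason` and `field`
// are named in the MR text.
interface ErrorBody {
  reason?: string;
  field?: string;
}

type Advice =
  | "ok"
  | "supply_own_key"
  | "fix_request"
  | "rephrase"
  | "retry_later"
  | "unknown";

// Map a response to a coarse next step for the caller.
function adviseOn(status: number, body: ErrorBody = {}): Advice {
  if (status >= 200 && status < 300) return "ok";
  if (status === 400 && body.reason === "byo_key_required") return "supply_own_key";
  if (status === 400) return "fix_request"; // validation, override guard, or pre-flight intent check
  if (status === 422) return "rephrase"; // off-topic model refusal; daily quota not consumed
  if (status === 429) return "retry_later"; // includes reason "concurrency_limit"
  return "unknown";
}
```

For instance, `adviseOn(429, { reason: "concurrency_limit" })` yields "retry_later", which a client could pair with a short backoff until one of its in-flight requests completes.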

Rollout / rollback notes

  • Rollout is purely additive at the infrastructure layer — no migrations, no env-var changes. The new files (lib/prompt-security.ts, lib/concurrency-limit.ts) have no external dependencies beyond the existing redis client and zod.
  • Concurrency cap fails open if Redis is unavailable, so a Redis outage will not break the generator (it will just lose the cap temporarily).
  • Rollback is a clean revert of the listed files — no irreversible state.
  • Follow-up work: update docs/EXTERNAL_API.md and the Postman collection to remove additionalInstructions from the v1 examples. Not blocking this MR.

Risk notes

  • Denylist (P1-3): the patterns are bypassable and also blunt, so false positives are possible. If a legitimate user trips one, they get a clear error naming the field and the offending pattern, so they can rephrase. The audit's deferred P2-1 (flag-logging) would let us tune these patterns from real traffic.
  • x-forwarded-for trust (P0-2) requires the platform to set the header. Vercel does. If self-hosted behind a proxy that doesn't strip inbound x-forwarded-for, callers can spoof their IP — revisit before any non-Vercel deploy.
  • Concurrency cap key choice uses userEmail for /api/generate (matches the existing daily-rate-limit identifier) and apikey:<id> for v1. This means a single user with both a session and an API key has two independent 2-slot pools — acceptable, since each pool is checked against its own quota.

@GeorgeBPrice GeorgeBPrice merged commit 7acf05e into main May 5, 2026
3 checks passed