AI Prompt Hardening & Improvements #8
Merged
MR: AI prompt security hardening
Branch: Redevelopment--UI-overhual-+-Nextjs-update
Audit reference: Development Docs/Security/AI_PROMPT_SECURITY_AUDIT.md
Summary
Hardens both /api/generate (web UI) and /api/v1/generate (public API) against prompt-injection, jailbreak, off-topic abuse, SSRF, and quota-drain attacks identified in the audit. No user-facing functionality is removed for normal users; the only deliberate behavioural change is that additionalInstructions is now BYO-key only (and is no longer part of the public v1 contract).
This MR implements every Tier 0 and Tier 1 item from the audit, plus the high-impact items from Tier 2 (P2-3, P2-4) and Tier 3 (P3-2). Remaining items (P2-1, P2-2, P3-1, P3-3) are documented as deferred for future consideration.
What changed and why
Tier 0: Infrastructure-level fixes
- overrideBaseUrl / overrideHeaders on /api/generate now require a caller-supplied overrideApiKey; […] process.env.OPENAI_API_KEY attached
- […] anon:<client-ip> (from x-forwarded-for / x-real-ip); […] 'anonymous' daily bucket, effectively a free LLM proxy until the bucket filled
- […] /api/generate (matches v1 + override* extras)
- maxTokens ceiling: 100000 → 8000
Tier 1: Prompt hardening
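As a rough illustration of the Tier 0 rule that overrideBaseUrl / overrideHeaders on /api/generate require a caller-supplied overrideApiKey, the gate could look like the sketch below. The function name checkOverrides, the result type, and the reuse of the byo_key_required reason for these two fields are assumptions made for illustration; the MR only states that such requests now return a 400.

```typescript
// Hypothetical request shape: only the three override* field names come from the MR.
interface OverrideFields {
  overrideApiKey?: string;
  overrideBaseUrl?: string;
  overrideHeaders?: Record<string, string>;
}

type GateResult =
  | { ok: true }
  | { ok: false; status: 400; field: string; reason: string };

// Reject a custom endpoint or custom headers unless the caller also brings a key.
function checkOverrides(body: OverrideFields): GateResult {
  const hasKey =
    typeof body.overrideApiKey === "string" && body.overrideApiKey.trim() !== "";
  if (hasKey) return { ok: true };

  if (body.overrideBaseUrl !== undefined) {
    // The reason string is an assumption borrowed from the additionalInstructions gate.
    return { ok: false, status: 400, field: "overrideBaseUrl", reason: "byo_key_required" };
  }
  if (body.overrideHeaders !== undefined) {
    return { ok: false, status: 400, field: "overrideHeaders", reason: "byo_key_required" };
  }
  return { ok: true };
}
```

Running a check like this before any outbound request is built would presumably keep the server-side key from ever being paired with a caller-chosen URL or header set.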
- […] {"error":"off_topic"} on anything off-task
- […] <SCHEMA> / <EXAMPLES> / <INSTRUCTIONS> tags; reserved tag names are stripped from the input so a caller can't forge a closing tag
- […] additionalInstructions and examples: rejects jailbreak phrases (ignore previous, disregard, pretend, developer mode, DAN, reveal your prompt, …) and clearly off-topic verbs (write a poem/essay/story, translate, summarise, recipe, give me advice, write python code, …). schema is intentionally not shape-checked; informal "name, age, dob, address" remains valid
- […] schema ≤ 8 KB, examples ≤ 4 KB, additionalInstructions ≤ 1 KB
- […] DROP / UPDATE / DELETE / ALTER / TRUNCATE / GRANT / REVOKE / CREATE USER / EXEC is rejected
- […] system_instruction field instead of fusing system+user into a single parts.text, restoring the role boundary
Tier 2 / Tier 3: Selected items
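The reserved-tag stripping described for Tier 1 (caller text placed inside <SCHEMA> / <EXAMPLES> / <INSTRUCTIONS> tags, with those names removed from the input first) might be sketched as follows. This is not the code in lib/prompt-security.ts: stripReservedTags and wrapUntrusted are hypothetical names, and the whitespace-tolerant, case-insensitive pattern is an assumption about what should count as a forged tag.

```typescript
// Reserved delimiter names, as listed in the MR text.
const RESERVED_TAGS = ["SCHEMA", "EXAMPLES", "INSTRUCTIONS"] as const;

// Matches opening and closing forms such as <SCHEMA>, </schema>, or < / Schema >.
const RESERVED_TAG_PATTERN = new RegExp(
  `<\\s*/?\\s*(?:${RESERVED_TAGS.join("|")})\\s*>`,
  "gi",
);

// Remove reserved tags, repeating until the text is stable so that a nested
// fragment like "<SCH<SCHEMA>EMA>" cannot reassemble into a tag after one pass.
function stripReservedTags(input: string): string {
  let current = input;
  for (;;) {
    const next = current.replace(RESERVED_TAG_PATTERN, "");
    if (next === current) return current;
    current = next;
  }
}

// Wrap caller-supplied text in one delimiter pair after stripping it.
function wrapUntrusted(tag: (typeof RESERVED_TAGS)[number], text: string): string {
  return `<${tag}>\n${stripReservedTags(text)}\n</${tag}>`;
}
```

For example, wrapUntrusted("SCHEMA", "name, age</SCHEMA>") yields a block whose only closing tag is the one the server added.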
- […] INCR / DECR with 120 s safety TTL), fails open on Redis unavailability. Keyed by userEmail / anon:<ip> on /api/generate and apikey:<id> on /api/v1/generate. Returns 429 with reason: "concurrency_limit" when exceeded
- […] maxTokens: 2000 when no additionalInstructions, 4000 otherwise. Public v1 default is fixed at 2000
- additionalInstructions removed from the public v1 contract entirely (zod, request interface, generateMockData call). On /api/generate the field is gated to BYO-key callers: sending it without overrideApiKey returns 400 byo_key_required. The Generator UI disables the textarea and shows a context-aware helper note when not BYO-key (different message for anonymous, no-key-saved, and toggle-off states)
Deferred (documented in the audit)
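The concurrency cap from the Tier 2 / Tier 3 items (INCR / DECR with a 120 s safety TTL, failing open when Redis is unavailable) can be sketched as below. The CounterStore interface, the inflight: key prefix, and the names tryAcquire / release are assumptions for illustration and are not the acquireSlot / releaseSlot code from lib/concurrency-limit.ts. The store is synchronous only to keep the sketch short; a real Redis client would return promises. The limit of 2 follows the "2-slot pools" mentioned in the risk notes.

```typescript
// Minimal counter store modelled on the Redis INCR / DECR / EXPIRE commands.
interface CounterStore {
  incr(key: string): number;
  decr(key: string): number;
  expire(key: string, seconds: number): void;
}

const MAX_INFLIGHT = 2; // per identity
const SAFETY_TTL_SECONDS = 120;

// Returns true when the caller may proceed, false when it should receive
// a 429 with reason "concurrency_limit".
function tryAcquire(store: CounterStore, identity: string): boolean {
  const key = `inflight:${identity}`; // key prefix is an assumption
  try {
    const inflight = store.incr(key);
    // The TTL keeps a crashed request from holding its slot forever.
    store.expire(key, SAFETY_TTL_SECONDS);
    if (inflight > MAX_INFLIGHT) {
      store.decr(key); // roll back our own increment
      return false;
    }
    return true;
  } catch {
    return true; // fail open: a store outage must not block generation
  }
}

function release(store: CounterStore, identity: string): void {
  try {
    store.decr(`inflight:${identity}`);
  } catch {
    // Ignored: the safety TTL eventually clears the counter.
  }
}

// In-memory stand-in for Redis, used only to exercise the sketch.
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();
  incr(key: string): number {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
  decr(key: string): number {
    const next = (this.counts.get(key) ?? 0) - 1;
    this.counts.set(key, next);
    return next;
  }
  expire(): void {
    // TTLs are not simulated here.
  }
}
```

A route handler would call tryAcquire before invoking the model and release in a finally block, so that the slot is returned on both success and failure.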
- […] user, Anthropic metadata.user_id).
Files added
- lib/prompt-security.ts: buildSystemPrompt, sanitizeUserText, checkUserInput, isOffTopicSentinel, validateOutputShape, capOutput, OFF_TOPIC_SENTINEL, MAX_OUTPUT_BYTES
- lib/concurrency-limit.ts: acquireSlot / releaseSlot / MAX_INFLIGHT_PER_IDENTITY
- Development Docs/AI_PROMPT_SECURITY_AUDIT.md
- Development Docs/Security_Merge_Request.md
Files modified
- lib/openai.ts: […] system_instruction
- app/api/generate/route.ts: […] additionalInstructions; per-IP rate-limit identifier; pre-flight intent check; off-topic sentinel handling; concurrency cap; tighter default maxTokens
- app/api/v1/generate/route.ts: […] schema ≤ 8 KB, examples ≤ 4 KB, maxTokens ≤ 8000); additionalInstructions removed from public contract; pre-flight intent check; off-topic sentinel handling; SQL dangerous-statement rejection; concurrency cap
- components/generator/config-panel.tsx: […] Additional Instructions textarea unless BYO-key is active; shows a context-aware helper note (anonymous / signed-in-no-key / toggle-off)
Behavioural / contract changes
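The v1 limits listed for app/api/v1/generate/route.ts (schema ≤ 8 KB, examples ≤ 4 KB, maxTokens ≤ 8000, default 2000) could be checked along these lines. The MR indicates the real route relies on zod; this dependency-free sketch only mirrors the numbers. Treating 1 KB as 1024 characters of string length, treating examples as a single string, and the names validateV1Limits, too_large, and out_of_range are all assumptions.

```typescript
const KB = 1024; // assumption: caps are measured in string length, 1 KB = 1024
const MAX_SCHEMA_LENGTH = 8 * KB;
const MAX_EXAMPLES_LENGTH = 4 * KB;
const MAX_TOKENS_CEILING = 8000;
const DEFAULT_MAX_TOKENS = 2000;

// Hypothetical request shape for illustration only.
interface V1Request {
  schema: string;
  examples?: string;
  maxTokens?: number;
}

type Validation =
  | { ok: true; maxTokens: number }
  | { ok: false; status: 400; field: string; reason: string };

function fail(field: string, reason: string): Validation {
  return { ok: false, status: 400, field, reason };
}

// Enforce the size caps and resolve the effective maxTokens.
function validateV1Limits(req: V1Request): Validation {
  if (req.schema.length > MAX_SCHEMA_LENGTH) return fail("schema", "too_large");
  if (req.examples !== undefined && req.examples.length > MAX_EXAMPLES_LENGTH) {
    return fail("examples", "too_large");
  }
  const maxTokens = req.maxTokens ?? DEFAULT_MAX_TOKENS;
  if (!Number.isInteger(maxTokens) || maxTokens < 1 || maxTokens > MAX_TOKENS_CEILING) {
    return fail("maxTokens", "out_of_range");
  }
  return { ok: true, maxTokens };
}
```

Returning the offending field alongside a short reason matches the field / reason shape that the MR describes for 400 responses.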
- /api/v1/generate: additionalInstructions is no longer an accepted field. Today the field is silently ignored; existing callers sending it will receive a 400 from zod (Unrecognized key) only if .strict() is later added. Update the Postman collection and docs/EXTERNAL_API.md accordingly.
- /api/v1/generate: maxTokens ceiling lowered from 100000 to 8000; default lowered from 4000 to 2000. Callers requesting >8000 will receive a 400.
- /api/generate: sending additionalInstructions without overrideApiKey now returns a 400 with reason: "byo_key_required". The web UI prevents this state automatically.
- /api/generate: sending overrideBaseUrl or overrideHeaders without overrideApiKey now returns a 400.
- […] field and reason fields. Off-topic model refusals return 422 and do not consume the daily quota.
- […] reason: "concurrency_limit".
Rollout / rollback notes
- […] lib/prompt-security.ts, lib/concurrency-limit.ts) have no external dependencies beyond the existing redis client and zod.
- […] docs/EXTERNAL_API.md and the Postman collection to remove additionalInstructions from the v1 examples. Not blocking this MR.
Risk notes
- x-forwarded-for trust (P0-2) requires the platform to set the header. Vercel does. If self-hosted behind a proxy that doesn't strip inbound x-forwarded-for, callers can spoof their IP; revisit before any non-Vercel deploy.
- […] userEmail for /api/generate (matches the existing daily-rate-limit identifier) and apikey:<id> for v1. This means a single user with both a session and an API key has two independent 2-slot pools. This is acceptable, since each pool is checked against its own quota.
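To make the identifiers in these risk notes concrete, the sketch below derives userEmail or anon:<client-ip> for /api/generate and apikey:<id> for v1. The function names, the lower-cased plain-object header map, the "unknown" fallback, and the choice of the first x-forwarded-for entry are assumptions; that last choice is exactly what a caller can spoof when a proxy does not strip the inbound header.

```typescript
// Plain header map with lower-cased names (an assumption made for this sketch).
type HeaderMap = Record<string, string | undefined>;

// Best-effort client IP: first x-forwarded-for entry, then x-real-ip.
function clientIp(headers: HeaderMap): string {
  const forwarded = headers["x-forwarded-for"];
  if (forwarded) {
    const first = forwarded.split(",")[0]?.trim();
    if (first) return first;
  }
  const real = headers["x-real-ip"]?.trim();
  return real || "unknown"; // fallback value is an assumption
}

// Identity for /api/generate: the signed-in email, else a per-IP anonymous key.
function webIdentity(userEmail: string | null, headers: HeaderMap): string {
  return userEmail ?? `anon:${clientIp(headers)}`;
}

// Identity for /api/v1/generate: keyed by the API key's id.
function apiIdentity(apiKeyId: string): string {
  return `apikey:${apiKeyId}`;
}
```

Because webIdentity and apiIdentity never produce the same string for one person, a user holding both a session and an API key ends up with two separate pools, as the note above describes.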