
[feat] Injection screening: hard rules + confidence gate (R-417 + R-418) #7

Draft
100yenadmin wants to merge 2 commits into main from feat/injection-screening-r417-r418

Conversation

@100yenadmin
Member

Closes 100yenadmin/electric-sheep#1902, closes 100yenadmin/electric-sheep#1903

Copilot AI review requested due to automatic review settings April 11, 2026 18:09
@coderabbitai

coderabbitai Bot commented Apr 11, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e77b3266-f73a-4485-9c3b-64f905cbaaa9



Copilot AI left a comment


Pull request overview

Adds injection-screening safeguards to Cortex memory injection to reduce prompt-injection/stale-context risks, with configurable hard rules and mode-based confidence thresholds (R-417/R-418).

Changes:

  • Extend plugin config with injection screening toggles and per-mode thresholds.
  • Add mode detection + two-layer screening function to drop low-confidence/stale/contradictory memories before formatting injection context.
  • Wire the screening step into the before_agent_start recall injection path and update published dist/ artifacts accordingly.
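
The two-layer idea in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ScreeningConfig`, `MemoryCandidate`, `screenCandidates`, and the threshold values are all hypothetical, since the actual config fields and the signature of `screenInjectionCandidates` are not shown in this thread.

```typescript
// Hypothetical names throughout; the PR's real config fields are not visible here.
type InjectionMode = "critical" | "technical" | "personal";

interface ScreeningConfig {
  enabled: boolean;
  // Minimum confidence a memory needs before it is injected, per mode (assumed values).
  thresholds: Record<InjectionMode, number>;
}

interface MemoryCandidate {
  content: string;
  confidence: number; // assumed to lie in [0, 1]
}

const exampleConfig: ScreeningConfig = {
  enabled: true,
  thresholds: { critical: 0.9, technical: 0.7, personal: 0.5 },
};

// Layer 1: any hard rule that returns true drops the memory outright.
// Layer 2: the survivor must also meet the confidence threshold for the mode.
function screenCandidates(
  candidates: MemoryCandidate[],
  mode: InjectionMode,
  config: ScreeningConfig,
  hardRules: Array<(m: MemoryCandidate) => boolean>,
): MemoryCandidate[] {
  if (!config.enabled) return candidates;
  const minConfidence = config.thresholds[mode];
  return candidates.filter(
    (m) => !hardRules.some((rule) => rule(m)) && m.confidence >= minConfidence,
  );
}
```

Keeping the hard rules as plain predicates means each rule (stale run state, contradiction, and so on) could be toggled from config without touching the confidence gate.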

Reviewed changes

Copilot reviewed 1 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/index.ts | Adds new config fields, introduces detectInjectionMode + screenInjectionCandidates, and applies screening before formatMemoryContext. |
| dist/index.js | Compiled output reflecting the new screening logic and new named exports. |
| dist/index.js.map | Updated sourcemap for the compiled JS. |
| dist/index.d.ts | Updated type declarations to include new config fields and new exported functions. |
| dist/index.d.ts.map | Updated sourcemap for the declaration file. |


Comment thread src/index.ts
Comment on lines +923 to +935
const RUN_ID_RE = /bench-\d{8}-\d{6}/g;
const GIT_TOKENS = /\b(git|PR #|commit|branch)\b/i;
const FILE_PATH_RE = /[./\\][a-zA-Z0-9_\-./\\]{2,}/;
const LIVENESS_CLAIM = /\b(still active|still running|is running|is active|is alive|currently running)\b/i;
const DEATH_CLAIM = /\b(was killed|is dead|died|crashed|no listener|restarted|dead\b|killed\b|stalled)\b/i;

/**
 * Classify the current turn into an injection mode.
 * critical > technical > personal (first match wins).
 */
export function detectInjectionMode(promptText: string): InjectionMode {
  if (CRITICAL_KEYWORDS.test(promptText) || RUN_ID_RE.test(promptText)) return "critical";
  if (

Copilot AI Apr 11, 2026


RUN_ID_RE is declared with the global (/g) flag but is used with RegExp.test() in detectInjectionMode. Global regexes are stateful (lastIndex is advanced), so subsequent calls (including the later matchAll(RUN_ID_RE) in screenInjectionCandidates) can miss matches depending on call order. Consider using a non-global regex for test() (or resetting lastIndex / cloning the regex) and keep a separate global instance only for matchAll().
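
A small sketch of the behavior described in this comment, assuming the same pattern as `RUN_ID_RE` in the diff. The names `RUN_ID_GLOBAL`, `RUN_ID_TEST`, and `extractRunIds` are hypothetical, and the extraction helper uses `String.prototype.match` with a fresh global instance in place of the PR's `matchAll` call.

```typescript
// Same pattern as RUN_ID_RE in the diff, once with and once without the g flag.
const RUN_ID_GLOBAL = /bench-\d{8}-\d{6}/g;
const RUN_ID_TEST = /bench-\d{8}-\d{6}/;

const promptText = "status of bench-20260411-180900?";

// A global regex stores where its last match ended (lastIndex), so a second
// test() on the same string resumes searching from that offset.
const first = RUN_ID_GLOBAL.test(promptText); // true; lastIndex now sits after the match
const second = RUN_ID_GLOBAL.test(promptText); // false; lastIndex is then reset to 0

// A non-global regex ignores lastIndex, so repeated test() calls agree.
const stableFirst = RUN_ID_TEST.test(promptText); // true
const stableSecond = RUN_ID_TEST.test(promptText); // true

// Hypothetical helper: build a fresh global instance for each extraction so
// no state leaks between callers.
function extractRunIds(text: string): Set<string> {
  const matches = text.match(new RegExp(RUN_ID_TEST.source, "g")) || [];
  return new Set(matches);
}
```

Splitting the pattern into a stateless instance for `test()` and a per-call global instance for extraction is one way to follow the suggestion; resetting `lastIndex` before each use would also work but is easier to forget.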

Comment thread src/index.ts
Comment on lines +987 to +1003
// If the prompt already contains a death claim, or the run ID isn’t a live process,
// drop this memory (it was captured when the run was alive, now stale).
let isStale = false;
if (promptHasDeathClaim) {
  isStale = true;
} else {
  // Check if prompt explicitly references this run as dead / a different run took over
  for (const runId of contentRunIds) {
    if (promptRunIds.has(runId) && DEATH_CLAIM.test(promptText)) {
      isStale = true;
      break;
    }
    // Also stale if prompt never mentions this run ID at all but does mention death
    if (!promptRunIds.has(runId) && promptHasDeathClaim) {
      isStale = true;
      break;
    }

Copilot AI Apr 11, 2026


The stale run-state branch’s run-id specific checks are currently ineffective: the else block only runs when promptHasDeathClaim is false, but inside it you test DEATH_CLAIM.test(promptText) and promptHasDeathClaim again, which will always be false in that branch. As a result, this filter drops all liveness-claim memories whenever the prompt contains any death claim (even unrelated), and never drops based on matching run IDs. Restructure the conditions so run-id matching is evaluated when a death claim is present (or remove the redundant tests).

Suggested change
- // If the prompt already contains a death claim, or the run ID isn’t a live process,
- // drop this memory (it was captured when the run was alive, now stale).
- let isStale = false;
- if (promptHasDeathClaim) {
-   isStale = true;
- } else {
-   // Check if prompt explicitly references this run as dead / a different run took over
-   for (const runId of contentRunIds) {
-     if (promptRunIds.has(runId) && DEATH_CLAIM.test(promptText)) {
-       isStale = true;
-       break;
-     }
-     // Also stale if prompt never mentions this run ID at all but does mention death
-     if (!promptRunIds.has(runId) && promptHasDeathClaim) {
-       isStale = true;
-       break;
-     }
+ // If the prompt contains a death claim for this run (or contains an unscoped death
+ // claim with no run IDs at all), drop this memory as stale.
+ let isStale = false;
+ if (promptHasDeathClaim) {
+   if (promptRunIds.size === 0) {
+     isStale = true;
+   } else {
+     // Check if prompt explicitly references this run as dead / a different run took over.
+     for (const runId of contentRunIds) {
+       if (promptRunIds.has(runId)) {
+         isStale = true;
+         break;
+       }
+     }
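
The suggested block is cut off in this view, so here is a self-contained sketch of the same decision written as a pure function. The helper name `isStaleLivenessMemory` and its signature are hypothetical; only the variable names (`contentRunIds`, `promptRunIds`, `promptHasDeathClaim`) come from the diff.

```typescript
// Hypothetical helper mirroring the restructured conditions in the suggestion.
function isStaleLivenessMemory(
  contentRunIds: Set<string>,
  promptRunIds: Set<string>,
  promptHasDeathClaim: boolean,
): boolean {
  // No death claim in the prompt: nothing contradicts the liveness memory.
  if (!promptHasDeathClaim) return false;
  // Unscoped death claim (the prompt names no run at all): treat as stale.
  if (promptRunIds.size === 0) return true;
  // Scoped death claim: stale only if the memory refers to a run the prompt names.
  return Array.from(contentRunIds).some((runId) => promptRunIds.has(runId));
}

// Usage: a memory about run A survives a prompt that only reports run B as dead.
const memoryRuns = new Set(["bench-20260411-180900"]);
const otherRunPrompt = new Set(["bench-20260411-181500"]);
const keepsUnrelated = !isStaleLivenessMemory(memoryRuns, otherRunPrompt, true);
```

Note that this still treats any death claim anywhere in the prompt as applying to every run ID the prompt mentions; tying a claim to a specific run would need proximity or sentence-level matching, which neither the diff nor the suggestion attempts.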
