[feat] Injection screening: hard rules + confidence gate (R-417 + R-418) by 100yenadmin · Pull Request #7 · electricsheephq/evaos-cortex-plugin

100yenadmin · 2026-04-11T18:09:54Z

Closes 100yenadmin/electric-sheep#1902, closes 100yenadmin/electric-sheep#1903

coderabbitai · 2026-04-11T18:10:03Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e77b3266-f73a-4485-9c3b-64f905cbaaa9


Copilot

Pull request overview

Adds injection-screening safeguards to Cortex memory injection to reduce prompt-injection/stale-context risks, with configurable hard rules and mode-based confidence thresholds (R-417/R-418).

Changes:

- Extend plugin config with injection screening toggles and per-mode thresholds.
- Add mode detection + two-layer screening function to drop low-confidence/stale/contradictory memories before formatting injection context.
- Wire the screening step into the `before_agent_start` recall injection path and update published `dist/` artifacts accordingly.
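
As a rough illustration of the per-mode confidence gate described above, here is a minimal sketch. The three mode names come from the diff comment ("critical > technical > personal"); the `Candidate` shape, the `confidenceGate` helper, and the threshold values are hypothetical and are not taken from the plugin.

```typescript
type InjectionMode = "critical" | "technical" | "personal";

// Hypothetical candidate shape: the real plugin's memory type may differ.
interface Candidate {
  content: string;
  confidence: number; // assumed to be in the range [0, 1]
}

// Hypothetical thresholds, chosen only for illustration:
// stricter modes demand higher confidence before a memory is injected.
const DEFAULT_THRESHOLDS: Record<InjectionMode, number> = {
  critical: 0.9,
  technical: 0.7,
  personal: 0.5,
};

// Keep only candidates whose confidence meets the threshold for the current mode.
function confidenceGate(
  candidates: Candidate[],
  mode: InjectionMode,
  thresholds: Record<InjectionMode, number> = DEFAULT_THRESHOLDS,
): Candidate[] {
  const minimum = thresholds[mode];
  return candidates.filter((c) => c.confidence >= minimum);
}

const sample: Candidate[] = [
  { content: "user prefers dark mode", confidence: 0.55 },
  { content: "deploy branch is main", confidence: 0.75 },
  { content: "prod database host", confidence: 0.95 },
];
```

With these assumed values, `confidenceGate(sample, "critical")` keeps one candidate, `"technical"` keeps two, and `"personal"` keeps all three.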

Reviewed changes

Copilot reviewed 1 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/index.ts | Adds new config fields, introduces `detectInjectionMode` + `screenInjectionCandidates`, and applies screening before `formatMemoryContext`. |
| dist/index.js | Compiled output reflecting the new screening logic and new named exports. |
| dist/index.js.map | Updated sourcemap for the compiled JS. |
| dist/index.d.ts | Updated type declarations to include new config fields and new exported functions. |
| dist/index.d.ts.map | Updated sourcemap for the declaration file. |


Copilot · 2026-04-11T18:13:07Z

+const RUN_ID_RE = /bench-\d{8}-\d{6}/g;
+const GIT_TOKENS = /\b(git|PR #|commit|branch)\b/i;
+const FILE_PATH_RE = /[./\\][a-zA-Z0-9_\-./\\]{2,}/;
+const LIVENESS_CLAIM = /\b(still active|still running|is running|is active|is alive|currently running)\b/i;
+const DEATH_CLAIM = /\b(was killed|is dead|died|crashed|no listener|restarted|dead\b|killed\b|stalled)\b/i;
+
+/**
+ * Classify the current turn into an injection mode.
+ * critical > technical > personal (first match wins).
+ */
+export function detectInjectionMode(promptText: string): InjectionMode {
+  if (CRITICAL_KEYWORDS.test(promptText) || RUN_ID_RE.test(promptText)) return "critical";
+  if (


RUN_ID_RE is declared with the global (/g) flag but is used with RegExp.test() in detectInjectionMode. Global regexes are stateful (lastIndex is advanced), so subsequent calls (including the later matchAll(RUN_ID_RE) in screenInjectionCandidates) can miss matches depending on call order. Consider using a non-global regex for test() (or resetting lastIndex / cloning the regex) and keep a separate global instance only for matchAll().
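
The statefulness described in this comment can be reproduced in isolation. The sketch below reuses the run-ID pattern from the diff; the names `RUN_ID_TEST`, `RUN_ID_ALL`, `demoStatefulTest`, `hasRunId`, and `extractRunIds` are hypothetical and only illustrate one way to split the two uses.

```typescript
// Same pattern as in the diff, declared twice: one instance per use.
const RUN_ID_TEST = /bench-\d{8}-\d{6}/;  // no /g, so test() keeps no state between calls
const RUN_ID_ALL = /bench-\d{8}-\d{6}/g;  // /g is required by matchAll()

// Reproduces the pitfall: a global regex remembers lastIndex between test() calls.
function demoStatefulTest(): boolean[] {
  const shared = /bench-\d{8}-\d{6}/g;
  const prompt = "bench-20260411-180954 stalled";
  // The first call matches and advances lastIndex past the run ID.
  // The second call resumes from there, finds nothing, and resets lastIndex to 0.
  return [shared.test(prompt), shared.test(prompt)];
}

// Safe presence check: the non-global instance always searches from index 0.
function hasRunId(text: string): boolean {
  return RUN_ID_TEST.test(text);
}

// Safe extraction: matchAll() works on an internal copy of the regex,
// and RUN_ID_ALL is never passed to test(), so its lastIndex stays at 0.
function extractRunIds(text: string): Set<string> {
  return new Set(Array.from(text.matchAll(RUN_ID_ALL), (m) => m[0]));
}
```

Calling `demoStatefulTest()` returns `[true, false]` for the same input string, which is the call-order dependence the review warns about.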

Copilot · 2026-04-11T18:13:07Z

+      // If the prompt already contains a death claim, or the run ID isn’t a live process,
+      // drop this memory (it was captured when the run was alive, now stale).
+      let isStale = false;
+      if (promptHasDeathClaim) {
+        isStale = true;
+      } else {
+        // Check if prompt explicitly references this run as dead / a different run took over
+        for (const runId of contentRunIds) {
+          if (promptRunIds.has(runId) && DEATH_CLAIM.test(promptText)) {
+            isStale = true;
+            break;
+          }
+          // Also stale if prompt never mentions this run ID at all but does mention death
+          if (!promptRunIds.has(runId) && promptHasDeathClaim) {
+            isStale = true;
+            break;
+          }


The stale run-state branch’s run-id specific checks are currently ineffective: the else block only runs when promptHasDeathClaim is false, but inside it you test DEATH_CLAIM.test(promptText) and promptHasDeathClaim again, which will always be false in that branch. As a result, this filter drops all liveness-claim memories whenever the prompt contains any death claim (even unrelated), and never drops based on matching run IDs. Restructure the conditions so run-id matching is evaluated when a death claim is present (or remove the redundant tests).

Suggested change

-      // If the prompt already contains a death claim, or the run ID isn’t a live process,
-      // drop this memory (it was captured when the run was alive, now stale).
-      let isStale = false;
-      if (promptHasDeathClaim) {
-        isStale = true;
-      } else {
-        // Check if prompt explicitly references this run as dead / a different run took over
-        for (const runId of contentRunIds) {
-          if (promptRunIds.has(runId) && DEATH_CLAIM.test(promptText)) {
-            isStale = true;
-            break;
-          }
-          // Also stale if prompt never mentions this run ID at all but does mention death
-          if (!promptRunIds.has(runId) && promptHasDeathClaim) {
-            isStale = true;
-            break;
-          }
+      // If the prompt contains a death claim for this run (or contains an unscoped death
+      // claim with no run IDs at all), drop this memory as stale.
+      let isStale = false;
+      if (promptHasDeathClaim) {
+        if (promptRunIds.size === 0) {
+          isStale = true;
+        } else {
+          // Check if prompt explicitly references this run as dead / a different run took over.
+          for (const runId of contentRunIds) {
+            if (promptRunIds.has(runId)) {
+              isStale = true;
+              break;
+            }
+          }
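
To make the restructured logic easier to check on its own, here is a self-contained sketch of the same decision as a pure function. The regexes are copied from the diff; the helper names `runIdsIn` and `isStaleLivenessMemory` are hypothetical, and the function assumes the memory has already been identified as a liveness claim.

```typescript
const RUN_ID_ALL = /bench-\d{8}-\d{6}/g; // used only with matchAll()
const DEATH_CLAIM =
  /\b(was killed|is dead|died|crashed|no listener|restarted|dead\b|killed\b|stalled)\b/i;

// Collect every run ID mentioned in a piece of text.
function runIdsIn(text: string): Set<string> {
  return new Set(Array.from(text.matchAll(RUN_ID_ALL), (m) => m[0]));
}

// Hypothetical helper: decide whether a liveness-claim memory should be dropped.
function isStaleLivenessMemory(promptText: string, memoryText: string): boolean {
  // No death claim in the prompt: nothing contradicts the memory.
  if (!DEATH_CLAIM.test(promptText)) return false;

  const promptRunIds = runIdsIn(promptText);
  // Unscoped death claim (the prompt names no run at all): treat the memory as stale.
  if (promptRunIds.size === 0) return true;

  // Scoped death claim: stale only if the memory refers to a run the prompt names.
  return Array.from(runIdsIn(memoryText)).some((id) => promptRunIds.has(id));
}
```

Under this sketch, a prompt saying `bench-20260411-180954 crashed` drops a memory about that run but keeps a memory about `bench-20260410-120000`, which is the behaviour the original `else` branch could never reach.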

Eva added 2 commits April 12, 2026 01:09

feat(inject): add screening gate for memory injection

c3a57b4

chore: remove accidental compiled test artifacts

e42035b

Copilot AI review requested due to automatic review settings April 11, 2026 18:09

Copilot started reviewing on behalf of 100yenadmin April 11, 2026 18:10

100yenadmin wants to merge 2 commits into main from feat/injection-screening-r417-r418
