feat(skills-guard): wire up LLM security audit behind --llm-audit flag #3234
Open
edmundman wants to merge 1 commit into NousResearch:main from
Conversation
llm_audit_skill() in tools/skills_guard.py was fully implemented but
never called — a dead function since it was written. This surfaces it
as an opt-in second pass on top of the existing regex scanner.
Problem
The static regex scan in scan_skill() is fast and reliable for
known-bad patterns (exfiltration regexes, reverse shells, credential
leakage, etc.), but it cannot catch threats expressed in natural
language prose — subtle social engineering, multi-step exfiltration
described across sentences, or jailbreak framing that avoids any
keyword the regexes target. llm_audit_skill() was written to fill
exactly this gap, calling the user's configured LLM as a second
opinion after the static scan, but nothing ever invoked it.
Fix
Add --llm-audit / dest=llm_audit to both `hermes skills install` and
`hermes skills audit` in hermes_cli/main.py.
Wire the flag through skills_command() and handle_skills_slash() in
hermes_cli/skills_hub.py into the do_install() and do_audit() functions,
which now accept llm_audit: bool = False.
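The flag wiring described above can be sketched with `argparse`. This is a hedged, minimal reconstruction, not the actual `hermes_cli/main.py` code: the subparser layout and positional-argument names are assumptions; only `--llm-audit`, `dest=llm_audit`, and the `install`/`audit` subcommands come from this PR.

```python
import argparse

# Minimal sketch of the CLI wiring; the real parser in hermes_cli/main.py
# may structure its subcommands differently.
parser = argparse.ArgumentParser(prog="hermes")
sub = parser.add_subparsers(dest="command")
skills = sub.add_parser("skills")
skills_sub = skills.add_subparsers(dest="skills_command")

for name in ("install", "audit"):
    p = skills_sub.add_parser(name)
    # install requires a skill ref; audit's is optional (assumed from Usage)
    p.add_argument("skill", nargs=None if name == "install" else "?")
    # Opt-in second pass; defaults to False so the fast regex-only
    # behaviour is unchanged for anyone not passing the flag.
    p.add_argument("--llm-audit", dest="llm_audit", action="store_true")

args = parser.parse_args(["skills", "install", "owner/repo/skill", "--llm-audit"])
```

Parsed this way, `args.llm_audit` can be passed straight through `skills_command()` into `do_install(..., llm_audit=args.llm_audit)`.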
When llm_audit=True, after scan_skill() completes, llm_audit_skill() is
called on the same path with the static ScanResult. Its findings are
merged in before the install-policy decision is made, so an LLM-detected
critical finding can block a community skill just as a regex finding
would. The LLM verdict can only raise severity, never lower it, and the
LLM call itself is best-effort: a failed API call never blocks an
install.
The flag defaults to False so existing behaviour (fast, offline,
regex-only scan) is unchanged for all current users.
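The merge policy above (findings merged, verdict can only rise, API failure never blocks) can be sketched as follows. `ScanResult`, `merge_llm_verdict`, and `audited_scan` here are illustrative stand-ins, not the real definitions in `tools/skills_guard.py`; the three severity levels are taken from the test names below.

```python
from dataclasses import dataclass, field

# Assumed severity ordering, per the safe -> caution -> dangerous
# transitions named in the tests.
SEVERITY_ORDER = {"safe": 0, "caution": 1, "dangerous": 2}

@dataclass
class ScanResult:  # stand-in for the real ScanResult
    verdict: str = "safe"
    findings: list = field(default_factory=list)

def merge_llm_verdict(static: ScanResult, llm_verdict: str,
                      llm_findings: list) -> ScanResult:
    """Merge LLM findings into the static result; verdict may only rise."""
    static.findings.extend(llm_findings)
    if SEVERITY_ORDER.get(llm_verdict, 0) > SEVERITY_ORDER[static.verdict]:
        static.verdict = llm_verdict  # raise only; never lower
    return static

def audited_scan(static: ScanResult, call_llm) -> ScanResult:
    # Best-effort: an API error leaves the static result untouched,
    # so an install is never blocked by a failed LLM call.
    try:
        verdict, findings = call_llm()
    except Exception:
        return static
    return merge_llm_verdict(static, verdict, findings)
```

The raise-only invariant is what makes the LLM pass safe to bolt on: it can add blocking findings but can never talk the scanner out of a regex hit.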
Usage
hermes skills install owner/repo/skill --llm-audit
hermes skills audit --llm-audit
hermes skills audit my-skill --llm-audit
/skills install owner/repo/skill --llm-audit
/skills audit --llm-audit
Tests
40 new tests across five classes in tests/tools/test_skills_guard.py:
TestParseLlmResponse — JSON parsing, markdown unwrapping, severity
normalisation, truncation, malformed inputs
TestLlmAuditSkill — dangerous skip, no-content skip, no-model skip,
API failure passthrough, finding merge, verdict
raise (safe→caution, caution→dangerous), verdict
cannot-lower invariant, explicit model arg,
single-file path, content truncation
TestDoInstallLlmAuditWiring — llm_audit=False never calls llm_audit_skill,
llm_audit=True calls it, LLM-raised dangerous
verdict blocks install
TestDoAuditLlmAuditWiring — flag off/on, once-per-skill count, name
filter respected
TestCliArgLlmAudit — argparse defaults, flag parsing, router dispatch
TestSlashCommandLlmAuditWiring — slash parse for install and audit,
name+flag combination
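The `TestDoInstallLlmAuditWiring` cases can be illustrated with a self-contained mock-based sketch. The real `do_install` lives in `hermes_cli/skills_hub.py`; this toy version and its `auditor` parameter are assumptions made so the test pattern runs standalone.

```python
from unittest import mock

def do_install(path, llm_audit=False, auditor=None):
    # Toy stand-in: "safe" plays the role of the static scan_skill() verdict,
    # and auditor plays the role of llm_audit_skill().
    verdict = "safe"
    if llm_audit and auditor is not None:
        verdict = auditor(path)
    return verdict

def test_flag_off_never_calls_auditor():
    auditor = mock.Mock(return_value="dangerous")
    assert do_install("skill/", llm_audit=False, auditor=auditor) == "safe"
    auditor.assert_not_called()  # llm_audit=False: no LLM call at all

def test_flag_on_raised_verdict_blocks():
    auditor = mock.Mock(return_value="dangerous")
    assert do_install("skill/", llm_audit=True, auditor=auditor) == "dangerous"
    auditor.assert_called_once_with("skill/")  # exactly one audit per skill
```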
Full suite: 93/93 passed in test_skills_guard.py; 6197 passing across
the broader suite with zero new failures introduced.