
feat(skills-guard): wire up LLM security audit behind --llm-audit flag#3234

Open
edmundman wants to merge 1 commit into NousResearch:main from edmundman:feature/wire-llm-audit-skill

Conversation

@edmundman

llm_audit_skill() in tools/skills_guard.py was fully implemented but never called — a dead function since it was written. This surfaces it as an opt-in second pass on top of the existing regex scanner.

Problem

The static regex scan in scan_skill() is fast and reliable for known-bad patterns (exfiltration regexes, reverse shells, credential leakage, etc.), but it cannot catch threats expressed in natural language prose — subtle social engineering, multi-step exfiltration described across sentences, or jailbreak framing that avoids any keyword the regexes target. llm_audit_skill() was written to fill exactly this gap, calling the user's configured LLM as a second opinion after the static scan, but nothing ever invoked it.

Fix

Add `--llm-audit` / `dest=llm_audit` to both `hermes skills install` and `hermes skills audit` in hermes_cli/main.py.

Wire the flag through skills_command() and handle_skills_slash() in hermes_cli/skills_hub.py into the do_install() and do_audit() functions, which now accept llm_audit: bool = False.
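The flag wiring described above can be sketched with argparse; the parser layout and argument names here are illustrative, not the exact structure of hermes_cli/main.py:

```python
# Hypothetical sketch of the --llm-audit flag wiring. The subparser
# layout is an assumption; only the flag name and its False default
# come from this PR.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="hermes")
    sub = parser.add_subparsers(dest="command")
    skills = sub.add_parser("skills").add_subparsers(dest="skills_command")
    for name in ("install", "audit"):
        cmd = skills.add_parser(name)
        # install requires a skill spec; audit's name filter is optional
        cmd.add_argument("skill", nargs=None if name == "install" else "?")
        # Opt-in LLM second pass; defaults to False so the fast,
        # offline, regex-only behaviour is unchanged for existing users.
        cmd.add_argument("--llm-audit", dest="llm_audit", action="store_true")
    return parser

args = build_parser().parse_args(
    ["skills", "install", "owner/repo/skill", "--llm-audit"]
)
print(args.llm_audit)  # True
```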

When llm_audit=True, after scan_skill() completes, llm_audit_skill() is called on the same path with the static ScanResult. Its findings are merged in before the install-policy decision is made, so an LLM-detected critical finding can block a community skill just as a regex finding would. The LLM verdict can only raise severity, never lower it, and a failed LLM call is best-effort — install is never blocked due to an API error.

The flag defaults to False so existing behaviour (fast, offline, regex-only scan) is unchanged for all current users.

Usage

    hermes skills install owner/repo/skill --llm-audit
    hermes skills audit --llm-audit
    hermes skills audit my-skill --llm-audit
    /skills install owner/repo/skill --llm-audit
    /skills audit --llm-audit

Tests

40 new tests across six classes in tests/tools/test_skills_guard.py:

  TestParseLlmResponse — JSON parsing, markdown unwrapping, severity
    normalisation, truncation, malformed inputs
  TestLlmAuditSkill — dangerous skip, no-content skip, no-model skip, API
    failure passthrough, finding merge, verdict raise (safe→caution,
    caution→dangerous), verdict cannot-lower invariant, explicit model arg,
    single-file path, content truncation
  TestDoInstallLlmAuditWiring — llm_audit=False never calls llm_audit_skill,
    llm_audit=True calls it, LLM-raised dangerous verdict blocks install
  TestDoAuditLlmAuditWiring — flag off/on, once-per-skill count, name
    filter respected
  TestCliArgLlmAudit — argparse defaults, flag parsing, router dispatch
  TestSlashCommandLlmAuditWiring — slash parse for install and audit,
    name+flag combination

Full suite: 93/93 passed in test_skills_guard.py; 6197 passing across the broader suite with zero new failures introduced.

  • [x] 🐛 Bug fix (non-breaking change that fixes an issue)
  • [ ] ✨ New feature (non-breaking change that adds functionality)
  • [x] 🔒 Security fix
  • [ ] 📝 Documentation update
  • [ ] ✅ Tests (adding or improving test coverage)
  • [ ] ♻️ Refactor (no behavior change)
  • [ ] 🎯 New skill (bundled or hub)

@alt-glitch added the type/feature, P3, comp/cli, and tool/skills labels on May 2, 2026