Security skills that make AI coding agents safe to run.
Not another "make your agent do security" tool. safe-agent protects you from your agent — catching malicious skills before you install them and dangerous commands before they execute.
Snyk's ToxicSkills research found that 36.82% of AI agent skills have at least one security flaw — including prompt injection, credential theft, and data exfiltration. OWASP now tracks agentic skills as a first-class attack surface.
Every time you run /plugin install or npx skills add, you're trusting someone else's instructions inside your codebase, your shell, and your cloud credentials.
safe-agent adds a verification step.
| Skill | What it does | Lineage |
|---|---|---|
| `/skill-verify` | Audit any skill for prompt injection, exfiltration, and malicious patterns before you install it | Flare + OWASP |
| `/cost-guard` | Per-session token/dollar budget with reject/alert/downgrade modes | LOCO BudgetManager |
| `/tool-guard` | Allowlist/denylist for tool calls, approval gates for destructive ops, preset profiles | LOCO + Flare |
| `/behavior-watch` | Anomaly scoring on agent actions: flags unusual tool sequences, scope creep, first-seen patterns | Flare anomaly engine |
| `pre-exec-check` | Teaches the agent to pause and verify before destructive commands (rm -rf, force-push, DROP TABLE) | Flare heuristics |
Claude Code (plugin):
/plugin marketplace add ArielSmoliar/safe-agent
/plugin install safe-agent
Antigravity / Gemini CLI / Cursor / Codex:
git clone https://github.com/ArielSmoliar/safe-agent.git
cp -r safe-agent/skills/* .claude/skills/ # Claude Code
cp -r safe-agent/skills/* .antigravity/skills/ # Antigravity
cp -r safe-agent/skills/* .cursor/skills/ # Cursor
cp -r safe-agent/skills/* .codex/skills/ # Codex
skills.sh (any agent):
npx skills@latest add ArielSmoliar/safe-agent
Skills use the universal SKILL.md format (agentskills.io standard), so they work with any compatible agent.
/skill-verify path/to/skill-directory
Or verify a GitHub repo:
/skill-verify https://github.com/someone/cool-skills
Or audit all currently installed skills:
/skill-verify
You get a structured report:
## Skill Verify Report: cool-skill
Verdict: CAUTION
Risk Score: 42/100
### Findings
- [HIGH] Credential harvesting
File: scripts/setup.sh, line 14
What: Reads ~/.aws/credentials and stores in variable
Why: Combined with the curl on line 22, credentials could be exfiltrated
Evidence: `AWS_CREDS=$(cat ~/.aws/credentials)`
- [MEDIUM] Excessive permissions
File: SKILL.md, line 3
What: allowed-tools includes Bash but skill description is "code formatter"
Why: A formatter should not need shell access
### Summary
Files scanned: 4
Issues found: 2 (0 critical, 1 high, 1 medium, 0 low)
Recommendation: Install with modifications (remove scripts/setup.sh)
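The "credential read combined with a network call" finding in the report above can be approximated with a simple per-file pattern scan. The sketch below is illustrative only: the pattern lists, the `scan_script` function, and the severity rule are assumptions made for this example, not safe-agent's actual rubric.

```python
import re

# Hypothetical patterns; the real skill's rubric is not shown in this README.
CREDENTIAL_READ = re.compile(r"~/\.aws/credentials|\.env\b|~/\.ssh/id_\w+")
NETWORK_CALL = re.compile(r"\b(curl|wget|nc)\b")


def scan_script(text: str) -> list:
    """Return one finding per line that reads credentials.

    Severity is raised to HIGH when the same file also contains a
    network call (assumed rule, mirroring the sample report).
    """
    lines = text.splitlines()
    network_lines = [
        number for number, line in enumerate(lines, 1) if NETWORK_CALL.search(line)
    ]
    findings = []
    for number, line in enumerate(lines, 1):
        if CREDENTIAL_READ.search(line):
            findings.append({
                "line": number,
                "severity": "HIGH" if network_lines else "MEDIUM",
                "evidence": line.strip(),
                "network_lines": network_lines,
            })
    return findings
```

Matching on text alone produces false positives and misses obfuscated payloads, which is one reason the skill reports a verdict for you to review rather than deciding on your behalf.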
/cost-guard $5
/cost-guard 200k tokens alert
Tracks estimated spend and warns at 50%, 80%, and 100%. Three modes:
- reject — stop work at the limit
- alert — warn but continue (default)
- downgrade — suggest cheaper approaches as you approach the limit
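The threshold and mode behaviour described above can be sketched as a small tracker. This is a minimal illustration, not the skill's implementation: `CostGuard`, its field names, and the choice to start downgrade suggestions at 80% are all assumptions.

```python
from dataclasses import dataclass, field

THRESHOLDS = (50, 80, 100)  # warn at 50%, 80%, and 100% of the budget


@dataclass
class CostGuard:
    limit: float                 # budget in dollars or tokens
    mode: str = "alert"          # "reject", "alert", or "downgrade"
    spent: float = 0.0
    warned: set = field(default_factory=set)

    def record(self, amount: float) -> list:
        """Add estimated spend and return any new messages for the user."""
        self.spent += amount
        messages = []
        for pct in THRESHOLDS:
            crossed = self.spent * 100 >= pct * self.limit
            if crossed and pct not in self.warned:
                self.warned.add(pct)
                messages.append(f"{pct}% of budget used")
        # Assumption: downgrade suggestions begin at the 80% mark.
        if self.mode == "downgrade" and self.spent * 100 >= 80 * self.limit:
            messages.append("consider a cheaper approach")
        return messages

    def may_continue(self) -> bool:
        """Only reject mode stops work once the limit is reached."""
        return not (self.mode == "reject" and self.spent >= self.limit)
```

With `CostGuard(limit=5.0)`, recording $3.00 yields the 50% warning, and in the default alert mode `may_continue()` stays true even after the full $5.00 is spent.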
/tool-guard profile readonly # only Read, Grep, Glob allowed
/tool-guard profile careful # everything gated
/tool-guard deny Bash # block shell access
/tool-guard gate Edit,Write # approve each file mutation
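One way to picture how allowlists, denylists, and gates combine is a small decision function. The sketch below is an assumption-laden illustration: `ToolPolicy`, the `decide` method, and the precedence order (deny, then allowlist, then gate) are invented here and may not match how the skill resolves conflicts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolPolicy:
    allow: Optional[set] = None                 # None means no allowlist is set
    deny: set = field(default_factory=set)
    gate: set = field(default_factory=set)      # tools that need approval
    gate_all: bool = False                      # gate every permitted tool

    def decide(self, tool: str) -> str:
        """Return "deny", "ask", or "allow" for a tool call."""
        if tool in self.deny:
            return "deny"
        if self.allow is not None and tool not in self.allow:
            return "deny"
        if self.gate_all or tool in self.gate:
            return "ask"
        return "allow"


# Rough equivalents of the two preset profiles shown above.
READONLY = ToolPolicy(allow={"Read", "Grep", "Glob"})
CAREFUL = ToolPolicy(gate_all=True)
```

Under these assumptions, `READONLY.decide("Bash")` returns `"deny"` and `CAREFUL.decide("Edit")` returns `"ask"`.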
/behavior-watch
Produces a session activity report with anomaly scoring — flags scope creep, unusual tool call frequencies, access to sensitive files, and suspicious sequences (e.g., read credentials then make a network call).
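The "read credentials then make a network call" sequence is the easiest of these signals to illustrate. The sketch below assumes a session log of `(tool, target)` pairs; the marker list, the network tool names, and the `flag_read_then_network` function are hypothetical and stand in for whatever the skill actually inspects.

```python
SENSITIVE_MARKERS = (".env", ".aws/credentials", ".ssh/")  # illustrative only
NETWORK_TOOLS = {"WebFetch", "curl", "wget"}               # hypothetical names


def flag_read_then_network(events: list, window: int = 3) -> list:
    """Find sensitive reads that are followed by a network call.

    `events` is a list of (tool, target) pairs in call order. Returns
    (read_index, network_index) pairs where the network call occurs
    within `window` events after the read.
    """
    flagged = []
    for i, (tool, target) in enumerate(events):
        is_sensitive = any(marker in target for marker in SENSITIVE_MARKERS)
        if tool != "Read" or not is_sensitive:
            continue
        for j in range(i + 1, min(i + 1 + window, len(events))):
            if events[j][0] in NETWORK_TOOLS:
                flagged.append((i, j))
                break
    return flagged
```

A fixed window is a crude stand-in for real anomaly scoring, but it shows why the order of tool calls matters more than any single call.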
pre-exec-check activates automatically. When the agent is about to run something risky, it checks against known danger patterns and warns you:
⚠ This command `git push --force origin main` force-pushes to main,
which overwrites remote history and affects all collaborators.
Safer alternative: `git push --force-with-lease origin main`
Proceed? (y/n)
This is behavioral guidance, not runtime interception — Claude consults the skill's rubric when it recognizes a potentially dangerous command. For hard enforcement (blocking commands at the shell level), see the roadmap below.
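pre-exec-check itself is a set of instructions rather than code, but the idea of "known danger patterns" can be illustrated with a few regular expressions. The patterns and the `check_command` function below are assumptions written for this example; they cover only the three commands named in this README and are not the skill's actual rubric.

```python
import re

# Illustrative patterns only; a real list would be longer and more careful.
DANGER_PATTERNS = [
    (
        re.compile(r"\brm\s+-[a-zA-Z]*r[a-zA-Z]*f|\brm\s+-[a-zA-Z]*f[a-zA-Z]*r"),
        "recursive force delete",
    ),
    (
        # --force-with-lease is excluded because it is the safer alternative.
        re.compile(r"\bgit\s+push\b.*(--force(?!-with-lease)|\s-f\b)"),
        "force-push overwrites remote history",
    ),
    (
        re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
        "drops a database table",
    ),
]


def check_command(command: str) -> list:
    """Return a warning for each danger pattern the command matches."""
    return [reason for pattern, reason in DANGER_PATTERNS if pattern.search(command)]
```

Here `check_command("git push --force origin main")` returns one warning, while the `--force-with-lease` variant returns none.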
This is behavioral guidance for your AI coding agent — instructions that teach it to check for threats and pause before dangerous operations. It works because modern agents (Claude, Codex, Gemini) follow well-structured instructions reliably.
This is not a runtime enforcement engine. It cannot prevent a determined attacker from bypassing it, just as a code review cannot prevent all bugs. It targets the common cases, the kinds of flaws behind Snyk's 36.82% figure, and makes you aware before damage is done.
safe-agent isn't another weekend prompt engineering project. The threat patterns and detection heuristics come from:
- Flare — AI-powered anomaly detection for cloud audit logs, where we built and battle-tested Claude-based security analysis that scores anomalies 0-100 with baseline tracking and false-positive filtering
- LOCO — Load-aware scheduling for multi-agent systems, where we learned how agents compete for resources and where budget/authorization guardrails matter most
v0.2 — Cross-skill coordination
- Unified policy engine — behavior-watch detection triggers tool-guard restrictions automatically
- Multi-command exfiltration detection: catch `cat .env > /tmp/x && curl -d @/tmp/x` split across separate commands
- Pre-exec-check user-intent awareness: distinguish "user asked for force-push" from "agent decided to force-push"
v0.3 — Deeper coverage
- MCP server audit — Extend skill-verify to scan MCP server configurations for trust boundary violations
- Hook-based enforcement — Shell scripts + Claude Code hooks for hard blocking of dangerous commands (deterministic, not behavioral)
- Resource exhaustion detection — catch infinite loops, memory bombs, bandwidth abuse within single commands
- Output/response tampering detection — detect if a malicious skill modifies tool output
v0.4 — Persistence and teams
- Cross-session memory — Track behavior baselines across sessions to detect drift over time
- Team profiles — Shareable tool-guard presets for org-wide security policies
MIT
Issues and PRs welcome. If you find a threat pattern we don't catch, open an issue — that's the most valuable contribution.