GitHub - ArielSmoliar/safe-agent: Security skills that make AI coding agents safe to run. 5 drop-in skills: skill verification, cost tracking, tool authorization, behavioral anomaly detection, and pre-execution safety. Built on Flare + LOCO.

Security skills that make AI coding agents safe to run.

Not another "make your agent do security" tool. safe-agent protects you from your agent — catching malicious skills before you install them and dangerous commands before they execute.

The Problem

Snyk's ToxicSkills research found that 36.82% of AI agent skills have at least one security flaw — including prompt injection, credential theft, and data exfiltration. OWASP now tracks agentic skills as a first-class attack surface.

Every time you run /plugin install or npx skills add, you're trusting someone else's instructions inside your codebase, your shell, and your cloud credentials.

safe-agent adds a verification step.

Skills

Skill	What it does	Lineage
`/skill-verify`	Audit any skill for prompt injection, exfiltration, and malicious patterns before you install it	Flare + OWASP
`/cost-guard`	Per-session token/dollar budget with reject/alert/downgrade modes	LOCO BudgetManager
`/tool-guard`	Allowlist/denylist for tool calls, approval gates for destructive ops, preset profiles	LOCO + Flare
`/behavior-watch`	Anomaly scoring on agent actions — flags unusual tool sequences, scope creep, first-seen patterns	Flare anomaly engine
`pre-exec-check`	Teaches the agent to pause and verify before destructive commands (rm -rf, force-push, DROP TABLE)	Flare heuristics

Install

Claude Code (plugin):

/plugin marketplace add ArielSmoliar/safe-agent
/plugin install safe-agent

Antigravity / Gemini CLI / Cursor / Codex:

git clone https://github.com/ArielSmoliar/safe-agent.git
cp -r safe-agent/skills/* .claude/skills/      # Claude Code
cp -r safe-agent/skills/* .antigravity/skills/  # Antigravity
cp -r safe-agent/skills/* .cursor/skills/       # Cursor
cp -r safe-agent/skills/* .codex/skills/        # Codex

skills.sh (any agent):

npx skills@latest add ArielSmoliar/safe-agent

Skills use the universal SKILL.md format (agentskills.io standard) — they work with any compatible agent.

Usage

Verify a skill before installing

/skill-verify path/to/skill-directory

Or verify a GitHub repo:

/skill-verify https://github.com/someone/cool-skills

Or audit all currently installed skills:

/skill-verify

You get a structured report:

## Skill Verify Report: cool-skill

Verdict: CAUTION
Risk Score: 42/100

### Findings
- [HIGH] Credential harvesting
  File: scripts/setup.sh, line 14
  What: Reads ~/.aws/credentials and stores in variable
  Why: Combined with the curl on line 22, credentials could be exfiltrated
  Evidence: `AWS_CREDS=$(cat ~/.aws/credentials)`

- [MEDIUM] Excessive permissions
  File: SKILL.md, line 3
  What: allowed-tools includes Bash but skill description is "code formatter"
  Why: A formatter should not need shell access

### Summary
Files scanned: 4
Issues found: 2 (0 critical, 1 high, 1 medium, 0 low)
Recommendation: Install with modifications (remove scripts/setup.sh)

Set a session budget

/cost-guard $5
/cost-guard 200k tokens alert

Tracks estimated spend and warns at 50%, 80%, and 100%. Three modes:

reject — stop work at the limit
alert — warn but continue (default)
downgrade — suggest cheaper approaches as you approach the limit

Restrict tool access

/tool-guard profile readonly          # only Read, Grep, Glob allowed
/tool-guard profile careful           # everything gated
/tool-guard deny Bash                 # block shell access
/tool-guard gate Edit,Write           # approve each file mutation

Audit agent behavior

/behavior-watch

Produces a session activity report with anomaly scoring — flags scope creep, unusual tool call frequencies, access to sensitive files, and suspicious sequences (e.g., read credentials then make a network call).

Pre-execution safety

pre-exec-check activates automatically. When the agent is about to run something risky, it checks against known danger patterns and warns you:

⚠ This command `git push --force origin main` force-pushes to main,
  which overwrites remote history and affects all collaborators.
  Safer alternative: `git push --force-with-lease origin main`
  Proceed? (y/n)

This is behavioral guidance, not runtime interception — Claude consults the skill's rubric when it recognizes a potentially dangerous command. For hard enforcement (blocking commands at the shell level), see the roadmap below.

What This Is (and Isn't)

This is behavioral guidance for your AI coding agent — instructions that teach it to check for threats and pause before dangerous operations. It works because modern agents (Claude, Codex, Gemini) follow well-structured instructions reliably.

This is not a runtime enforcement engine. It cannot prevent a determined attacker from bypassing it, just as a code review cannot prevent all bugs. It catches the common cases — the 36.82% — and makes you aware before damage is done.

Built on Real Security Expertise

safe-agent isn't another weekend prompt engineering project. The threat patterns and detection heuristics come from:

Flare — AI-powered anomaly detection for cloud audit logs, where we built and battle-tested Claude-based security analysis that scores anomalies 0-100 with baseline tracking and false-positive filtering
LOCO — Load-aware scheduling for multi-agent systems, where we learned how agents compete for resources and where budget/authorization guardrails matter most

Roadmap

v0.2 — Cross-skill coordination

Unified policy engine — behavior-watch detection triggers tool-guard restrictions automatically
Multi-command exfiltration detection — catch split cat .env > /tmp/x && curl -d @/tmp/x across separate commands
Pre-exec-check user-intent awareness — distinguish "user asked for force-push" from "agent decided to force-push"

v0.3 — Deeper coverage

MCP server audit — Extend skill-verify to scan MCP server configurations for trust boundary violations
Hook-based enforcement — Shell scripts + Claude Code hooks for hard blocking of dangerous commands (deterministic, not behavioral)
Resource exhaustion detection — catch infinite loops, memory bombs, bandwidth abuse within single commands
Output/response tampering detection — detect if a malicious skill modifies tool output

v0.4 — Persistence and teams

Cross-session memory — Track behavior baselines across sessions to detect drift over time
Team profiles — Shareable tool-guard presets for org-wide security policies

License

MIT

Contributing

Issues and PRs welcome. If you find a threat pattern we don't catch, open an issue — that's the most valuable contribution.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.claude-plugin		.claude-plugin
assets		assets
skills		skills
tests		tests
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

The Problem

Skills

Install

Usage

Verify a skill before installing

Set a session budget

Restrict tool access

Audit agent behavior

Pre-execution safety

What This Is (and Isn't)

Built on Real Security Expertise

Roadmap

License

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

The Problem

Skills

Install

Usage

Verify a skill before installing

Set a session budget

Restrict tool access

Audit agent behavior

Pre-execution safety

What This Is (and Isn't)

Built on Real Security Expertise

Roadmap

License

Contributing

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages