Check Catalog

Pengfei Hu edited this page May 31, 2026 · 2 revisions

Agents Shipgate ships 80+ built-in checks across ~19 categories. Every check is deterministic, static, and exits cleanly on missing data. Default severities can be overridden per repo via checks.severity_overrides.

This page curates the foundational categories. For the complete, always-current list, read docs/checks.md or browse the live catalog from the CLI:

agents-shipgate list-checks
agents-shipgate list-checks --json
agents-shipgate explain SHIP-POLICY-APPROVAL-MISSING

Categories at a glance

| Category | Checks | Covers |
|---|---|---|
| inventory | 4 | Enumerability, wildcard exposure, surface size, low-confidence production surfaces |
| schema | 3 | Broad free-text params, missing numeric bounds, free-form output |
| auth | 4 | Missing/broad scopes, scope-coverage gaps |
| scope | 2 | Tools outside declared purpose, prohibited tools present |
| policy | 2 | Missing approval / confirmation policies |
| side_effects | 1 | Missing idempotency on risky writes |
| evidence | 4 | HITL evidence: approval traces, override reasons, high-risk exclusions, promotion criteria |
| security | 3 | Injection / secret / sensitive-data exposure in the surface |
| manifest | 5 | Stale suppressions/policies/overrides, missing owners, unused scopes |
| baseline | 3 | Baseline drift and integrity |
| documentation | 1 | Missing / too-short descriptions |
| action_surface | 11 | Base→head action-surface diff: new/removed actions, scope & effect escalations, removed controls |
| verify | 6 | Verifier-cycle trust-root checks (`SHIP-VERIFY-*`): policy/baseline/CI/instruction weakening routed to human review |
| api | 11 | OpenAI API artifacts: schema strictness, structured output, prompt/tool scope, operational readiness |
| adk | 6 | Google ADK extraction (dynamic toolsets, callbacks, eval coverage) |
| langchain | 2 | LangChain / LangGraph dynamic tool surfaces |
| crewai | 2 | CrewAI dynamic tool surfaces |
| codex_plugin | 6 | Codex plugin packages & marketplaces |
| n8n | 5 | n8n workflow tool surfaces and credential stubs |

Counts change as checks are added; the output of agents-shipgate list-checks --json is authoritative.


Severity contract

  • critical — strict CI exits 20 unless the finding is explicitly suppressed with a reason.
  • high — requires human review; does not fail strict CI by default. Configure via ci.fail_on: [critical, high].
  • medium — review during release hardening.
  • low / info — informational.

Suppressed findings (checks.ignore matched) keep their severity in the JSON report but are excluded from active counts and do not trigger CI failure.
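
The severity contract above can be sketched as a small function. This is an illustrative model only, not the tool's implementation: the `Finding` class, its field names, and the exit code 0 for a passing run are assumptions; only exit code 20, the `ci.fail_on` list, and the rule that suppressed findings do not fail CI come from this page.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical shape; the real JSON report may use different field names.
    check_id: str
    severity: str
    suppressed: bool = False  # True when a checks.ignore entry matched


def strict_exit_code(findings, fail_on=("critical",)):
    """Return 20 if any active finding has a severity listed in fail_on, else 0."""
    active = [f for f in findings if not f.suppressed]
    return 20 if any(f.severity in fail_on for f in active) else 0


findings = [
    Finding("SHIP-POLICY-APPROVAL-MISSING", "critical", suppressed=True),
    Finding("SHIP-AUTH-MISSING-SCOPE", "high"),
]
print(strict_exit_code(findings))                        # 0
print(strict_exit_code(findings, ("critical", "high")))  # 20
```

With the default `fail_on`, the suppressed critical finding is ignored and the high finding only calls for review; adding `high` to `fail_on` turns the same report into a failing run.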


Inventory

| ID | Default severity | Fires when |
|---|---|---|
| SHIP-INVENTORY-NOT-ENUMERABLE | high | No tools were loaded from any source. The release gate fails closed. |
| SHIP-INVENTORY-WILDCARD-TOOLS | high | A source declares wildcard / all-tools exposure (wildcard: true in MCP). |
| SHIP-INVENTORY-TOOL-SURFACE-TOO-LARGE | medium | Tool count exceeds 50. |
| SHIP-INVENTORY-LOW-CONFIDENCE-PRODUCTION-SURFACE | high | environment.target is production and at least one tool came from a low/medium-confidence extraction (typically the SDK static AST). |

Documentation

| ID | Default severity | Fires when |
|---|---|---|
| SHIP-DOC-MISSING-DESCRIPTION | medium | Tool description is missing or shorter than 20 chars. |
| SHIP-DOC-INJECTION-RISK | medium · high if multi-match on write | Description contains instruction-override-like phrases (ignore previous instructions, you are now the system, etc). |
| SHIP-DOC-SECRET-IN-DESCRIPTION | medium · high if multi-match on write | Description matches secret patterns: sk-…, ghp_…, AKIA…, or `password |

Schema

| ID | Default severity | Fires when |
|---|---|---|
| SHIP-SCHEMA-BROAD-FREE-TEXT | high | A write/action-like tool has a free-form parameter named action, body, command, content, instructions, message, prompt, update(s). |
| SHIP-SCHEMA-MISSING-BOUNDS | high | A risky numeric parameter (amount, count, limit, quantity, total, refund_amount, max(imum)) on a write tool lacks a maximum. |
| SHIP-SCHEMA-FREEFORM-OUTPUT | medium | Tool returns string (or SDK -> str); output may flow back into model context. |
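
As a rough illustration of what SHIP-SCHEMA-MISSING-BOUNDS looks for, the sketch below scans a JSON Schema `properties` map for risky numeric parameters without an upper bound. The function name is hypothetical, the parameter names are taken from the table above, and treating `exclusiveMaximum` as an acceptable bound is an assumption; the real check may differ.

```python
# Parameter names listed for SHIP-SCHEMA-MISSING-BOUNDS on this page.
RISKY_NUMERIC = {"amount", "count", "limit", "quantity", "total",
                 "refund_amount", "max", "maximum"}


def params_missing_maximum(schema):
    """Return names of risky numeric properties that declare no upper bound."""
    missing = []
    for name, spec in schema.get("properties", {}).items():
        is_numeric = spec.get("type") in ("integer", "number")
        # Assumption: either JSON Schema upper-bound keyword counts as a bound.
        has_bound = "maximum" in spec or "exclusiveMaximum" in spec
        if name in RISKY_NUMERIC and is_numeric and not has_bound:
            missing.append(name)
    return missing


refund_schema = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "refund_amount": {"type": "number", "minimum": 0},
        "count": {"type": "integer", "maximum": 10},
    },
}
print(params_missing_maximum(refund_schema))  # ['refund_amount']
```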

Auth

| ID | Default severity | Fires when |
|---|---|---|
| SHIP-AUTH-MISSING-SCOPE | high | Write or sensitive-data tool has no declared auth scopes. |
| SHIP-AUTH-MANIFEST-BROAD-SCOPE | high | permissions.scopes contains `*`, `admin`, `:*`, or `write-all`. |
| SHIP-AUTH-TOOL-BROAD-SCOPE | high | A tool's own scope list contains a broad value. |
| SHIP-AUTH-SCOPE-COVERAGE-MISSING | high | Tool requires scopes not covered by permissions.scopes (after wildcard expansion). |
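
Scope coverage with wildcard expansion can be pictured roughly as follows. This sketch assumes shell-style globbing (`orders:*` covers `orders:write`) via the standard library's `fnmatchcase`; the actual expansion rules, the function name, and the example scope strings are assumptions rather than documented behavior.

```python
from fnmatch import fnmatchcase


def uncovered_scopes(tool_scopes, manifest_scopes):
    """Return the tool scopes that no manifest scope (literal or wildcard) covers."""
    return [
        scope for scope in tool_scopes
        if not any(fnmatchcase(scope, granted) for granted in manifest_scopes)
    ]


manifest = ["orders:*", "billing:read"]
print(uncovered_scopes(["orders:write", "billing:refund"], manifest))
# ['billing:refund']
```

A non-empty result corresponds to the situation SHIP-AUTH-SCOPE-COVERAGE-MISSING describes: the tool needs a scope the manifest never grants.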

Scope

| ID | Default severity | Fires when |
|---|---|---|
| SHIP-SCOPE-TOOL-OUTSIDE-PURPOSE | high | agent.declared_purpose reads as read-only (tokenized) and a write-capable tool is attached. |
| SHIP-SCOPE-PROHIBITED-TOOL-PRESENT | high | Tokens of a prohibited_actions entry overlap with a tool's name/description AND no mitigating policy is declared. |

Policy

| ID | Default severity | Fires when |
|---|---|---|
| SHIP-POLICY-APPROVAL-MISSING | critical | Tool has a high-risk tag (destructive, infrastructure_change, financial_action, code_execution) at ≥ medium confidence and is not in require_approval_for_tools. |
| SHIP-POLICY-CONFIRMATION-MISSING | high | Tool has destructive, external_write, or customer_communication at ≥ medium and is not in require_confirmation_for_tools. |
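
The "tag at ≥ medium confidence and not listed in the policy" rule behind SHIP-POLICY-APPROVAL-MISSING might be modeled like this. The tag set and the policy key come from the table above; the function name, the `risk_hints` mapping of tag to confidence, and the three-level confidence ordering are assumptions made for illustration.

```python
HIGH_RISK_TAGS = {"destructive", "infrastructure_change",
                  "financial_action", "code_execution"}
# Assumed ordering of confidence levels.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def approval_missing(tool_name, risk_hints, require_approval_for_tools):
    """True when a high-risk tag is held at >= medium confidence without an approval policy."""
    risky = any(
        tag in HIGH_RISK_TAGS and CONFIDENCE_RANK[conf] >= CONFIDENCE_RANK["medium"]
        for tag, conf in risk_hints.items()
    )
    return risky and tool_name not in require_approval_for_tools


print(approval_missing("issue_refund", {"financial_action": "medium"}, []))                # True
print(approval_missing("issue_refund", {"financial_action": "medium"}, ["issue_refund"]))  # False
print(approval_missing("cancel_draft", {"destructive": "low"}, []))                        # False
```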

Side effects

| ID | Default severity | Fires when |
|---|---|---|
| SHIP-SIDEFX-IDEMPOTENCY-MISSING | high · critical when retry policy is known | Tool is a write with financial_action / destructive / external_write, and lacks idempotency evidence: no idempotency_key parameter, no idempotentHint: true annotation, and not in require_idempotency_for_tools. |
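
The three kinds of idempotency evidence named above can be checked in one expression. In this sketch the tool is a plain dict with hypothetical `name`, `parameters`, and `annotations` keys; only `idempotency_key`, `idempotentHint`, and `require_idempotency_for_tools` are taken from this page.

```python
def has_idempotency_evidence(tool, require_idempotency_for_tools):
    """True if any one of the three evidence sources is present."""
    params = tool.get("parameters", {})
    annotations = tool.get("annotations", {})
    return (
        "idempotency_key" in params
        or annotations.get("idempotentHint") is True
        or tool["name"] in require_idempotency_for_tools
    )


charge = {"name": "charge_card", "parameters": {"amount": {"type": "number"}}}
print(has_idempotency_evidence(charge, []))               # False
print(has_idempotency_evidence(charge, ["charge_card"]))  # True
```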

API (OpenAI Agents API artifacts)

| ID | Default severity | Fires when |
|---|---|---|
| SHIP-API-FUNCTION-SCHEMA-STRICTNESS | high (medium when low-risk) | OpenAI API function schema lacks strict: true, additionalProperties: false, complete required, or has unbounded risky fields. |
| SHIP-API-STRUCTURED-OUTPUT-READINESS | high (medium for under-spec) | No response format declared, or response schema is too broad / missing decision/status enums / missing refusal/needs_review/error modeling. |
| SHIP-API-PROMPT-TOOL-SCOPE-MISMATCH | high (medium for missing approval language) | Prompt says "advise only" / "read-only" while write tools are enabled, OR high-risk tools lack approval/confirmation language in the prompt. |
| SHIP-API-OPERATIONAL-READINESS | medium (high for retry+non-idempotent) | Missing retry policy, timeouts, simple test cases, or per-tool success/failure output schemas; or a trace sample shows a required-approval tool called without approved: true. |
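
To make the first three conditions of SHIP-API-FUNCTION-SCHEMA-STRICTNESS concrete, here is a minimal sketch that inspects a function definition dict with `strict` and `parameters` keys. The function name and message wording are hypothetical, and the unbounded-risky-field condition is deliberately left out.

```python
def strictness_gaps(function_def):
    """List the strictness conditions a function definition fails to meet."""
    params = function_def.get("parameters", {})
    props = set(params.get("properties", {}))
    required = set(params.get("required", []))
    gaps = []
    if function_def.get("strict") is not True:
        gaps.append("strict is not true")
    if params.get("additionalProperties") is not False:
        gaps.append("additionalProperties is not false")
    if props - required:
        # "Complete required" is read here as: every property is listed in required.
        gaps.append("required omits: " + ", ".join(sorted(props - required)))
    return gaps


issue_refund = {
    "name": "issue_refund",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}, "amount": {"type": "number"}},
        "required": ["order_id"],
    },
}
print(strictness_gaps(issue_refund))
# ['strict is not true', 'additionalProperties is not false', 'required omits: amount']
```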

The api family has grown well beyond these four foundational checks — see Categories at a glance above, and run agents-shipgate list-checks or agents-shipgate explain <ID> for the full set.

Manifest consistency

| ID | Default severity | Fires when |
|---|---|---|
| SHIP-MANIFEST-STALE-SUPPRESSION | medium | checks.ignore references an unknown check ID or a tool not loaded. |
| SHIP-MANIFEST-STALE-POLICY | medium | A policies.require_* entry names a tool not loaded. |
| SHIP-MANIFEST-STALE-RISK-OVERRIDE | medium | risk_overrides.tools references a tool not loaded. |
| SHIP-MANIFEST-HIGH-RISK-OWNER-MISSING | high | environment.target is production_like or production and a high-risk tool has no owner declared in risk_overrides.tools.{tool}.owner. |
| SHIP-MANIFEST-UNUSED-SCOPE | medium · high if broad | permissions.scopes contains a scope not required by any loaded tool (and not covered by a wildcard). |

Risk-hint reference

Checks consume risk hints with confidence thresholds. As of v0.2 the keyword classifier is tokenized, so "deploy" matches the standalone token but not the substring inside deployments. Plurals are explicit in the keyword sets where production scopes commonly use them (refund and refunds, cluster and clusters, etc.). See core/risk_hints.py for the source of truth.

| Tag | Triggered by |
|---|---|
| read_only | HTTP GET (high), MCP readOnlyHint: true (high), `*_preview` SDK functions (high), name/description tokens get/list/lookup/search/status/preview/view (medium) |
| write | HTTP POST/PUT/PATCH/DELETE (high), MCP destructiveHint: true (high), name tokens create/update/write/send/refund/cancel/delete/remove/charge/issue (medium) |
| destructive | HTTP DELETE (high), MCP destructiveHint: true (high), tokens cancel/delete/destroy/remove (medium) |
| financial_action | tokens refund(s)/payment(s)/charge(s)/invoice(s)/billing in name, description, or scopes |
| customer_communication | tokens email(s)/message(s)/sms |
| external_write | customer-comms tag + tokens send/external/customer and not effectively read-only |
| sensitive_data_access | tokens ssn/pii/personal/secret(s)/credential(s) |
| code_execution | tokens bash/command/execute/python/shell |
| infrastructure_change | tokens aws/azure/cluster(s)/deploy/droplet(s)/gcp/kubernetes/terraform |

To override these heuristics, use risk_overrides.tools — it always wins.
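
The difference between tokenized and substring matching is easy to see in a small sketch. This is not the code in core/risk_hints.py: the tokenizer regex and function names are assumptions, and only the infrastructure_change keyword set (with its explicit plurals) is taken from the table above.

```python
import re

# infrastructure_change keywords from the table, plurals spelled out explicitly.
INFRA_TOKENS = {"aws", "azure", "cluster", "clusters", "deploy", "droplet",
                "droplets", "gcp", "kubernetes", "terraform"}


def tokenize(text):
    """Split on anything that is not a lowercase letter or digit (assumed rule)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_infrastructure_change(name, description=""):
    """True when any whole token of the name or description is a keyword."""
    return bool(tokenize(name + " " + description) & INFRA_TOKENS)


print(is_infrastructure_change("deploy_service"))    # True: "deploy" is a whole token
print(is_infrastructure_change("list_deployments"))  # False: "deployments" is a different token
print("deploy" in "list_deployments")                # True: what substring matching would report
```

Because matching is per token, a plural such as `clusters` only matches when it appears in the keyword set, which is why the plurals are listed explicitly.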


Plugin checks

External packages can register checks via the agents_shipgate.checks Python entry point. Plugins are disabled by default and must be opted in:

AGENTS_SHIPGATE_ENABLE_PLUGINS=1 agents-shipgate scan
agents-shipgate scan --no-plugins                    # force off even if env is set

The report's loaded_plugins array enumerates every third-party check that ran, with distribution name and version. See Plugin Authoring for the contract.