[BUG] --format sarif is a lossy subset of --format json (drops category, confidence, remediation, code_snippet, intent, tags, end_line) #229

Description

@ThePlenkov

What I'm seeing

Single-skill --format sarif and single-skill --format json are not equivalent: the SARIF output is a strict subset of the JSON output. Same scan, same target, same analyzer state, different level of detail.

Per-result fields that --format sarif drops but --format json carries:

| Field | JSON | SARIF |
|---|---|---|
| category (human-readable rule category, e.g. "MCP Least Privilege") | ✅ | ❌ |
| confidence (0-1) | ✅ | ❌ |
| remediation (action-oriented fix advice) | ✅ | ❌ (folded into message.text) |
| code_snippet (actual code at the location) | ✅ | ❌ |
| intent (what the skill is trying to do) | ✅ | ❌ |
| tags (OWASP / MITRE / CWE / custom) | ✅ | ❌ |
| end_line | ✅ | ❌ |
| pattern (detection pattern) | ✅ | ❌ |
| finding | ✅ | ❌ |

Even the basic message.text differs: in SARIF it's a compressed one-liner ("Skill has no declared permissions but code capabilities were detected: env."), while in JSON the full explanation is a separate field (explanation) with remediation as its own field.

Reproduction (skillspector 2.3.7)

skillspector scan ./.agents/skills/act/ --no-llm --format json > /tmp/act.json
skillspector scan ./.agents/skills/act/ --no-llm --format sarif > /tmp/act.sarif

jq '.issues[0]' /tmp/act.json
# → full schema: id, category, severity, confidence, location{file,start_line,end_line},
#   explanation, remediation, code_snippet, intent, tags[]

jq '.runs[0].results[0]' /tmp/act.sarif
# → { ruleId, level, message: {text: "Skill has no declared permissions..." },
#     locations: [{physicalLocation: {artifactLocation: {uri}, region: {startLine}}}] }
#   No category, no confidence, no remediation, no code_snippet, no intent, no tags.

A second issue with the SARIF output: message.text is a truncated version of the explanation, and there is no properties bag or other extension point carrying the full text, the remediation, or the snippet. Downstream tools can't reconstruct what's actually wrong.
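To make the gap easy to check mechanically, here is a minimal sketch that reports which fields of a JSON issue have no home in the matching SARIF result. It assumes the two layouts shown in the reproduction (`.issues[]` and `.runs[0].results[]`) and that both files list findings in the same order; `dropped_fields`, `compare_files` and the `MAPPED` table are hypothetical names for illustration, not part of skillspector.

```python
import json

# Issue fields that already have a standard SARIF slot in the output shown above.
MAPPED = {
    "id": "ruleId",
    "severity": "level",
    "explanation": "message",
    "location": "locations",
}


def dropped_fields(issue, result):
    """Return the issue fields that the SARIF result does not carry."""
    props = result.get("properties", {})
    dropped = []
    for key in issue:
        sarif_key = MAPPED.get(key)
        if sarif_key is not None and sarif_key in result:
            continue  # carried by a standard SARIF property
        if key in props:
            continue  # carried in the result's property bag
        dropped.append(key)

    # end_line is nested under location, so check it against region.endLine.
    first_location = (result.get("locations") or [{}])[0]
    region = first_location.get("physicalLocation", {}).get("region", {})
    if "end_line" in issue.get("location", {}) and "endLine" not in region:
        dropped.append("location.end_line")
    return sorted(dropped)


def compare_files(json_path, sarif_path):
    """Pair issues and results by position (an assumption) and diff each pair."""
    with open(json_path) as handle:
        issues = json.load(handle)["issues"]
    with open(sarif_path) as handle:
        results = json.load(handle)["runs"][0]["results"]
    return [dropped_fields(issue, result) for issue, result in zip(issues, results)]
```

Calling `compare_files("/tmp/act.json", "/tmp/act.sarif")` on the files from the reproduction should give one list of dropped field names per finding.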

Why it matters

SARIF has a real extension mechanism — properties on both the result and the rule, plus fullDescription and help on the rule — that lets tools carry arbitrary metadata in a forward-compatible way. CodeQL, ESLint, Snyk, Semgrep all use this. Skillspector's native SARIF doesn't, so downstream SARIF consumers (annotation converters, dashboards, the GitHub Security tab when available) lose most of the signal.

In our case, we wrote a JSON→SARIF wrapper that preserves the extra fields under properties so the to-annotations.py skill can surface them as inline GitHub PR annotations. But ideally the upstream SARIF output should be a faithful translation of the JSON output, not a lossy downgrade.
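For illustration, this is roughly what a consumer can do once the extra fields travel under properties. It is a sketch only, not the actual to-annotations.py: the `properties.category` and `properties.remediation` keys are assumed names, and the output uses GitHub Actions workflow commands (`::error`, `::warning`, `::notice`).

```python
# SARIF levels mapped to GitHub Actions workflow command names.
LEVEL_TO_COMMAND = {"error": "error", "warning": "warning", "note": "notice"}


def escape_data(text):
    """Escape the message part of a workflow command."""
    return text.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def escape_property(text):
    """Escape a parameter value (file, title) of a workflow command."""
    return escape_data(text).replace(":", "%3A").replace(",", "%2C")


def to_annotation(result):
    """Turn one SARIF result into a single annotation line."""
    physical = result["locations"][0]["physicalLocation"]
    region = physical.get("region", {})
    props = result.get("properties", {})  # assumed home of the extra fields

    message = result["message"]["text"]
    if props.get("remediation"):
        message += "\nRemediation: " + props["remediation"]

    command = LEVEL_TO_COMMAND.get(result.get("level", "warning"), "warning")
    params = ["file=" + escape_property(physical["artifactLocation"]["uri"])]
    if "startLine" in region:
        params.append(f"line={region['startLine']}")
    if "endLine" in region:
        params.append(f"endLine={region['endLine']}")
    if props.get("category"):
        params.append("title=" + escape_property(props["category"]))
    return f"::{command} {','.join(params)}::{escape_data(message)}"
```

With the current native SARIF, the same function can emit only the file, the start line and the one-line message, because there is nothing else in the result to read.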

What I think the fix is

Treat --format sarif as a full translation of the per-issue JSON schema, not a re-summary. Concretely:

  • Map each JSON issue to a SARIF result with ruleId and level (already done).
  • Put category in tool.driver.rules[].shortDescription.text (or name).
  • Put the full explanation in message.text (not a truncated variant).
  • Put remediation in tool.driver.rules[].help.text (and help.markdown when applicable).
  • Put the extra fields (confidence, code_snippet, intent, tags, end_line, pattern, finding) under the result's properties so any SARIF-aware tool can pick them up.
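The mapping above could look something like the following sketch. The JSON field names are taken from the jq output in the reproduction; `issue_to_sarif`, `EXTRA_FIELDS` and the severity-to-level table are hypothetical, since skillspector's internal names and severity values are not shown here.

```python
# Hypothetical severity names; adjust to whatever the JSON output really emits.
SEVERITY_TO_LEVEL = {
    "critical": "error",
    "high": "error",
    "medium": "warning",
    "low": "note",
    "info": "note",
}

# Fields with no dedicated SARIF slot, carried in the result's property bag.
EXTRA_FIELDS = ("confidence", "code_snippet", "intent", "tags", "pattern", "finding")


def issue_to_sarif(issue):
    """Translate one JSON issue into a (result, rule) pair without dropping fields."""
    location = issue.get("location", {})
    properties = {name: issue[name] for name in EXTRA_FIELDS if name in issue}

    region = {}
    if "start_line" in location:
        region["startLine"] = location["start_line"]
    if "end_line" in location:
        # Kept under properties as proposed, and also in SARIF's native endLine.
        region["endLine"] = location["end_line"]
        properties["end_line"] = location["end_line"]

    severity = str(issue.get("severity", "")).lower()
    result = {
        "ruleId": issue["id"],
        "level": SEVERITY_TO_LEVEL.get(severity, "warning"),
        "message": {"text": issue.get("explanation", "")},  # full text, not a summary
        "locations": [{
            "physicalLocation": {
                "artifactLocation": {"uri": location.get("file", "")},
                "region": region,
            }
        }],
        "properties": properties,
    }
    rule = {
        "id": issue["id"],
        "shortDescription": {"text": issue.get("category", "")},
        "help": {"text": issue.get("remediation", "")},
    }
    return result, rule
```

The returned rule objects would be de-duplicated by id and collected into tool.driver.rules[], which also keeps the output in line with the fix for #148.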

Related

  • Companion: [BUG] --recursive --format json emits per-skill summary, drops the full per-issue schema.
  • #148 (closed): SARIF output non-compliant — missing required rules[] array in tool.driver.
