feat: expose LLM token usage in JSON report output #242

Description

@amber-beasley-liatrio

Summary

When SkillSpector runs its LLM-backed analyzers (semantic analysis, meta-analyzer), the LangChain AIMessage response carries token usage data via response.usage_metadata (input_tokens, output_tokens, total_tokens). That data is currently discarded after findings are extracted. Nothing in the JSON report tells callers how many tokens were consumed.

Motivation

Downstream pipelines that orchestrate multiple scanners need token counts to compute LLM call costs, e.g. (input_tokens × rate_in + output_tokens × rate_out) / 1_000_000, where each rate is a price per million tokens. Without this field in the report, cost attribution requires fragile workarounds such as patching LangChain internals or proxying HTTP traffic. Both approaches are tightly coupled to implementation details and break across versions.
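To make the arithmetic concrete, here is a minimal sketch of the cost formula, assuming rates are quoted per one million tokens. The function name `llm_cost` and the rate values are hypothetical and not part of SkillSpector.

```python
def llm_cost(input_tokens: int, output_tokens: int,
             rate_in: float, rate_out: float) -> float:
    """Return the cost of LLM usage, with both rates given per 1,000,000 tokens."""
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000


# Hypothetical rates: 3.00 per million input tokens, 15.00 per million output tokens.
cost = llm_cost(input_tokens=1234, output_tokens=567, rate_in=3.0, rate_out=15.0)
print(f"{cost:.6f}")
```

A caller would only need the two counts from the report plus its own rate table, which is why the report exposes raw counts rather than a computed cost.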

Proposed change

src/skillspector/state.py

Extend LLMCallRecord to carry token counts, and update llm_call_record() to accept them:

from typing import TypedDict


class LLMCallRecord(TypedDict):
    node: str
    ok: bool
    error: str | None
    input_tokens: int   # 0 when ok=False
    output_tokens: int  # 0 when ok=False

src/skillspector/llm_analyzer_base.py

run_batches() and arun_batches() invoke the LLM and discard the response object after calling parse_response(). LangChain's AIMessage exposes usage_metadata; those counts should be extracted and accumulated:

# after: response = self._structured_llm.invoke(prompt)  (or ainvoke)
usage = getattr(response, "usage_metadata", None) or {}
input_tokens += usage.get("input_tokens", 0)
output_tokens += usage.get("output_tokens", 0)
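The extraction and accumulation could be factored into a small helper, sketched below with stand-in response objects rather than real LangChain messages. The name `extract_usage` is hypothetical. One thing worth verifying during implementation: if `_structured_llm` returns a parsed object instead of an AIMessage (as LangChain's with_structured_output does unless include_raw=True is set), the attribute will be absent and the counts will silently stay at 0.

```python
from types import SimpleNamespace
from typing import Any


def extract_usage(response: Any) -> tuple[int, int]:
    """Return (input_tokens, output_tokens), or (0, 0) when usage data is absent."""
    # usage_metadata may be missing entirely or set to None; treat both as empty.
    usage = getattr(response, "usage_metadata", None) or {}
    return usage.get("input_tokens", 0), usage.get("output_tokens", 0)


# Stand-ins for LLM responses; a real run would pass the objects returned by invoke().
responses = [
    SimpleNamespace(usage_metadata={"input_tokens": 100, "output_tokens": 20, "total_tokens": 120}),
    SimpleNamespace(usage_metadata=None),  # provider reported no usage
    SimpleNamespace(),                     # object without the attribute at all
]

# Running totals across all batches.
input_tokens = 0
output_tokens = 0
for response in responses:
    batch_in, batch_out = extract_usage(response)
    input_tokens += batch_in
    output_tokens += batch_out
```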

src/skillspector/nodes/report.py

Include aggregated token counts in the JSON output path, summed from state["llm_call_log"]:

{
  "llm_usage": {
    "input_tokens": 1234,
    "output_tokens": 567
  }
}

Acceptance criteria

  • JSON report includes llm_usage: {input_tokens: N, output_tokens: N} when LLM analyzers ran
  • Field is omitted (or zeroed) for --no-llm / static-only scans
  • Existing report fields are unchanged (additive only)
  • Unit tests cover the new field

Notes

Per the contribution guide I plan to follow the fork → branch → PR process and reference this issue. Happy to take it on — just flagging here first as requested.
