Agent-CAP

Benchmarking of Cost, Accuracy, and Performance for Agentic AI Systems

Agent-CAP is a step-level benchmarking framework that decomposes agentic workloads into atomic operations and measures each step's latency. It provides Nsight Systems-like timeline visualization for analyzing agent performance.

Features

Step-level Profiling: Trace individual operations in agentic workflows
Multiple Timer Backends: Support for time.perf_counter() and CUDA events
Timeline Visualization: Interactive HTML timelines similar to Nsight Systems
Zero Dependencies: Core functionality works without any external packages
Easy Integration: Simple decorators and context managers for instrumentation

Installation

# Basic installation
pip install -e .

# With visualization support
pip install -e ".[viz]"

# With CUDA timing support
pip install -e ".[cuda]"

# Full installation
pip install -e ".[all]"

Quick Start

from agent_cap import Tracer, StepType, TimelineVisualizer

# Create a tracer
tracer = Tracer("my-agent-workflow")

# Use context managers to trace steps
with tracer:
    with tracer.step("planning", StepType.PLANNING):
        # Your planning code here
        plan = agent.plan(task)

    with tracer.step("retrieval", StepType.RETRIEVAL):
        # Your retrieval code here
        docs = retriever.search(query)

    with tracer.step("reasoning", StepType.REASONING):
        # Your LLM inference code here
        response = llm.generate(prompt)

# Get the trace and visualize
trace = tracer.get_trace()

# Save to JSON
trace.save("trace.json")

# Create visualization
viz = TimelineVisualizer(trace)
viz.save_html("timeline.html")  # Interactive HTML
print(viz.to_ascii())           # Terminal output

Step Types

Agent-CAP categorizes workflow steps based on their computational characteristics:

Step Type	Description	Bottleneck
`PLANNING`	Task decomposition, high-level decisions	Compute-bound
`REASONING`	Chain-of-thought, inference	Memory-bandwidth bound
`RETRIEVAL`	Document fetch, RAG	I/O bound
`TOOL_CALLING`	External API calls	Network/CPU bound
`CODE_EXECUTION`	Running generated code	CPU/sandbox bound
`PREFILL`	LLM prefill phase	Compute-bound
`DECODE`	LLM decode phase	Memory-bandwidth bound
`EMBEDDING`	Embedding computation	Compute-bound

Decorator API

from agent_cap import tracer, StepType

@tracer("fetch_data", StepType.RETRIEVAL)
def fetch_data(query):
    return db.query(query)

@tracer("generate", StepType.DECODE)
def generate(prompt):
    return llm(prompt)

Visualization

Interactive HTML Timeline

from agent_cap import TimelineVisualizer

viz = TimelineVisualizer(trace)
viz.save_html("timeline.html")
viz.show()  # Opens in browser

ASCII Timeline (Terminal)

print(viz.to_ascii())

Output:

Timeline: my-agent-workflow
Total Duration: 1234.56 ms
============================================================
                        |    0  250  500  750 1000
                        +--------------------------------
planning               |████                    (150.2ms)
retrieval              |    ██████              (280.5ms)
reasoning              |          ████████████  (450.3ms)
============================================================

Summary Table

print(viz.summary_table())

Examples

Run the example scripts:

# Simple agent workflow
python examples/simple_agent.py

# RAG agent benchmark
python examples/rag_agent.py

CUDA Timing

For precise GPU timing, use CUDA events:

tracer = Tracer("gpu-workflow", use_cuda=True)

with tracer.step("gpu_compute", StepType.PREFILL):
    # GPU operations are timed with CUDA events
    model(input_tensor)

Acknowledgement

This project is supported by the Advanced Research and Invention Agency (ARIA)’s grant “Scaling Compute: AI at 1/1000th the cost. Technical Area 4 Benchmarking”.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
agent_cap		agent_cap
examples		examples
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Agent-CAP

Features

Installation

Quick Start

Step Types

Decorator API

Visualization

Interactive HTML Timeline

ASCII Timeline (Terminal)

Summary Table

Examples

CUDA Timing

Acknowledgement

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

Auto-CAP/AgentCAP

Folders and files

Latest commit

History

Repository files navigation

Agent-CAP

Features

Installation

Quick Start

Step Types

Decorator API

Visualization

Interactive HTML Timeline

ASCII Timeline (Terminal)

Summary Table

Examples

CUDA Timing

Acknowledgement

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages