CCL-Bench is a trace-based benchmark for LLM infrastructure. Each benchmark row is backed by workload metadata and profiler artifacts, so results can be recomputed, audited, and extended as new models, frameworks, hardware, and collective communication libraries are added.
The project is organized around three layers:
- Evidence: workload cards, run metadata, and external profiler traces.
- Analysis: metric tools that consume trace directories and return leaderboard values.
- Presentation: a static website generated from configured trace and metric pairs.
Raw traces are not included. The repository keeps lightweight metadata, scripts, metric code, and generated website data. However, we provide a sample trace for testing purposes under llama3-torchtitan-nccl-4gpu-fsdp_2-tp_2-b_4-s_512/.
| Path | Purpose |
|---|---|
| `workload_card_template.yaml` | Workload card template for benchmark rows. |
| `trace_collection/` | Lightweight workload cards and run scripts. |
| `trace_gen/` | Guidance and helpers for collecting profiler traces. |
| `tools/` | Metric toolkit. Each metric is implemented as an importable tool. |
| `website/` | Static leaderboard and generated benchmark data. |
| `workload_suite/` | Standard workload definitions used to compare software and hardware. |
| `scripts/` | Reproducibility and collection scripts for specific systems or experiments. |
| `agent/` | Experimental/private config tuning agents. |
| `simulation/` | Experimental/private trace-based simulation utilities. |
No GPUs are needed to use the toolkit or to test the simulation pipeline.
Create a local environment:
```
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

Run one metric on a trace directory:

```
python tools/main.py --trace /path/to/trace_dir --metric avg_step_time

# Example
python tools/main.py --trace llama3-torchtitan-nccl-4gpu-fsdp_2-tp_2-b_4-s_512/ --metric avg_step_time
```

Now test the simulation pipeline.
Build the AstraSim Docker image (required for the simulation pipeline):
```
docker build -t astra-sim:latest .
```

The Docker build takes 20–40 minutes and produces a ~14 GB image.
Run a what-if simulation on the sample trace (requires the AstraSim Docker image):
```
# Baseline
python simulation/pipeline.py --mode comm-only \
    --trace-dir llama3-torchtitan-nccl-4gpu-fsdp_2-tp_2-b_4-s_512

# What-if: 2× intra-node bandwidth
python simulation/pipeline.py --mode comm-only \
    --trace-dir llama3-torchtitan-nccl-4gpu-fsdp_2-tp_2-b_4-s_512 \
    --intra-bandwidth 600
```

You can view the results we computed over the collected traces by running:
```
python -m http.server 8081
```
Then open http://localhost:8081.
If you want to render new traces, add or update entries in website/benchmark_config.json. Regenerate the static website data after adding or changing configured traces:
```
python website/generate_data.py
cd website
python -m http.server 8081
```

To contribute a new benchmark row:

- Select a standard workload from workload_suite/ or trace_collection/workload.md.
- Collect profiler artifacts outside the repository. Keep the final trace directory name stable.
- Fill in workload_card_template.yaml and store the card with the trace artifacts.
- Add the lightweight workload card under trace_collection/<workload_name>/ when it is useful for review and reproducibility.
- Add the trace and metric mapping to website/benchmark_config.json (see the configuration sketch after this list).
- Regenerate website/benchmark_data.json and website/data.js.
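For illustration only, a configured entry in website/benchmark_config.json might look like the sketch below. The "trace" key and the /data/ccl-bench_trace_collection prefix are the only parts confirmed elsewhere in this README; "name" and "metrics" are assumed field names, so copy the structure of an existing entry rather than this sketch.

```json
{
  "name": "llama3-torchtitan-nccl-4gpu-fsdp_2-tp_2-b_4-s_512",
  "trace": "/data/ccl-bench_trace_collection/llama3-torchtitan-nccl-4gpu-fsdp_2-tp_2-b_4-s_512",
  "metrics": ["avg_step_time"]
}
```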
Each row should make clear:
- model, phase, precision, dataset, batch size, and sequence lengths;
- hardware type, GPU/TPU count, and per-node count;
- framework and compiler/runtime versions;
- tensor/data/pipeline/expert parallelism;
- communication library and relevant environment variables;
- which trace artifacts were used for each metric.
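For illustration, a filled-in workload card covering these fields might look roughly like the sketch below. The key names are placeholders inferred from the list above, and the concrete values are taken from the sample trace name; workload_card_template.yaml remains the canonical schema.

```yaml
# Illustrative sketch only: key names and values are placeholders,
# not the canonical schema from workload_card_template.yaml.
model: llama3
phase: train
precision: <precision>            # e.g. bf16
dataset: <dataset>
batch_size: 4
sequence_length: 512
hardware:
  accelerator: <GPU/TPU type>
  device_count: 4
  devices_per_node: 4
framework: torchtitan             # record compiler/runtime versions too
parallelism:
  data: 2                         # FSDP degree
  tensor: 2
  pipeline: 1
  expert: 1
communication:
  library: nccl
  env: {}                         # relevant environment variables
artifacts:
  avg_step_time: <profiler trace directory used for this metric>
```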
Metrics are implemented in tools/ and invoked through tools/main.py. The public website uses the subset configured in website/benchmark_config.json; additional tools can remain in the repository for experiments as long as they are documented and do not require checked-in raw traces.
See tools/README.md for the supported metric interface and current dashboard metrics.
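As a rough sketch of what such a tool could look like (the real contract, function names, and trace file layout are defined in tools/README.md and tools/, not here), a metric generally maps a trace directory to a single leaderboard value:

```python
# Hypothetical metric sketch -- NOT the actual CCL-Bench tool interface.
# Assumes per-step timings were exported to a step_times.json file inside
# the trace directory; see tools/README.md for the supported interface.
from pathlib import Path
import json

def avg_step_time(trace_dir: str) -> float:
    """Average step time, assuming step_times.json holds a list of step durations."""
    steps_file = Path(trace_dir) / "step_times.json"  # assumed artifact name
    step_times = json.loads(steps_file.read_text())
    return sum(step_times) / len(step_times)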
Commit:
- source code and scripts required to reproduce a row;
- workload cards and small metadata files;
- generated website JSON/JS when updating the public leaderboard;
- documentation explaining non-obvious trace or environment requirements.
Do not commit:
- virtual environments or package caches;
- raw profiler dumps unless they are intentionally tiny test fixtures;
- local API keys, credentials, or machine-specific scratch paths;
- large intermediate logs that are not part of the artifact.
The canonical shared trace directory is /data/ccl-bench_trace_collection.
This path appears in three places and must be updated consistently if you move traces
to a different mount point or machine:
| Location | How to change |
|---|---|
| `website/benchmark_config.json` — every `"trace"` path | Update each path prefix to match your local mount point. The paths must resolve on whichever machine runs `python website/generate_data.py`. |
| `agent/ccl_bench_agent/tuning_config.yaml` — `publish_dir` | Set `publish_dir` to the desired destination. CCL-Search copies per-iteration traces there. Leave empty to skip publishing. |
If you are running on a different cluster, set publish_dir in tuning_config.yaml and
update the "trace": paths in benchmark_config.json before regenerating the website.
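If you need to rewrite many path prefixes at once, a small helper along the lines of the sketch below can do it. This is a hypothetical script, not part of the repository; it only assumes that benchmark_config.json is JSON whose trace locations are string values stored under "trace" keys, so inspect the file before running anything like it.

```python
# Hypothetical helper: rewrite every "trace" path prefix in benchmark_config.json.
import json

OLD_PREFIX = "/data/ccl-bench_trace_collection"
NEW_PREFIX = "/mnt/my-cluster/ccl-bench_traces"  # adjust to your mount point

def rewrite(node):
    # Recursively visit dicts/lists and rewrite string values stored under "trace".
    if isinstance(node, dict):
        return {
            key: (value.replace(OLD_PREFIX, NEW_PREFIX, 1)
                  if key == "trace" and isinstance(value, str)
                  else rewrite(value))
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [rewrite(item) for item in node]
    return node

with open("website/benchmark_config.json") as f:
    config = json.load(f)

with open("website/benchmark_config.json", "w") as f:
    json.dump(rewrite(config), f, indent=2)
```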
CCL-Search (agent/) and the simulation pipeline (simulation/) are first-class
contributions: CCL-Search automates configuration tuning and records every trial as a
benchmark entry; the simulation pipeline converts traces to Chakra execution graphs for
Astra-Sim what-if analysis. Both require the shared trace directory to be accessible.
