TraceFlow is an OpenTelemetry-style observability platform built in Go. It provides trace-context propagation, span ingestion, trace aggregation, service-level latency breakdowns, and a browser waterfall UI for debugging request latency across microservices.
The repository includes a complete local demo: a checkout service calls inventory and payment, trace context is propagated across HTTP boundaries, spans are sent to the collector, and the UI reconstructs the full request path.
- W3C-style traceparent propagation
- Go tracing SDK for span creation, injection, extraction, and reporting
- Collector API for span ingestion and trace aggregation
- Parent-child span linking across service boundaries
- Browser waterfall UI for trace latency visualization
- Service dependency path for each trace
- Prometheus-style collector metrics endpoint
- Tail-sampling policy for error and slow-trace retention
- Optional ClickHouse write-through sink
- Demo microservices: checkout → inventory → payment
- Docker Compose local deployment
- GitHub Actions CI workflow
Each request receives a trace ID. The checkout service creates the root span and propagates W3C-style traceparent headers to downstream services. Inventory and payment create child spans from the incoming trace context and report them to the collector. The collector groups spans by trace ID and reconstructs the full request timeline.
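The propagation step described above can be sketched as follows. This is a minimal illustration of the W3C `traceparent` layout (`version-traceid-parentid-flags`), not the code in `pkg/traceflow/`; the names `SpanContext`, `FormatTraceparent`, `ParseTraceparent`, `Inject`, `Extract`, and `StartChild` are hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
	"strings"
)

// SpanContext holds the identifiers carried in a traceparent header.
// Hypothetical type for illustration only.
type SpanContext struct {
	TraceID string // 32 lowercase hex characters
	SpanID  string // 16 lowercase hex characters
}

// newID returns n random bytes encoded as lowercase hex.
func newID(n int) string {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// FormatTraceparent renders "00-<trace-id>-<span-id>-01" (version 00, sampled flag set).
func FormatTraceparent(sc SpanContext) string {
	return fmt.Sprintf("00-%s-%s-01", sc.TraceID, sc.SpanID)
}

// ParseTraceparent checks field count, lengths, and hex encoding.
// It does not reject all-zero IDs, which a strict parser would.
func ParseTraceparent(v string) (SpanContext, bool) {
	parts := strings.Split(v, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return SpanContext{}, false
	}
	if _, err := hex.DecodeString(parts[1] + parts[2]); err != nil {
		return SpanContext{}, false
	}
	return SpanContext{TraceID: parts[1], SpanID: parts[2]}, true
}

// Inject writes the context into outgoing request headers.
func Inject(h http.Header, sc SpanContext) {
	h.Set("traceparent", FormatTraceparent(sc))
}

// Extract reads the context from incoming request headers.
func Extract(h http.Header) (SpanContext, bool) {
	return ParseTraceparent(h.Get("traceparent"))
}

// StartChild keeps the trace ID, mints a new span ID, and returns the parent span ID.
func StartChild(parent SpanContext) (child SpanContext, parentSpanID string) {
	return SpanContext{TraceID: parent.TraceID, SpanID: newID(8)}, parent.SpanID
}

func main() {
	// Checkout creates the root span and injects it into the downstream call.
	root := SpanContext{TraceID: newID(16), SpanID: newID(8)}
	req, err := http.NewRequest(http.MethodGet, "http://127.0.0.1:9002/inventory", nil)
	if err != nil {
		panic(err)
	}
	Inject(req.Header, root)

	// Inventory extracts the context and starts a child span.
	incoming, ok := Extract(req.Header)
	child, parentID := StartChild(incoming)
	fmt.Println(ok, child.TraceID == root.TraceID, parentID == root.SpanID)
}
```

The child span reuses the incoming trace ID and records the caller's span ID as its parent, which is what lets a collector link spans across service boundaries.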
A checkout request produces one distributed trace:
checkout.request
├── inventory.lookup
└── payment.charge
The trace ID is preserved across services, while each span records its own service name, operation name, duration, status, span ID, and parent span ID.
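As a rough sketch of how spans with these fields can be grouped by trace ID and arranged into the tree shown above, consider the following. The `Span` struct mirrors the fields listed in the text, but the Go field names, `GroupByTrace`, and `RenderTree` are assumptions for illustration and are not taken from `internal/model/` or `internal/store/`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Span is a hypothetical model carrying the fields named in the text.
type Span struct {
	TraceID      string
	SpanID       string
	ParentSpanID string // empty for the root span
	Service      string
	Operation    string
	StartUnixMs  int64
	DurationMs   float64
	Status       string
}

// GroupByTrace buckets spans by trace ID, regardless of arrival order.
func GroupByTrace(spans []Span) map[string][]Span {
	groups := make(map[string][]Span)
	for _, s := range spans {
		groups[s.TraceID] = append(groups[s.TraceID], s)
	}
	return groups
}

// RenderTree returns one indented line per span, parents before children and
// siblings ordered by start time. It assumes non-empty, acyclic span IDs.
func RenderTree(spans []Span) []string {
	children := make(map[string][]Span)
	for _, s := range spans {
		children[s.ParentSpanID] = append(children[s.ParentSpanID], s)
	}
	for _, c := range children {
		c := c
		sort.SliceStable(c, func(i, j int) bool { return c[i].StartUnixMs < c[j].StartUnixMs })
	}
	var lines []string
	var walk func(parentID string, depth int)
	walk = func(parentID string, depth int) {
		for _, s := range children[parentID] {
			lines = append(lines, strings.Repeat("  ", depth)+s.Operation)
			walk(s.SpanID, depth+1)
		}
	}
	walk("", 0) // root spans have an empty parent ID
	return lines
}

func main() {
	// Spans arrive out of order, as they might from independent services.
	spans := []Span{
		{TraceID: "t1", SpanID: "c", ParentSpanID: "a", Service: "payment", Operation: "payment.charge", StartUnixMs: 40},
		{TraceID: "t1", SpanID: "a", Service: "checkout", Operation: "checkout.request", StartUnixMs: 0},
		{TraceID: "t1", SpanID: "b", ParentSpanID: "a", Service: "inventory", Operation: "inventory.lookup", StartUnixMs: 5},
	}
	for _, line := range RenderTree(GroupByTrace(spans)["t1"]) {
		fmt.Println(line)
	}
}
```

Keying children by parent span ID means the tree can be rebuilt from spans in any arrival order, which matters because each service reports independently.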
The dashboard shows:
- collector health
- trace count, span count, service count, and p95 latency
- recent traces
- service dependency chain
- per-span latency bars
- end-to-end request duration
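The p95 latency in the dashboard is a percentile over trace durations. How the collector computes it is not stated here; the sketch below uses the nearest-rank method as one common choice, and the function name `Percentile` is hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Percentile returns the nearest-rank percentile p (0 < p <= 100) of the
// given durations. It returns 0 for an empty input.
func Percentile(durations []float64, p float64) float64 {
	if len(durations) == 0 {
		return 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), durations...)
	sort.Float64s(sorted)

	// Nearest rank: the smallest value with at least p% of samples at or below it.
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	durationsMs := []float64{40, 10, 50, 20, 30} // illustrative values
	fmt.Println(Percentile(durationsMs, 50), Percentile(durationsMs, 95), Percentile(durationsMs, 99))
}
```

With only a handful of samples, nearest-rank p95 and p99 both resolve to the slowest trace, so high percentiles only become informative once more traces are collected.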
| Area | Technology |
|---|---|
| Collector | Go, net/http |
| SDK | Go trace-context propagation package |
| Demo Services | Go HTTP microservices |
| Storage | In-memory trace store, optional ClickHouse sink |
| UI | Embedded browser UI with waterfall rendering |
| Metrics | Prometheus-style text endpoint |
| Orchestration | Bash scripts, Docker Compose |
| CI/CD | GitHub Actions |

Run the tests:
go test ./...
Start the collector:
./scripts/start_collector.sh
The collector runs at:
http://127.0.0.1:9411
./scripts/start_demo_services.sh
This starts the local microservice chain:
| Service | Port | Endpoint |
|---|---|---|
| Checkout | 9001 | /checkout |
| Inventory | 9002 | /inventory |
| Payment | 9003 | /pay |

Generate demo traces:
./scripts/demo.sh
Open the UI:
http://127.0.0.1:9411
Shut everything down:
./scripts/stop.sh
WSL note: if the UI does not open in a Windows browser, start the collector on 0.0.0.0:9411 and open the WSL IP from hostname -I.
Running the demo generates five checkout requests:
checkout: {"ok":true,"trace_id":"676cfc0fc7a4127f206a40777d6588ea"}
checkout: {"ok":true,"trace_id":"70c4f52c953963736677797ce5db8784"}
checkout: {"ok":true,"trace_id":"246010c690b20538bef977e8c9bb1093"}
checkout: {"ok":true,"trace_id":"411792375af91a1c52ad386d9fcb441b"}
checkout: {"ok":true,"trace_id":"e24dc792d97371158d3eedab370f3622"}
Collector stats from a local run:
{
"trace_count": 5,
"span_count": 15,
"service_count": 3,
"services": ["checkout", "inventory", "payment"],
"p50_ms": 98.09,
"p95_ms": 99.80,
"p99_ms": 99.80
}
Ingest spans through the collector API:
curl -X POST http://127.0.0.1:9411/api/spans \
-H "Content-Type: application/json" \
-d '{
"spans": [
{
"trace_id": "trace-1",
"span_id": "span-1",
"service": "checkout",
"operation": "checkout.request",
"start_unix_ms": 1710000000000,
"duration_ms": 42,
"status": "ok"
}
]
}'
List traces, fetch a single trace, and read stats and metrics:
curl http://127.0.0.1:9411/api/traces
curl http://127.0.0.1:9411/api/traces/<trace_id>
curl http://127.0.0.1:9411/api/stats
curl http://127.0.0.1:9411/metrics
TraceFlow includes a tail-sampling policy that retains traces based on:
- error spans
- slow trace duration
- deterministic sampling rate
The local demo keeps traces by default so the UI remains easy to inspect, while the sampler package models production-style retention logic.
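The three retention criteria can be combined into a single decision, sketched below. This is an illustration under assumptions, not the code in `internal/sampler/`: the `Policy` and `Trace` types, the `"error"` status value, the threshold, and the FNV-based hashing are all hypothetical choices.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Policy is a hypothetical tail-sampling configuration.
type Policy struct {
	SlowThresholdMs float64 // keep traces at least this slow
	SampleRate      float64 // fraction of remaining traces to keep, 0..1
}

// Trace is a hypothetical summary of a completed trace.
type Trace struct {
	ID           string
	DurationMs   float64
	SpanStatuses []string // one status per span; "error" is an assumed value
}

// Keep decides retention after the whole trace is known and reports why.
func (p Policy) Keep(t Trace) (bool, string) {
	// 1. Any error span keeps the trace.
	for _, s := range t.SpanStatuses {
		if s == "error" {
			return true, "error"
		}
	}
	// 2. Slow traces are kept.
	if t.DurationMs >= p.SlowThresholdMs {
		return true, "slow"
	}
	// 3. Deterministic sampling: hash the trace ID into [0, 1) so the same
	// trace always gets the same decision, on any collector instance.
	h := fnv.New32a()
	h.Write([]byte(t.ID))
	if float64(h.Sum32())/(1<<32) < p.SampleRate {
		return true, "sampled"
	}
	return false, "dropped"
}

func main() {
	p := Policy{SlowThresholdMs: 500, SampleRate: 0.1}
	traces := []Trace{
		{ID: "a1", DurationMs: 42, SpanStatuses: []string{"ok", "error"}},
		{ID: "b2", DurationMs: 900, SpanStatuses: []string{"ok"}},
		{ID: "c3", DurationMs: 42, SpanStatuses: []string{"ok"}},
	}
	for _, t := range traces {
		keep, reason := p.Keep(t)
		fmt.Println(t.ID, keep, reason)
	}
}
```

Hashing the trace ID rather than drawing a random number keeps the decision reproducible, which is the sense of "deterministic" assumed here.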
TraceFlow can mirror ingested spans into ClickHouse using the TRACEFLOW_CLICKHOUSE_URL environment variable.
TRACEFLOW_CLICKHOUSE_URL=http://127.0.0.1:8123 ./run/traceflow-collector -addr :9411
The local UI continues to use the in-memory trace store for fast demo queries, while ClickHouse can be used as a durable analytics sink.
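The write-through arrangement can be pictured roughly as follows: every batch goes to the in-memory store first and is then mirrored to a sink only when one is configured. This sketch is not the code in `internal/store/`; `Sink`, `MemoryStore`, `WriteThrough`, and `recordingSink` are hypothetical, and `recordingSink` merely counts spans where a real sink would send them to ClickHouse.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// Span is trimmed to the fields this sketch needs.
type Span struct {
	TraceID string
	SpanID  string
}

// Sink is a hypothetical interface for a durable mirror such as ClickHouse.
type Sink interface {
	Write(spans []Span) error
}

// MemoryStore keeps spans grouped by trace ID for fast local queries.
type MemoryStore struct {
	mu      sync.Mutex
	byTrace map[string][]Span
}

func NewMemoryStore() *MemoryStore {
	return &MemoryStore{byTrace: make(map[string][]Span)}
}

func (m *MemoryStore) Add(spans []Span) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for _, s := range spans {
		m.byTrace[s.TraceID] = append(m.byTrace[s.TraceID], s)
	}
}

func (m *MemoryStore) Count(traceID string) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.byTrace[traceID])
}

// WriteThrough stores spans in memory and mirrors them to an optional sink.
type WriteThrough struct {
	mem  *MemoryStore
	sink Sink // nil when no sink is configured
}

func (w *WriteThrough) Ingest(spans []Span) error {
	w.mem.Add(spans) // the UI reads from memory, so this always happens
	if w.sink == nil {
		return nil
	}
	return w.sink.Write(spans)
}

// recordingSink stands in for a real ClickHouse writer; it only counts spans.
type recordingSink struct {
	url     string
	written int
}

func (r *recordingSink) Write(spans []Span) error {
	r.written += len(spans)
	return nil
}

func main() {
	var sink Sink // stays nil unless the environment variable is set
	if url := os.Getenv("TRACEFLOW_CLICKHOUSE_URL"); url != "" {
		sink = &recordingSink{url: url}
	}
	store := &WriteThrough{mem: NewMemoryStore(), sink: sink}
	err := store.Ingest([]Span{{TraceID: "t1", SpanID: "a"}, {TraceID: "t1", SpanID: "b"}})
	fmt.Println(err, store.mem.Count("t1"), sink != nil)
}
```

Writing to memory before the sink keeps the UI path independent of sink availability; whether a sink error should fail the ingest request is a design decision this sketch leaves to the caller.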
Start the stack:
docker compose up --build
Generate traffic:
curl http://127.0.0.1:9001/checkout
Open the collector UI:
http://127.0.0.1:9411
Run a concurrent traffic benchmark:
./scripts/start_collector.sh
./scripts/start_demo_services.sh
./scripts/bench.sh 100 20
./scripts/stop.sh
The benchmark sends concurrent checkout requests. Each request triggers spans across checkout, inventory, and payment.
TraceFlow validates core behavior through:
- traceparent propagation tests
- in-memory trace aggregation tests
- collector ingestion tests
- tail-sampling policy tests
- demo-service smoke workflow in CI
cmd/collector/ Collector entry point
cmd/demo-service/ Checkout, inventory, and payment demo service binary
internal/collector/ HTTP API, metrics, and embedded UI serving
internal/store/ In-memory trace store and optional ClickHouse sink
internal/sampler/ Tail-sampling policy
internal/model/ Span, trace, and API models
pkg/traceflow/ Go tracing SDK and traceparent propagation helpers
scripts/ Local startup, demo, benchmark, and shutdown scripts
web/ Readable copy of the browser UI
.github/workflows/ CI workflow
TraceFlow focuses on core observability mechanics and local microservice tracing. It currently does not include:
- full OTLP/gRPC compatibility
- distributed collector sharding
- production retention management
- authentication and tenant isolation
- large-scale ClickHouse query optimization
- Grafana dashboard integration
Planned improvements:
- Add OTLP/gRPC ingestion compatibility
- Add configurable tail-sampling rules
- Add ClickHouse-backed trace query mode
- Add service dependency graph aggregation
- Add p50/p95/p99 breakdown by service and operation
- Add Grafana dashboard templates
- Add Kubernetes deployment manifests
MIT License
