
souvikDevloper/TraceFlow


TraceFlow


TraceFlow is an OpenTelemetry-style observability platform built in Go. It provides trace-context propagation, span ingestion, trace aggregation, service-level latency breakdowns, and a browser waterfall UI for debugging request latency across microservices.

The repository includes a complete local demo: a checkout service calls inventory and payment, trace context is propagated across HTTP boundaries, spans are sent to the collector, and the UI reconstructs the full request path.


Preview

TraceFlow UI


Highlights

  • W3C-style traceparent propagation
  • Go tracing SDK for span creation, injection, extraction, and reporting
  • Collector API for span ingestion and trace aggregation
  • Parent-child span linking across service boundaries
  • Browser waterfall UI for trace latency visualization
  • Service dependency path for each trace
  • Prometheus-style collector metrics endpoint
  • Tail-sampling policy for error and slow-trace retention
  • Optional ClickHouse write-through sink
  • Demo microservices: checkout → inventory → payment
  • Docker Compose local deployment
  • GitHub Actions CI workflow

Architecture

TraceFlow Architecture

Each request receives a trace ID. The checkout service creates the root span and propagates W3C-style traceparent headers to downstream services. Inventory and payment create child spans from the incoming trace context and report them to the collector. The collector groups spans by trace ID and reconstructs the full request timeline.


Trace Lifecycle

Trace Lifecycle

A checkout request produces one distributed trace:

checkout.request
├── inventory.lookup
└── payment.charge

The trace ID is preserved across services, while each span records its own service name, operation name, duration, status, span ID, and parent span ID.
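With those fields, rebuilding the tree comes down to two steps: group the spans by trace ID, then index each span's children by parent span ID. The sketch below does this in memory and is not the `internal/store` implementation. The JSON name `parent_span_id` is an assumption, since the ingest example in the API reference shows no parent field. The helpers `groupByTrace` and `renderTree` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Span holds the per-span fields listed above. Tags follow the ingest API
// example; parent_span_id is an assumed name for the parent link.
type Span struct {
	TraceID      string  `json:"trace_id"`
	SpanID       string  `json:"span_id"`
	ParentSpanID string  `json:"parent_span_id,omitempty"`
	Service      string  `json:"service"`
	Operation    string  `json:"operation"`
	StartUnixMs  int64   `json:"start_unix_ms"`
	DurationMs   float64 `json:"duration_ms"`
	Status       string  `json:"status"`
}

// groupByTrace buckets spans by trace ID, as an in-memory store might.
func groupByTrace(spans []Span) map[string][]Span {
	traces := make(map[string][]Span)
	for _, s := range spans {
		traces[s.TraceID] = append(traces[s.TraceID], s)
	}
	return traces
}

// renderTree prints one trace as an indented tree. Children are found via
// ParentSpanID and siblings are ordered by start time; spans whose parent is
// missing from the slice are not printed.
func renderTree(spans []Span) string {
	children := make(map[string][]Span) // parent span ID -> direct children
	for _, s := range spans {
		children[s.ParentSpanID] = append(children[s.ParentSpanID], s)
	}
	for _, kids := range children {
		sort.Slice(kids, func(i, j int) bool { return kids[i].StartUnixMs < kids[j].StartUnixMs })
	}
	var b strings.Builder
	var walk func(parent string, depth int)
	walk = func(parent string, depth int) {
		for _, s := range children[parent] {
			fmt.Fprintf(&b, "%s%s (%s, %.0f ms)\n", strings.Repeat("  ", depth), s.Operation, s.Service, s.DurationMs)
			walk(s.SpanID, depth+1)
		}
	}
	walk("", 0) // root spans have an empty ParentSpanID
	return b.String()
}

func main() {
	// Illustrative span data, not captured from a real run.
	spans := []Span{
		{TraceID: "t1", SpanID: "a", Service: "checkout", Operation: "checkout.request", StartUnixMs: 0, DurationMs: 98},
		{TraceID: "t1", SpanID: "c", ParentSpanID: "a", Service: "payment", Operation: "payment.charge", StartUnixMs: 40, DurationMs: 50},
		{TraceID: "t1", SpanID: "b", ParentSpanID: "a", Service: "inventory", Operation: "inventory.lookup", StartUnixMs: 5, DurationMs: 30},
	}
	for id, trace := range groupByTrace(spans) {
		fmt.Printf("trace %s\n%s", id, renderTree(trace))
	}
}
```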


UI Overview

TraceFlow UI Overview

The dashboard shows:

  • collector health
  • trace count, span count, service count, and p95 latency
  • recent traces
  • service dependency chain
  • per-span latency bars
  • end-to-end request duration

Tech Stack

Area            Technology
Collector       Go, net/http
SDK             Go trace-context propagation package
Demo Services   Go HTTP microservices
Storage         In-memory trace store, optional ClickHouse sink
UI              Embedded browser UI with waterfall rendering
Metrics         Prometheus-style text endpoint
Orchestration   Bash scripts, Docker Compose
CI/CD           GitHub Actions

Getting Started

1. Run tests

go test ./...

2. Start the collector

./scripts/start_collector.sh

The collector runs at:

http://127.0.0.1:9411

3. Start demo services

./scripts/start_demo_services.sh

This starts the local microservice chain:

Service     Port   Endpoint
Checkout    9001   /checkout
Inventory   9002   /inventory
Payment     9003   /pay

4. Generate traces

./scripts/demo.sh

5. Open the UI

http://127.0.0.1:9411

6. Stop everything

./scripts/stop.sh

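WSL note: if the UI does not open in a Windows browser, bind the collector to 0.0.0.0:9411 instead of 127.0.0.1 (for example with the collector binary's -addr flag, as in the ClickHouse example below). Then browse to http://<WSL IP>:9411, using the address printed by hostname -I.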


Demo Output

Running the demo sends five checkout requests. Each response includes the trace ID of its request:

checkout: {"ok":true,"trace_id":"676cfc0fc7a4127f206a40777d6588ea"}
checkout: {"ok":true,"trace_id":"70c4f52c953963736677797ce5db8784"}
checkout: {"ok":true,"trace_id":"246010c690b20538bef977e8c9bb1093"}
checkout: {"ok":true,"trace_id":"411792375af91a1c52ad386d9fcb441b"}
checkout: {"ok":true,"trace_id":"e24dc792d97371158d3eedab370f3622"}

Collector stats from a local run:

{
  "trace_count": 5,
  "span_count": 15,
  "service_count": 3,
  "services": ["checkout", "inventory", "payment"],
  "p50_ms": 98.09,
  "p95_ms": 99.80,
  "p99_ms": 99.80
}
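The stats report p50/p95/p99 latency. A common way to compute these is the nearest-rank method sketched below. This is an illustrative assumption, because the collector's actual percentile method is not documented here. Under nearest rank with five traces, both p95 and p99 resolve to the slowest trace (ceil(0.95 × 5) = ceil(0.99 × 5) = 5), which is one way a small sample can produce identical p95 and p99 values.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (0 < p <= 100).
// It sorts a copy so the caller's slice keeps its order.
func percentile(values []float64, p float64) float64 {
	if len(values) == 0 {
		return 0
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p / 100 * float64(len(sorted)))) // 1-based
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// Hypothetical end-to-end trace durations in ms.
	d := []float64{12, 40, 25, 90, 33}
	// Prints: p50=33.00 p95=90.00 p99=90.00
	fmt.Printf("p50=%.2f p95=%.2f p99=%.2f\n", percentile(d, 50), percentile(d, 95), percentile(d, 99))
}
```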

API Reference

Ingest spans

curl -X POST http://127.0.0.1:9411/api/spans \
  -H "Content-Type: application/json" \
  -d '{
    "spans": [
      {
        "trace_id": "trace-1",
        "span_id": "span-1",
        "service": "checkout",
        "operation": "checkout.request",
        "start_unix_ms": 1710000000000,
        "duration_ms": 42,
        "status": "ok"
      }
    ]
  }'

List traces

curl http://127.0.0.1:9411/api/traces

Fetch one trace

curl http://127.0.0.1:9411/api/traces/<trace_id>

Collector stats

curl http://127.0.0.1:9411/api/stats

Metrics

curl http://127.0.0.1:9411/metrics

Tail Sampling

TraceFlow includes a tail-sampling policy that retains traces based on:

  • error spans
  • slow trace duration
  • deterministic sampling rate

The local demo keeps traces by default so the UI remains easy to inspect, while the sampler package models production-style retention logic.


Optional ClickHouse Sink

TraceFlow can mirror ingested spans into ClickHouse using the TRACEFLOW_CLICKHOUSE_URL environment variable.

TRACEFLOW_CLICKHOUSE_URL=http://127.0.0.1:8123 ./run/traceflow-collector -addr :9411

The local UI continues to use the in-memory trace store for fast demo queries, while ClickHouse can be used as a durable analytics sink.


Docker Compose

Start the stack:

docker compose up --build

Generate traffic:

curl http://127.0.0.1:9001/checkout

Open the collector UI:

http://127.0.0.1:9411

Benchmarking

Run a concurrent traffic benchmark:

./scripts/start_collector.sh
./scripts/start_demo_services.sh
./scripts/bench.sh 100 20
./scripts/stop.sh

The benchmark sends concurrent checkout requests. Each request triggers spans across checkout, inventory, and payment.


Reliability Checks

TraceFlow validates core behavior through:

  • traceparent propagation tests
  • in-memory trace aggregation tests
  • collector ingestion tests
  • tail-sampling policy tests
  • demo-service smoke workflow in CI

Repository Structure

cmd/collector/          Collector entry point
cmd/demo-service/       Checkout, inventory, and payment demo service binary
internal/collector/     HTTP API, metrics, and embedded UI serving
internal/store/         In-memory trace store and optional ClickHouse sink
internal/sampler/       Tail-sampling policy
internal/model/         Span, trace, and API models
pkg/traceflow/          Go tracing SDK and traceparent propagation helpers
scripts/                Local startup, demo, benchmark, and shutdown scripts
web/                    Readable copy of the browser UI
.github/workflows/      CI workflow

Current Scope

TraceFlow focuses on core observability mechanics and local microservice tracing. It currently does not include:

  • full OTLP/gRPC compatibility
  • distributed collector sharding
  • production retention management
  • authentication and tenant isolation
  • large-scale ClickHouse query optimization
  • Grafana dashboard integration

Roadmap

  • Add OTLP/gRPC ingestion compatibility
  • Add configurable tail-sampling rules
  • Add ClickHouse-backed trace query mode
  • Add service dependency graph aggregation
  • Add p50/p95/p99 breakdown by service and operation
  • Add Grafana dashboard templates
  • Add Kubernetes deployment manifests

License

MIT License
