This guide takes you from zero to a fully running observability layer in about 15 minutes.
| Requirement | Notes |
|---|---|
| Docker + Docker Compose v2 | `docker compose version` should show v2.x |
| Python 3.10+ | Only needed if you want the CLI or dev mode |
| A supported warehouse | PostgreSQL, MySQL, Snowflake, BigQuery, Redshift, DuckDB, Databricks, or Trino |
git clone https://github.com/willowvibe/ObservaKit.git
cd ObservaKit
pip install -e .
observakit init

The interactive wizard will help you:
- Create your `.env` file with warehouse credentials
- Select your warehouse type (Postgres, BigQuery, Snowflake, etc.)
- Configure your first alert channel (Slack, Teams, PagerDuty)
- Verify your connection immediately
Open .env and set at minimum:
# The warehouse you want to observe
WAREHOUSE_TYPE=postgres
WAREHOUSE_HOST=your-db-host
WAREHOUSE_PORT=5432
WAREHOUSE_USER=your-user
WAREHOUSE_PASSWORD=your-password
WAREHOUSE_DB=your-database
# A random string to protect the API
OBSERVAKIT_API_KEY=change-me-to-a-long-random-string
# Where to send alerts (optional for first run)
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL

Note: ObservaKit connects to your warehouse read-only for most operations (freshness, volume, schema checks). It only needs a user with `SELECT` on `information_schema` and your monitored tables.
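The `OBSERVAKIT_API_KEY` value only needs to be long and unpredictable. One way to produce such a string is sketched below with Python's standard `secrets` module. The helper name `generate_api_key` and the 32-byte length are illustrative choices, not ObservaKit requirements.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a URL-safe random string suitable for use as an API key."""
    # token_urlsafe reads from the OS CSPRNG and Base64-encodes the bytes,
    # so 32 random bytes become a 43-character string.
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the printed line into .env in place of the placeholder value.
    print(f"OBSERVAKIT_API_KEY={generate_api_key()}")
```

Any other source of cryptographically strong randomness works equally well; the point is to avoid leaving the placeholder value in place.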
Before starting the containers, ensure your kit.yml is syntactically correct:
observakit validate-config
# → ✅ Configuration valid (found 12 table monitors)

Lite mode (recommended for a first run: backend + metadata DB only):
docker compose -f docker-compose.lite.yml up -d

Full stack (adds Prometheus metrics + Grafana dashboards):
docker compose up -d

Check it's running:
curl http://localhost:8000/healthz
# → {"status": "ok", "database": "ok", "version": "0.1.10"}

To see the UI immediately, load the demo data. This generates 7 days of simulated history with injected anomalies:
make demo

Then open:
- Dashboard: http://localhost:8000/ui
- API: http://localhost:8000/docs
You'll see a health grid with freshness violations, volume anomalies, and schema drift — all simulated.
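If you script the startup (for example in CI), the `/healthz` check can be automated so later steps only run once the service is up. The following is a minimal sketch using only the standard library. It assumes the response carries the `status` and `database` fields shown in the health check output; the function names, retry count, and delay are illustrative.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(url: str = "http://localhost:8000/healthz") -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def is_healthy(payload: dict) -> bool:
    """True when both the service and its metadata database report ok."""
    return payload.get("status") == "ok" and payload.get("database") == "ok"


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,
    delay_seconds: float = 2.0,
) -> bool:
    """Poll until the service is healthy or the attempts run out."""
    for attempt in range(attempts):
        try:
            if is_healthy(fetch()):
                return True
        except (OSError, ValueError):
            # Connection errors or a non-JSON body: likely still starting up.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


if __name__ == "__main__":
    print("healthy" if wait_until_healthy() else "gave up waiting")
```

The `fetch` parameter is injectable so the polling logic can be exercised without a running stack.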
Edit config/kit.yml. At minimum, add your table to the freshness and volume monitors:
warehouse:
  type: postgres  # or mysql | snowflake | bigquery | redshift
  config_file: config/warehouses/postgres.yml
freshness:
  enabled: true
  tables:
    - table: public.orders  # use schema.table format
      timestamp_column: updated_at
      warn_after: 1h
      fail_after: 2h
      alert: slack
volume:
  enabled: true
  tables:
    - table: public.orders
      anomaly_threshold: 0.3  # alert if row count deviates >30% from 7-day avg
      alert: slack

The freshness and volume monitors run on a schedule (every 15 and 60 minutes, respectively, by default). They also run immediately when you call `POST /freshness/poll` or `POST /checks/volume`.
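To make the threshold settings concrete, here is a rough sketch of how `warn_after`, `fail_after`, and `anomaly_threshold` could be evaluated. It is an illustration of the semantics described in the config comments, not ObservaKit's actual implementation: the function names, the duration parser (which only handles `m`, `h`, and `d` suffixes), and the use of a plain mean as the 7-day baseline are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def parse_duration(text: str) -> timedelta:
    """Parse values such as '30m', '1h', or '2d' (assumed format)."""
    units = {"m": "minutes", "h": "hours", "d": "days"}
    value, unit = int(text[:-1]), text[-1]
    return timedelta(**{units[unit]: value})


def freshness_status(
    last_update: datetime,
    now: datetime,
    warn_after: str = "1h",
    fail_after: str = "2h",
) -> str:
    """Classify a table by the age of its newest timestamp."""
    age = now - last_update
    if age > parse_duration(fail_after):
        return "fail"
    if age > parse_duration(warn_after):
        return "warn"
    return "ok"


def volume_is_anomalous(
    todays_rows: int, history: list[int], threshold: float = 0.3
) -> bool:
    """True if today's row count deviates from the baseline by more than threshold."""
    baseline = mean(history)
    if baseline == 0:
        # No meaningful ratio; treat any rows at all as a deviation.
        return todays_rows != 0
    return abs(todays_rows - baseline) / baseline > threshold


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(freshness_status(now - timedelta(minutes=90), now))
    print(volume_is_anomalous(600, [1000] * 7))
```

With the values from the config above, a table last updated 90 minutes ago would be in the warning band, and 600 rows against a 1,000-row baseline is a 40% deviation, which exceeds the 0.3 threshold.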
Copy a template and edit it:
cp checks/templates/soda/no_nulls_on_pk.yml checks/my_project/orders.yml

Edit checks/my_project/orders.yml:
checks for public.orders:
  - row_count > 0
  - missing_count(order_id) = 0  # no null PKs
  - duplicate_count(order_id) = 0  # PK must be unique
  - min(amount) >= 0  # no negative revenue
  - invalid_count(status) = 0:
      valid values: [pending, confirmed, shipped, delivered, cancelled]

Trigger manually:
curl -X POST http://localhost:8000/checks/run \
  -H "X-API-Key: $OBSERVAKIT_API_KEY"

Add your tables to schema monitoring in kit.yml:
schema_drift:
  enabled: true
  tables:
    - public.orders
    - public.customers
    - public.payments

Take the first snapshot:
curl -X POST http://localhost:8000/schema/snapshot \
  -H "X-API-Key: $OBSERVAKIT_API_KEY"

From now on, every snapshot run compares against the previous one. If a column is added, removed, or its type changes, you get an alert.
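The snapshot comparison can be pictured as a diff between two column-to-type mappings. The sketch below shows one way to detect the three kinds of change mentioned above. The snapshot representation (a plain dict of column name to type name) and the function name `diff_schema` are assumptions for illustration; ObservaKit's internal format may differ.

```python
def diff_schema(
    previous: dict[str, str], current: dict[str, str]
) -> dict[str, list]:
    """Compare two {column: type} snapshots and report what changed."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    # Columns present in both snapshots whose declared type differs.
    type_changed = sorted(
        (col, previous[col], current[col])
        for col in set(previous) & set(current)
        if previous[col] != current[col]
    )
    return {"added": added, "removed": removed, "type_changed": type_changed}


if __name__ == "__main__":
    before = {"order_id": "integer", "amount": "numeric", "status": "text"}
    after = {"order_id": "bigint", "amount": "numeric", "created_at": "timestamp"}
    print(diff_schema(before, after))
```

An alert would fire whenever any of the three lists is non-empty.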
Add to kit.yml:
distribution:
  enabled: true
  tables:
    - table: public.orders
      drift_threshold: 0.10
      columns:
        - name: status
          type: categorical
        - name: amount
          type: numeric

This catches the "silent killer" scenario: the schema looks fine, the row count looks fine, but the distribution of values has shifted.
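As an illustration of what a drift score for a categorical column can look like, the sketch below computes the total variation distance between a baseline sample and a current sample and compares it to `drift_threshold`. The guide does not say which metric ObservaKit uses, so the choice of total variation distance is an assumption made for this example, as are the function names.

```python
from collections import Counter


def category_shares(values: list[str]) -> dict[str, float]:
    """Return each category's share of the sample."""
    if not values:
        raise ValueError("cannot compute shares of an empty sample")
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def total_variation_distance(baseline: list[str], current: list[str]) -> float:
    """Half the sum of absolute differences between category shares (0 to 1)."""
    p, q = category_shares(baseline), category_shares(current)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def has_drifted(
    baseline: list[str], current: list[str], drift_threshold: float = 0.10
) -> bool:
    """True when the distributions differ by more than the threshold."""
    return total_variation_distance(baseline, current) > drift_threshold


if __name__ == "__main__":
    last_week = ["shipped"] * 80 + ["pending"] * 20
    today = ["shipped"] * 50 + ["pending"] * 50
    print(total_variation_distance(last_week, today), has_drifted(last_week, today))
```

In this example the row counts and schema are identical, yet the share of `pending` orders moved from 20% to 50%, which is exactly the kind of shift the other monitors would miss.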
Copy the example contract:
cp config/contracts/example_orders.yml config/contracts/orders_v1.yml
# Edit to match your actual table and rules

Enable contracts in kit.yml:
contracts:
  enabled: true
  contracts_dir: config/contracts/

Validate manually:
curl -X POST http://localhost:8000/contracts/validate \
  -H "X-API-Key: $OBSERVAKIT_API_KEY"

If your team uses dbt, ObservaKit can parse `run_results.json` and `manifest.json` directly, with no dbt packages required:
dbt:
  enabled: true
  project_dir: /path/to/your/dbt/project
  auto_parse_on_run: true
  poll_interval_minutes: 5

dbt test results are stored as `CheckResult` records. dbt model runs are stored as `PipelineRun` records. You get a unified view of both warehouse-level and dbt-level quality in the same dashboard.
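To give a feel for what parsing `run_results.json` involves, the sketch below reads the artifact's `results` array and tallies statuses separately for tests and models, using the `unique_id` prefix to tell them apart. It is a simplified illustration, not ObservaKit's parser: the function name is hypothetical, and how results map onto `CheckResult` and `PipelineRun` records is not shown here.

```python
import json
from collections import Counter


def summarize_run_results(raw: str) -> dict[str, Counter]:
    """Count result statuses for dbt tests and models in a run_results.json body."""
    results = json.loads(raw).get("results", [])
    summary = {"tests": Counter(), "models": Counter()}
    for result in results:
        unique_id = result.get("unique_id", "")
        status = result.get("status", "unknown")
        # dbt prefixes unique_id with the resource type, e.g. "test." or "model.".
        if unique_id.startswith("test."):
            summary["tests"][status] += 1
        elif unique_id.startswith("model."):
            summary["models"][status] += 1
    return summary


if __name__ == "__main__":
    # Tiny hand-written sample; a real artifact carries many more fields.
    sample = json.dumps({
        "results": [
            {"unique_id": "model.shop.orders", "status": "success"},
            {"unique_id": "test.shop.not_null_orders_order_id", "status": "pass"},
            {"unique_id": "test.shop.unique_orders_order_id", "status": "fail"},
        ]
    })
    print(summarize_run_results(sample))
```

Other resource types (seeds, snapshots) are ignored in this sketch for brevity.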
pip install -e .
observakit status

Table              Freshness   Volume   Quality   Schema
─────────────────  ──────────  ───────  ────────  ──────
public.orders      ok          ok       warn      ok
public.customers   ok          ok       ok        fail
- Adding Checks — Soda Core, Great Expectations, and custom SQL
- Alert Setup — Slack, Email, Discord, generic webhook
- Data Contracts — Enforcing producer-consumer agreements
- Real-World Use Cases — How teams use ObservaKit in production
- Troubleshooting — Common issues and fixes