Commit e91cf37

Initial commit: sqlmesh-openlineage package
- OpenLineage integration for SQLMesh via set_console() API
- Table-level and column-level lineage
- Schema capture and execution stats
- Per-model START/COMPLETE/FAIL events
- Full test suite including Marquez integration tests
- CI with GitHub Actions using uv
0 parents  commit e91cf37

18 files changed

Lines changed: 3898 additions & 0 deletions

.github/workflows/ci.yml

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
name: CI

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v4
        with:
          version: "latest"

      - name: Set up Python
        run: uv python install 3.12

      - name: Install dependencies
        run: uv sync --dev

      - name: Run unit tests
        run: uv run pytest tests/test_console.py tests/test_datasets.py -v

  integration-test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:14
        env:
          POSTGRES_USER: marquez
          POSTGRES_PASSWORD: marquez
          POSTGRES_DB: marquez
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432

      marquez:
        image: marquezproject/marquez:latest
        env:
          MARQUEZ_CONFIG: /usr/src/app/marquez.dev.yml
          POSTGRES_HOST: postgres
          POSTGRES_PORT: 5432
          POSTGRES_DB: marquez
          POSTGRES_USER: marquez
          POSTGRES_PASSWORD: marquez
        ports:
          - 5001:5000
        options: >-
          --health-cmd "curl -f http://localhost:5000/api/v1/namespaces || exit 1"
          --health-interval 10s
          --health-timeout 5s
          --health-retries 10
          --health-start-period 30s

    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        uses: astral-sh/setup-uv@v4
        with:
          version: "latest"

      - name: Set up Python
        run: uv python install 3.12

      - name: Install dependencies
        run: uv sync --dev

      - name: Wait for Marquez
        run: |
          for i in {1..30}; do
            curl -s http://localhost:5001/api/v1/namespaces && break
            echo "Waiting for Marquez..."
            sleep 2
          done

      - name: Run integration tests
        run: uv run pytest tests/test_integration.py tests/test_marquez_integration.py -v -s

.gitignore

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
.venv/
venv/
ENV/

# IDE
.idea/
.vscode/
*.swp
*.swo

# Testing
.pytest_cache/
.coverage
htmlcov/
.tox/
.nox/

# mypy
.mypy_cache/

# ruff
.ruff_cache/

# Local files
*.log
.DS_Store
test_integration.py

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Sidequery

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# sqlmesh-openlineage

OpenLineage integration for SQLMesh. Emits per-model lineage events during SQLMesh runs without modifying SQLMesh itself.

## Features

- **Table-level lineage**: Track which models depend on which upstream models
- **Column-level lineage**: Track which columns flow from source to destination
- **Schema capture**: Column names and types for each model
- **Execution stats**: Duration, rows processed, bytes processed
- **Per-model events**: START/COMPLETE/FAIL events for each model evaluation

## Installation

```bash
pip install sqlmesh-openlineage
```

Or with uv:

```bash
uv add sqlmesh-openlineage
```

## Quick Start (CLI Users)

Add this to your `config.py`:

```python
import sqlmesh_openlineage

sqlmesh_openlineage.install(
    url="http://localhost:5000",
    namespace="my_project",
    # api_key="...",  # optional
)

from sqlmesh.core.config import Config

config = Config(
    # ... your existing config
)
```

Then run `sqlmesh run` as normal. OpenLineage events will be emitted for each model evaluation.

## Environment Variables

You can also configure via environment variables:

```bash
export OPENLINEAGE_URL=http://localhost:5000
export OPENLINEAGE_NAMESPACE=my_project
export OPENLINEAGE_API_KEY=...  # optional
```

Then in `config.py`:

```python
import sqlmesh_openlineage
sqlmesh_openlineage.install()  # reads from env vars
```
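
As a rough illustration of how explicit arguments and environment variables could be combined, here is a minimal sketch. The helper `resolve_settings` is hypothetical and not part of this package; the rule that explicit arguments take precedence over environment variables is an assumption, and only the three variable names come from the section above.

```python
import os
from typing import Mapping, Optional


def resolve_settings(
    url: Optional[str] = None,
    namespace: Optional[str] = None,
    api_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Hypothetical helper: explicit arguments win, env vars fill the gaps."""
    # Allow a custom mapping to be passed in so the function is easy to test.
    env = os.environ if env is None else env
    settings = {
        "url": url or env.get("OPENLINEAGE_URL"),
        "namespace": namespace or env.get("OPENLINEAGE_NAMESPACE"),
        "api_key": api_key or env.get("OPENLINEAGE_API_KEY"),  # optional
    }
    # Assumption: without a URL there is nowhere to send events, so fail early.
    if not settings["url"]:
        raise ValueError("No OpenLineage URL: pass url=... or set OPENLINEAGE_URL")
    return settings
```

Calling `resolve_settings()` with no arguments would then read everything from the process environment, which mirrors the `install()` call shown above.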

## How It Works

This package uses SQLMesh's `set_console()` API to inject a custom Console wrapper. The wrapper intercepts per-snapshot lifecycle events and emits corresponding OpenLineage events:

- `START` event when a model evaluation begins
- `COMPLETE` event when evaluation succeeds (includes execution stats)
- `FAIL` event when evaluation fails or audits fail
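
The wrapper idea can be sketched in plain Python. This is a simplified illustration, not the package's actual code: the class name `LineageConsole` and the hook names `start_snapshot` and `finish_snapshot` are made up for the example and are not SQLMesh's real Console methods.

```python
from typing import Any, Callable, Dict, Optional


class LineageConsole:
    """Wraps another console and emits an event around each model evaluation.

    The hook names below are illustrative only, not SQLMesh's Console API.
    """

    def __init__(self, wrapped: Any, emit: Callable[[Dict[str, Any]], None]) -> None:
        self._wrapped = wrapped
        self._emit = emit

    def __getattr__(self, name: str) -> Any:
        # Anything we do not intercept is forwarded to the wrapped console.
        return getattr(self._wrapped, name)

    def start_snapshot(self, model: str) -> None:
        # Emit START, then let the original console do its normal work.
        self._emit({"eventType": "START", "job": model})
        self._wrapped.start_snapshot(model)

    def finish_snapshot(self, model: str, error: Optional[str] = None) -> None:
        # An error turns the terminal event into FAIL instead of COMPLETE.
        event: Dict[str, Any] = {"eventType": "FAIL" if error else "COMPLETE", "job": model}
        if error:
            event["error"] = error
        self._emit(event)
        self._wrapped.finish_snapshot(model, error)
```

Because unknown attributes fall through to the wrapped object, the wrapper only has to define the hooks it cares about, which is what lets this approach work without changing the wrapped console.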

## Events Emitted

| SQLMesh Event | OpenLineage Event | Data Included |
|---------------|-------------------|---------------|
| Model evaluation start | RunEvent(START) | Input datasets, output dataset with schema, column lineage |
| Model evaluation success | RunEvent(COMPLETE) | Execution stats (rows, bytes, duration) |
| Model evaluation failure | RunEvent(FAIL) | Error message |
| Audit failure | RunEvent(FAIL) | Audit failure details |
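
To make the table more concrete, the sketch below builds a simplified run event as a plain dictionary using the field names from the OpenLineage run event model (`eventType`, `eventTime`, `run`, `job`, `inputs`, `outputs`). It is an approximation for illustration: the function `build_run_event` and the producer URL are placeholders, and spec-required metadata such as `schemaURL` and the per-facet `_producer` and `_schemaURL` fields is left out.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Sequence

PRODUCER = "https://example.com/sqlmesh-openlineage"  # placeholder, not a real URL


def build_run_event(
    event_type: str,
    run_id: str,
    namespace: str,
    job_name: str,
    inputs: Sequence[str] = (),
    output: Optional[str] = None,
    output_schema: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Build a simplified OpenLineage-style run event (hypothetical helper)."""
    outputs: List[Dict[str, Any]] = []
    if output is not None:
        dataset: Dict[str, Any] = {"namespace": namespace, "name": output}
        if output_schema:
            # Schema facet: one {name, type} entry per column.
            fields = [{"name": n, "type": t} for n, t in output_schema.items()]
            dataset["facets"] = {"schema": {"fields": fields}}
        outputs.append(dataset)
    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": PRODUCER,
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": outputs,
    }


run_id = str(uuid.uuid4())  # the same runId ties START to COMPLETE or FAIL
start = build_run_event(
    "START", run_id, "my_project", "customer_summary",
    inputs=["customers", "orders"],
    output="customer_summary",
    output_schema={"customer_id": "INT", "customer_name": "TEXT", "total_orders": "BIGINT"},
)
print(json.dumps(start, indent=2))
```

The column types in the usage example are invented for the demonstration; a real event would carry whatever types the model's schema reports.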

## Column-Level Lineage

The integration automatically extracts column-level lineage using SQLMesh's built-in lineage analysis. For example, if you have:

```sql
-- customers.sql
SELECT customer_id, name, email FROM raw_customers

-- customer_summary.sql
SELECT
  c.customer_id,
  c.name as customer_name,
  COUNT(o.order_id) as total_orders
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
```

The lineage will show that `customer_summary.customer_name` traces back to `customers.name`.
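
For orientation, this is roughly what that mapping looks like when expressed in the shape of OpenLineage's column lineage dataset facet (`fields`, then `inputFields` with `namespace`, `name`, and `field`). The sketch is hand-written for the example above: the helper `column_lineage_facet` is hypothetical, the mapping is typed in manually rather than derived by SQLMesh, and the namespace `my_project` is assumed.

```python
from typing import Any, Dict, List, Tuple


def column_lineage_facet(
    namespace: str,
    mapping: Dict[str, List[Tuple[str, str]]],
) -> Dict[str, Any]:
    """Turn {output_column: [(source_table, source_column), ...]} into a facet dict."""
    return {
        "fields": {
            out_col: {
                "inputFields": [
                    {"namespace": namespace, "name": table, "field": col}
                    for table, col in sources
                ]
            }
            for out_col, sources in mapping.items()
        }
    }


# Hand-written mapping for customer_summary, matching the SQL above.
facet = column_lineage_facet(
    "my_project",
    {
        "customer_id": [("customers", "customer_id")],
        "customer_name": [("customers", "name")],
        "total_orders": [("orders", "order_id")],
    },
)
```

Here `facet["fields"]["customer_name"]["inputFields"]` holds a single entry pointing at the `name` column of `customers`, which is the relationship described in the sentence above.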

## Testing with Marquez

```bash
# Start Marquez (requires Docker)
docker compose up -d

# Configure and run SQLMesh
export OPENLINEAGE_URL=http://localhost:5001
sqlmesh run

# View lineage at http://localhost:3000
```

## Development

```bash
# Install dependencies
uv sync --dev

# Run tests (unit + integration)
uv run pytest tests/ -v

# Run Marquez integration test (requires Docker)
docker compose up -d
uv run pytest tests/test_marquez_integration.py -v -s
docker compose down
```

## License

MIT

docker-compose.yml

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
services:
  marquez:
    image: marquezproject/marquez:latest
    platform: linux/amd64
    ports:
      - "5001:5000"
      - "5002:5001"
    depends_on:
      db:
        condition: service_healthy
    environment:
      - MARQUEZ_CONFIG=/usr/src/app/marquez.dev.yml
      - POSTGRES_HOST=db
      - POSTGRES_PORT=5432
      - POSTGRES_DB=marquez
      - POSTGRES_USER=marquez
      - POSTGRES_PASSWORD=marquez

  db:
    image: postgres:14
    environment:
      - POSTGRES_USER=marquez
      - POSTGRES_PASSWORD=marquez
      - POSTGRES_DB=marquez
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U marquez"]
      interval: 5s
      timeout: 5s
      retries: 5

pyproject.toml

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
[project]
name = "sqlmesh-openlineage"
version = "0.1.0"
description = "OpenLineage integration for SQLMesh"
readme = "README.md"
requires-python = ">= 3.9"
license = { text = "MIT" }
authors = [{ name = "Sidequery" }]
dependencies = [
    "sqlmesh>=0.100.0",
    "openlineage-python>=1.0.0",
]

[project.optional-dependencies]
dev = [
    "pytest",
    "pytest-mock",
    "requests",
    "mypy",
    "ruff",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.hatch.build.targets.wheel]
packages = ["src/sqlmesh_openlineage"]

[tool.ruff]
line-length = 100

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
python_functions = ["test_*"]

[tool.mypy]
python_version = "3.9"
strict = true
