28 changes: 22 additions & 6 deletions .github/workflows/ci.yml
@@ -89,11 +89,22 @@ jobs:
GH_TOKEN: ${{ github.token }}

integration:
name: End-to-end SQL on ${{ matrix.os }}
name: SQL E2E (${{ matrix.transport }}) on ${{ matrix.os }}
needs: resolve-haybarn
strategy:
fail-fast: false
matrix:
# Run the SAME sqllogictest suite over every VGI transport. The vgi
# extension picks the transport from the ATTACH LOCATION string that
# run-integration.sh builds per $TRANSPORT. The in-process mock provider
# server (tests/mock_server.py) is started out of band and stays up for
# all three — the worker reads its exported VGI_SCHOLAR_*_BASE_URL vars
# however DuckDB reaches the worker:
# subprocess : `.venv/bin/python scholar_worker.py` (stdio)
# http : `http://127.0.0.1:<port>` (worker booted with --http)
# unix : `unix:///tmp/scholar-<pid>.sock` (worker booted --unix)
os: [ubuntu-latest, macos-latest]
transport: [subprocess, http, unix]
include:
- { os: ubuntu-latest, asset: haybarn_unittest-linux-amd64.zip }
- { os: macos-latest, asset: haybarn_unittest-osx-arm64.zip }
@@ -110,8 +121,10 @@ jobs:
- name: Set up Python 3.13
run: uv python install 3.13

- name: Install the worker (from the lockfile)
run: uv sync --frozen --python 3.13
- name: Install the worker (from the lockfile, with the http extra)
# The `http` extra pulls in waitress so the worker can serve `--http`.
# Harmless for the subprocess/unix legs; required for the http leg.
run: uv sync --frozen --python 3.13 --extra http

- name: Download haybarn-unittest
run: |
@@ -130,11 +143,14 @@ jobs:
# invoking the runner, so relative paths would not resolve. The worker
# runs from the synced .venv (deps resolved from PyPI via the lockfile);
# plain `.venv/bin/python` ignores the PEP 723 header. run-integration.sh
# starts the mock provider server and redirects the providers at it.
# starts the mock provider server and redirects the providers at it, and
# boots the worker itself for the http/unix legs.
UNITTEST="$PWD/$(find hb -name 'haybarn-unittest' -type f | head -1)"
chmod +x "$UNITTEST"
echo "HAYBARN_UNITTEST=$UNITTEST" >> "$GITHUB_ENV"
echo "VGI_SCHOLAR_WORKER=$PWD/.venv/bin/python $PWD/scholar_worker.py" >> "$GITHUB_ENV"
echo "WORKER_CMD=$PWD/.venv/bin/python $PWD/scholar_worker.py" >> "$GITHUB_ENV"

- name: Run extension integration suite
- name: Run extension integration suite (${{ matrix.transport }})
run: ci/run-integration.sh
env:
TRANSPORT: ${{ matrix.transport }}
131 changes: 108 additions & 23 deletions ci/README.md
@@ -1,7 +1,7 @@
# CI: the vgi-calendar worker integration suite
# CI: the vgi-scholar worker integration suite

[`.github/workflows/ci.yml`](../.github/workflows/ci.yml) runs the unit tests
and this repo's sqllogictest suite (`test/sql/*.test`) against the vgi-calendar
and this repo's sqllogictest suite (`test/sql/*.test`) against the vgi-scholar
VGI worker through the **real DuckDB `vgi` extension** on every push / PR.

## How it works (no C++ build)
@@ -11,33 +11,118 @@ Rather than building the vgi DuckDB extension from source, CI drives a
runner, published in Haybarn's releases) and installs the **signed** `vgi`
extension from the Haybarn community channel:

1. **Install the worker** — `uv sync --frozen` into a venv. `calendar_worker.py`
is a self-contained PEP 723 stdio worker the extension can spawn via
`uv run calendar_worker.py`.
2. **Download the runner** — the matching `haybarn_unittest-*` asset per
platform from the latest Haybarn release.
3. **Preprocess** — the standalone runner links none of the extensions the
tests gate on, so [`preprocess-require.awk`](preprocess-require.awk) rewrites
1. **Install the worker** — `uv sync --frozen --extra http` into a venv.
`scholar_worker.py` is a self-contained PEP 723 stdio worker the extension
can spawn via `uv run scholar_worker.py`.
2. **Download the runner** — the matching `haybarn_unittest-*` asset per platform.
3. **Preprocess** — [`preprocess-require.awk`](preprocess-require.awk) rewrites
each `require <ext>` into an explicit signed `INSTALL <ext> FROM
{community,core}; LOAD <ext>;`. These tests skip `require vgi` (haybarn
silently SKIPs it) and `LOAD vgi;` directly, so the awk also injects an
`INSTALL vgi FROM community;` right before each bare `LOAD vgi;`. `require-env`
and everything else pass through untouched.
{community,core}; LOAD <ext>;`, and injects `INSTALL vgi FROM community;`
before each bare `LOAD vgi;` (these tests omit `require vgi`, which haybarn
silently SKIPs, and run `LOAD vgi;` directly). `require-env` and everything
else pass through untouched.
4. **Run** — [`run-integration.sh`](run-integration.sh) stages the preprocessed
tree, points `VGI_CALENDAR_WORKER` at `uv run calendar_worker.py`, warms the
extension cache once, then runs the suite in a single `haybarn-unittest`
invocation. Any failed assertion exits non-zero and fails the job.
tree, starts the mock provider server, resolves `VGI_SCHOLAR_WORKER` (the
ATTACH `LOCATION`) per `$TRANSPORT`, warms the extension cache once, then runs
the suite in a single `haybarn-unittest` invocation. Any failed assertion
fails the job.
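
The rewrite rules from step 3 can be pictured with a small Python sketch. This is not the awk script itself: the `COMMUNITY_EXTENSIONS` set, the `preprocess` name, and the one-statement-per-line output layout are assumptions made for illustration. Only the three rules (rewrite `require <ext>`, inject before a bare `LOAD vgi;`, pass `require-env` through) come from the description above.

```python
import re

# Assumption: which extensions come from the community channel. Everything
# else is treated as a core extension in this sketch.
COMMUNITY_EXTENSIONS = {"vgi"}

# Matches `require <ext>` but not `require-env ...` (no whitespace after
# `require` in the latter).
REQUIRE = re.compile(r"require\s+(\w+)")


def preprocess(lines):
    """Apply the three rewrite rules to the lines of one .test file."""
    out = []
    for line in lines:
        stripped = line.strip()
        match = REQUIRE.fullmatch(stripped)
        if match:
            ext = match.group(1)
            channel = "community" if ext in COMMUNITY_EXTENSIONS else "core"
            out.append(f"INSTALL {ext} FROM {channel};")
            out.append(f"LOAD {ext};")
        elif stripped == "LOAD vgi;":
            # A bare LOAD gets an explicit signed install right before it.
            out.append("INSTALL vgi FROM community;")
            out.append(line)
        else:
            out.append(line)
    return out
```

The real script may wrap the emitted statements differently for the sqllogictest format; the sketch only shows which lines are rewritten and which pass through.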

## The mock-driven worker (all transports)

The scholarly providers (OpenAlex / arXiv / Crossref) are redirected at a local
in-process canned-response mock HTTP server ([`tests/mock_server.py`](../tests/mock_server.py))
via the `VGI_SCHOLAR_<PROVIDER>_BASE_URL` env vars, so the authoritative SQL
suite drives the real worker end to end against deterministic, *paged* fixtures —
no keys, no cost, no live network egress.

`run-integration.sh` starts that mock server **once, out of band** (`python -m
tests.mock_server`, which prints `URL:<base>` and blocks), reads the URL, and
`export`s the three `VGI_SCHOLAR_*_BASE_URL` vars. Because they are exported,
**the same mock server serves every transport** — the worker reads them whether
DuckDB reaches it over stdio, HTTP, or an AF_UNIX socket, and for the
out-of-band legs the booted worker inherits them from the environment. The mock
server stays alive for the life of the run and is trap-killed on exit (a single
`cleanup()` kills both the mock server and, for http/unix, the worker).
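
As a rough illustration of that handshake, the sketch below parses the `URL:<base>` line and builds the environment a worker would inherit. The exact provider spellings in the variable names (`OPENALEX`, `ARXIV`, `CROSSREF`) and the choice to point all three at the same base URL are assumptions; the source only gives the `VGI_SCHOLAR_<PROVIDER>_BASE_URL` pattern.

```python
import os

# Assumption: the <PROVIDER> part of each variable name is spelled like this.
PROVIDERS = ("OPENALEX", "ARXIV", "CROSSREF")


def parse_url_line(line):
    """Extract the base URL from the mock server's `URL:<base>` line."""
    line = line.strip()
    prefix = "URL:"
    if not line.startswith(prefix):
        raise ValueError(f"expected a line starting with {prefix!r}, got {line!r}")
    return line[len(prefix):]


def provider_env(base_url):
    """Build the VGI_SCHOLAR_*_BASE_URL variables for one mock server."""
    return {f"VGI_SCHOLAR_{name}_BASE_URL": base_url for name in PROVIDERS}


# Environment a worker process would inherit, whatever the transport.
# The port here is an example value, not one taken from a real run.
worker_env = {**os.environ, **provider_env(parse_url_line("URL:http://127.0.0.1:8765\n"))}
```

Because the variables live in the environment rather than in the ATTACH string, the same dictionary works for a spawned stdio worker and for a worker booted out of band.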

## Transport matrix (subprocess | http | unix)

The same `test/sql/*.test` suite is run over all three VGI transports — the
extension picks the transport from the `LOCATION` string the `.test` files
`ATTACH`, and `run-integration.sh` builds that string from `$TRANSPORT`:

| `TRANSPORT` | `VGI_SCHOLAR_WORKER` (LOCATION) | How the worker is reached |
|--------------|---------------------------------------------|---------------------------|
| `subprocess` | `.venv/bin/python scholar_worker.py` | extension spawns the worker per query; Arrow IPC over stdin/stdout (default) |
| `http` | `http://127.0.0.1:<port>` | harness boots `scholar_worker.py --http --port 0 --port-file <f>`, waits for the port-file, then ATTACHes that URL |
| `unix` | `unix:///tmp/scholar-<pid>.sock` | harness boots `scholar_worker.py --unix <sock>`, waits for the socket, then ATTACHes it |

The CI `integration` job is a `transport: [subprocess, http, unix]` × `os:
[ubuntu-latest, macos-latest]` matrix; each leg runs `ci/run-integration.sh` with
`TRANSPORT=<t>`. Run a single transport locally with e.g.
`TRANSPORT=http ci/run-integration.sh`.
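
The mapping in the table can be summarised as one small function. This is a Python sketch of the idea, not the shell logic in `run-integration.sh`; the `location_for` name and its keyword arguments are hypothetical, while the three LOCATION shapes are copied from the table.

```python
def location_for(transport, *, port=None, pid=None):
    """Return the ATTACH LOCATION string for one transport leg."""
    if transport == "subprocess":
        # The extension spawns this command itself and talks over stdio.
        return ".venv/bin/python scholar_worker.py"
    if transport == "http":
        if port is None:
            raise ValueError("http needs the port the worker selected")
        return f"http://127.0.0.1:{port}"
    if transport == "unix":
        if pid is None:
            raise ValueError("unix needs a pid to name the socket")
        return f"unix:///tmp/scholar-{pid}.sock"
    raise ValueError(f"unknown TRANSPORT: {transport!r}")
```

Rejecting unknown values up front keeps a typo in `TRANSPORT` from silently falling back to the default leg.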

### Port / socket discovery

- **http**: the worker writes its auto-selected port to `--port-file` atomically,
so the harness watches for that file (not stdout). Boot line:
`scholar_worker.py --http --port 0 --port-file <f>`.
- **unix**: the worker binds the socket and prints `UNIX:<abs-path>`; the harness
polls for the socket file (`test -S`). Boot line:
`scholar_worker.py --unix <sock>`.

Both out-of-band server processes run with cwd = the repo root (so the worker
resolves the `vgi_scholar` package and inherits the exported `*_BASE_URL` vars).
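
A minimal Python sketch of both discovery mechanisms follows. It assumes the atomic write is a write-then-rename (the source does not say how the worker achieves atomicity), and every function name and the timeout values are hypothetical; the harness itself does this in shell.

```python
import os
import stat
import time


def write_port_file(path, port):
    """Worker side: write to a temp file, then rename it into place."""
    tmp = f"{path}.tmp"
    with open(tmp, "w") as fh:
        fh.write(f"{port}\n")
    os.replace(tmp, path)  # atomic on POSIX when both paths share a filesystem


def read_port(path):
    """Return the port from the port-file, or None if it is not there yet."""
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except FileNotFoundError:
        return None


def is_socket(path):
    """Return path if it is a bound socket (like `test -S`), else None."""
    try:
        return path if stat.S_ISSOCK(os.stat(path).st_mode) else None
    except FileNotFoundError:
        return None


def wait_for(probe, timeout=10.0, interval=0.05):
    """Harness side: poll probe() until it returns a non-None value."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        value = probe()
        if value is not None:
            return value
        time.sleep(interval)
    raise TimeoutError("worker did not come up in time")
```

With a rename-based write, a reader sees either no file or a complete one, which is why polling for the file's existence is enough and no partial-read handling is needed.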

### HTTP transport needs the `httpfs` extension (resolved, not gated)

The vgi extension implements HTTP transport on top of DuckDB's **httpfs**
extension, so an `http://` ATTACH fails at bind time with `VGI HTTP transport
requires the httpfs extension` unless httpfs is loaded first. This is a
**dependency**, not a
protocol limitation, so we resolve it: the http leg injects a signed `INSTALL
httpfs FROM core; LOAD httpfs;` into each staged `.test` (after the awk-injected
`LOAD vgi;`). The leg also needs the worker's `http` extra (waitress) —
`pyproject.toml` ships an `http` extra (`vgi-python[http]`), the PEP 723 header
in `scholar_worker.py` lists `vgi-python[http]`, and CI runs `uv sync --frozen
--extra http`.

> **Sharp edge — the runner silently SKIPs HTTP errors.** The haybarn/DuckDB
> sqllogictest runner's default skip list skips any statement whose error
> contains `"HTTP"` or `"Unable to connect"`, so a broken http setup reports
> "All tests were skipped" — a green-looking **fake pass**.
> `run-integration.sh` fails the leg unless the runner reports `All tests passed
> (N assertions …)` with N > 0 and zero skips.
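
The guard against a fake pass might look like the following Python sketch. The function name is hypothetical, and treating any occurrence of the word "skipped" as a failure is an assumption about how skips show up in the runner's output; the `All tests passed (N assertions …)` and `All tests were skipped` strings are the ones quoted above.

```python
import re

# Summary line quoted in the note above: `All tests passed (N assertions ...)`.
PASSED = re.compile(r"All tests passed \((\d+) assertions?\b")


def verify_runner_output(output):
    """Return the assertion count, or raise if the run was not a real pass."""
    # Assumption: any mention of "skipped" means at least one test was skipped.
    if "skipped" in output.lower():
        raise RuntimeError("runner skipped tests; treating the leg as failed")
    match = PASSED.search(output)
    if match is None:
        raise RuntimeError("no 'All tests passed' summary in runner output")
    assertions = int(match.group(1))
    if assertions == 0:
        raise RuntimeError("runner passed with zero assertions")
    return assertions
```

Checking for a positive assertion count, rather than only the exit status, is what turns "All tests were skipped" from a green result into a failure.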

### `scholar_search` pagination over HTTP (externalized cursor — no gate)

`scholar_search` is a streaming/paging table function: it fetches and emits one
provider page per `process()` tick, and keeps ticking until `count` is
satisfied. Streaming table functions run fine over the **stateless** HTTP
transport **because the cursor is externalized**: the per-scan position lives in
a plain-serializable `_ScanState(ArrowSerializableDataclass)` (`cursor` /
`emitted` / `started` / `done`) that the framework round-trips through its
continuation token on every tick (and so across independent HTTP requests). The
mock returns one result per page, so `count := 5, page_size := 1` forces five
paged ticks; the http leg runs the **full** suite including that scan-state
round-trip (contiguous unique titles 0..4) — nothing is gated. (This is the same
"externalize the scan position into the serialized state" pattern as the vgi-cve
cursor fix.)
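
The pattern can be shown in isolation. The sketch below is a stand-in, not the worker's code: it uses a plain `dataclass` and a JSON string where the real `_ScanState` uses `ArrowSerializableDataclass` and the framework's continuation token, and `fetch_page`, `process`, and `scan` are hypothetical names. The four fields match the ones listed above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ScanState:
    cursor: int = 0        # next provider offset to fetch
    emitted: int = 0       # rows returned so far
    started: bool = False
    done: bool = False


def fetch_page(cursor, page_size):
    """Hypothetical mock provider: page N holds titles N*size..N*size+size-1."""
    return [f"title-{i}" for i in range(cursor, cursor + page_size)]


def process(token, count, page_size):
    """One stateless tick: rebuild state from the token, emit one page."""
    state = ScanState(**json.loads(token)) if token else ScanState()
    state.started = True
    rows = fetch_page(state.cursor, page_size)[: count - state.emitted]
    state.cursor += page_size
    state.emitted += len(rows)
    state.done = state.emitted >= count or not rows
    return rows, json.dumps(asdict(state))


def scan(count, page_size):
    """Drive ticks the way independent HTTP requests would: token in, token out."""
    token, titles, ticks = None, [], 0
    while True:
        rows, token = process(token, count, page_size)
        titles.extend(rows)
        ticks += 1
        if json.loads(token)["done"]:
            return titles, ticks
```

Nothing survives between calls to `process` except the token, so `scan(5, 1)` takes five ticks and yields `title-0` through `title-4` with no gaps or repeats, which is the property the http leg checks.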

### Per-transport status

- **subprocess**: GREEN — 33 assertions.
- **http**: GREEN — 35 assertions (33 + the injected httpfs INSTALL/LOAD). Full
suite incl. the `scholar_search` paging round-trip.
- **unix**: GREEN — 33 assertions.

## Run it locally

```bash
uv sync --python 3.13 # install the worker + deps
# point HAYBARN_UNITTEST at a haybarn-unittest binary (or a local DuckDB
# `unittest` built with the vgi extension), and the worker at the stdio command:
uv sync --python 3.13 --extra http
HAYBARN_UNITTEST=/path/to/haybarn-unittest \
VGI_CALENDAR_WORKER="uv run --python 3.13 calendar_worker.py" \
ci/run-integration.sh
WORKER_CMD="$PWD/.venv/bin/python $PWD/scholar_worker.py" \
TRANSPORT=subprocess ci/run-integration.sh # or TRANSPORT=http / TRANSPORT=unix
```

Or use the Makefile target `make test-sql`, which installs `haybarn-unittest`
as a uv tool and points the worker at `uv run --python 3.13 calendar_worker.py`.
`TRANSPORT` defaults to `subprocess`, and `WORKER_CMD` defaults to
`uv run --python 3.13 <repo>/scholar_worker.py`. Or use the Makefile target
`make test-sql` (subprocess, via `scripts/run_sql_e2e.py`).