Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 20 additions & 4 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -69,11 +69,19 @@ jobs:
GH_TOKEN: ${{ github.token }}

integration:
name: SQL end-to-end (DuckDB sqllogictest)
# SQL end-to-end across every transport the vgi extension supports. The SAME
# test/sql/*.test suite runs three ways — the only difference is the LOCATION
# the .test files ATTACH (ci/run-integration.sh sets VGI_MASK_WORKER per
# transport): subprocess = the stdio worker binary DuckDB spawns; http =
# `mask-worker --http` (auto port, advertises PORT:<n>); unix = `mask-worker
# --unix <sock>` (advertises UNIX:<sock>).
name: SQL E2E (${{ matrix.transport }}) · ${{ matrix.os }}
needs: resolve-haybarn
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, macos-latest]
transport: [subprocess, http, unix]
include:
- { os: ubuntu-latest, asset: haybarn_unittest-linux-amd64.zip }
- { os: macos-latest, asset: haybarn_unittest-osx-arm64.zip }
Expand All @@ -90,6 +98,10 @@ jobs:
uses: mozilla-actions/sccache-action@v0.0.9

- name: Build worker (release)
# The `http` transport needs the `vgi-rpc` `http` feature; the workspace
# already pins `vgi-rpc = { features = ["macros", "http"] }` (and the SDK
# `vgi` crate enables HTTP via that), so a plain release build serves all
# three transports — no extra cargo features required here.
run: cargo build --release --bin mask-worker

- name: Download haybarn-unittest
Expand All @@ -106,11 +118,15 @@ jobs:
- name: Resolve runner + worker paths
run: |
# Absolute paths: run-integration.sh cd's into a staging dir before
# invoking the runner, so relative paths would not resolve.
# invoking the runner, so relative paths would not resolve. WORKER_BIN
# is the compiled binary — used directly as the subprocess LOCATION and
# launched in --http / --unix mode for those transports.
UNITTEST="$PWD/$(find hb -name 'haybarn-unittest' -type f | head -1)"
chmod +x "$UNITTEST"
echo "HAYBARN_UNITTEST=$UNITTEST" >> "$GITHUB_ENV"
echo "VGI_MASK_WORKER=$PWD/target/release/mask-worker" >> "$GITHUB_ENV"
echo "WORKER_BIN=$PWD/target/release/mask-worker" >> "$GITHUB_ENV"

- name: Run extension integration suite
- name: Run extension integration suite (${{ matrix.transport }})
run: ci/run-integration.sh
env:
TRANSPORT: ${{ matrix.transport }}
72 changes: 63 additions & 9 deletions ci/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,46 @@ the Rust unit + integration tests, and this repo's sqllogictest suite
(`test/sql/*.test`) against the vgi-mask VGI worker through the **real DuckDB
`vgi` extension** on every push / PR.

## Transport matrix

The integration suite runs over **every transport the vgi extension supports**.
The exact same `test/sql/*.test` files run three ways; the only thing that
changes is what LOCATION the `.test` files `ATTACH` (set by
[`run-integration.sh`](run-integration.sh) from the `TRANSPORT` env var):

| `TRANSPORT` | `VGI_MASK_WORKER` (the ATTACH LOCATION) | how the worker is launched |
|--------------|------------------------------------------|----------------------------|
| `subprocess` | `…/target/release/mask-worker` | DuckDB spawns the stdio binary (default) |
| `http` | `http://127.0.0.1:<port>` | `mask-worker --http` (auto port; prints `PORT:<n>` on stdout, which the script polls for) |
| `unix` | `unix:///tmp/mask.<pid>.sock` | `mask-worker --unix <sock>` (prints `UNIX:<sock>` on stdout + creates the socket; the script waits for both) |

CI runs `transport: [subprocess, http, unix]` × `os: [ubuntu, macos]` as a
matrix. Build the worker once with a plain `cargo build --release` — the
workspace already pins `vgi-rpc = { features = ["macros", "http"] }`, so the one
binary serves all three transports; **no extra cargo feature is needed**.

### The `http` leg needs DuckDB's `httpfs` extension

The vgi extension's **HTTP client** is built on DuckDB's `httpfs`. Over `http://`,
`ATTACH` fails without it:

> `Binder Error: VGI HTTP transport requires the httpfs extension. Install it with: INSTALL httpfs; LOAD httpfs;`

Crucially, that message contains the substring **`HTTP`**, and DuckDB's
sqllogictest runner ships a default `ignore_error_messages = {"HTTP", "Unable to
connect"}` that **silently SKIPs** any test whose error matches — so a missing
`httpfs` looks like a (deceptive) pass-by-skip, not a failure. We handle this in
two places:

1. [`preprocess-require.awk`](preprocess-require.awk), invoked with
`-v transport=http`, injects a signed `INSTALL httpfs FROM core; LOAD httpfs;`
right after each `LOAD vgi;` so the http leg actually loads the client and runs.
2. [`run-integration.sh`](run-integration.sh) fails the job if the runner reports
*any* skipped tests (a skip is never a pass) — guarding against this and any
future silent-skip masking.

The `unix` (AF_UNIX launcher) leg needs no extra extension.

## How it works (no C++ build)

Rather than building the vgi DuckDB extension from source, the integration job
Expand All @@ -21,23 +61,37 @@ sqllogictest runner, published in Haybarn's releases) and installs the
tests gate on, so [`preprocess-require.awk`](preprocess-require.awk) rewrites
each `require <ext>` into an explicit signed `INSTALL <ext> FROM
{community,core}; LOAD <ext>;`. `require-env` and everything else pass
through untouched.
4. **Run** — [`run-integration.sh`](run-integration.sh) stages the preprocessed
tree, points `VGI_MASK_WORKER` at the release binary, warms the extension
cache once (`INSTALL vgi FROM community;` — this is what makes the tests'
explicit `LOAD vgi;` succeed), then runs the suite in a single
`haybarn-unittest` invocation. Any failed assertion exits non-zero and fails
the job.
through untouched. (The vgi-mask `.test` files already `LOAD vgi;`
explicitly and use `require-env VGI_MASK_WORKER`, so the `require`-rewrite is
a no-op here.) When run with `-v transport=http`, the awk also injects
`INSTALL httpfs FROM core; LOAD httpfs;` after each `LOAD vgi;` (see the
transport-matrix section above for why).
4. **Run** — [`run-integration.sh`](run-integration.sh) brings up the worker for
the selected `TRANSPORT` and sets `VGI_MASK_WORKER` accordingly, stages the
preprocessed tree, warms the extension cache once (`INSTALL vgi FROM
community;` — this is what makes the tests' explicit `LOAD vgi;` succeed),
then runs the suite in a single `haybarn-unittest` invocation. Any failed
assertion — or any skipped test — exits non-zero and fails the job.

## Run it locally

```bash
cargo build --release --bin mask-worker
# point HAYBARN_UNITTEST at a haybarn-unittest binary (or a local DuckDB
# `unittest` built with the vgi extension), and the worker at the release binary:
# `unittest` built with the vgi extension), and WORKER_BIN at the release binary.
# TRANSPORT defaults to subprocess; set it to http or unix for the other legs.
HAYBARN_UNITTEST=/path/to/haybarn-unittest \
VGI_MASK_WORKER="$PWD/target/release/mask-worker" \
WORKER_BIN="$PWD/target/release/mask-worker" \
TRANSPORT=subprocess \
ci/run-integration.sh

# HTTP leg: launches `mask-worker --http`, reads PORT:<n>, attaches http://…
HAYBARN_UNITTEST=/path/to/haybarn-unittest WORKER_BIN="$PWD/target/release/mask-worker" \
TRANSPORT=http ci/run-integration.sh

# Unix leg: launches `mask-worker --unix /tmp/mask.<pid>.sock`, attaches unix://…
HAYBARN_UNITTEST=/path/to/haybarn-unittest WORKER_BIN="$PWD/target/release/mask-worker" \
TRANSPORT=unix ci/run-integration.sh
```

Or use the Makefile target `make test-sql`, which builds the release worker and
Expand Down
29 changes: 28 additions & 1 deletion ci/preprocess-require.awk
Original file line number Diff line number Diff line change
Expand Up @@ -6,9 +6,36 @@
# comes from the signed community channel; httpfs/json/parquet/spatial from the
# signed core channel. `require-env` and every other directive pass through
# untouched. See ci/README.md.
#
# When invoked with `-v transport=http`, the http:// LOCATION exercised by that
# transport requires DuckDB's `httpfs` extension (the vgi extension's HTTP
# client is built on it; without it ATTACH fails with a Binder Error whose text
# contains "HTTP" — which DuckDB's sqllogictest runner silently *auto-skips*
# via its default `ignore_error_messages = {"HTTP", ...}`, so a missing httpfs
# looks like a pass-by-skip rather than a failure). We therefore inject a signed
# `INSTALL httpfs FROM core; LOAD httpfs;` right after the test's own
# `LOAD vgi;` so the http leg actually runs (and is not silently skipped).
# Harmless/absent for subprocess and unix (transport != http).
/^require[ \t]+vgi[ \t]*$/ {
print "statement ok"; print "INSTALL vgi FROM community;"; print "";
print "statement ok"; print "LOAD vgi;"; next
print "statement ok"; print "LOAD vgi;";
if (transport == "http") {
print "";
print "statement ok"; print "INSTALL httpfs FROM core;"; print "";
print "statement ok"; print "LOAD httpfs;";
}
next
}
# The .test files LOAD vgi explicitly (statement ok + LOAD vgi;) rather than via
# `require vgi`; hook that line too so the http leg gets httpfs.
/^LOAD[ \t]+vgi[ \t]*;[ \t]*$/ {
print
if (transport == "http") {
print "";
print "statement ok"; print "INSTALL httpfs FROM core;"; print "";
print "statement ok"; print "LOAD httpfs;";
}
next
}
/^require[ \t]+(httpfs|json|parquet|spatial)[ \t]*$/ {
ext = $2
Expand Down
126 changes: 120 additions & 6 deletions ci/run-integration.sh
Original file line number Diff line number Diff line change
Expand Up @@ -5,25 +5,122 @@
# VGI worker, using a prebuilt standalone `haybarn-unittest` and the signed
# community `vgi` extension — no C++ build from source. See ci/README.md.
#
# Parameterized by TRANSPORT (default: subprocess), exercising the SAME suite
# over each transport the vgi extension supports — the only thing that changes
# is what LOCATION (VGI_MASK_WORKER) the .test files ATTACH:
#
# subprocess VGI_MASK_WORKER = the stdio worker command (DuckDB spawns it).
# http start `mask-worker --http` (auto port; advertises `PORT:<n>`
# on stdout), VGI_MASK_WORKER = http://127.0.0.1:<port>.
# unix start `mask-worker --unix <sock>` (advertises `UNIX:<sock>`
# on stdout), VGI_MASK_WORKER = unix://<sock>.
#
# Required environment:
# HAYBARN_UNITTEST path to the haybarn-unittest binary
# VGI_MASK_WORKER worker LOCATION the .test files attach (a stdio command
# such as the compiled mask-worker binary, or an http:// URL)
# WORKER_BIN path to the compiled mask-worker binary (used to launch
# the http/unix servers, and the stdio LOCATION). Falls back
# to VGI_MASK_WORKER when that is a bare command (subprocess).
# Optional:
# TRANSPORT subprocess | http | unix (default: subprocess)
# STAGE scratch dir for the preprocessed test tree (default: mktemp)
set -euo pipefail

TRANSPORT="${TRANSPORT:-subprocess}"

: "${HAYBARN_UNITTEST:?path to the haybarn-unittest binary}"
: "${VGI_MASK_WORKER:?worker LOCATION (stdio command or http:// URL)}"

HERE="$(cd "$(dirname "$0")" && pwd)"
REPO="$(cd "$HERE/.." && pwd)"
STAGE="${STAGE:-$(mktemp -d)}"

# For http/unix we must launch the worker binary ourselves; for subprocess the
# binary IS the LOCATION. WORKER_BIN names the compiled binary; default to the
# release build in this repo.
WORKER_BIN="${WORKER_BIN:-$REPO/target/release/mask-worker}"

SERVER_PID=""
SOCK_PATH=""
cleanup() {
local rc=$?
if [[ -n "$SERVER_PID" ]]; then
kill "$SERVER_PID" 2>/dev/null || true
wait "$SERVER_PID" 2>/dev/null || true
fi
[[ -n "$SOCK_PATH" ]] && rm -f "$SOCK_PATH" 2>/dev/null || true
return "$rc"
}
trap cleanup EXIT

# Bring up the out-of-band server for http/unix and resolve VGI_MASK_WORKER.
# Both transports announce their endpoint on stdout (`PORT:<n>` / `UNIX:<path>`),
# which we poll for in the log before running the suite (readiness gate).
start_server_and_set_location() {
local kind="$1"
: "${WORKER_BIN:?path to the mask-worker binary (WORKER_BIN)}"
[[ -x "$WORKER_BIN" ]] || { echo "ERROR: worker binary not executable: $WORKER_BIN" >&2; exit 1; }

local log="$STAGE/.worker-$kind.log"
case "$kind" in
http)
"$WORKER_BIN" --http >"$log" 2>&1 &
SERVER_PID=$!
local port=""
for _ in $(seq 1 60); do
if ! kill -0 "$SERVER_PID" 2>/dev/null; then
echo "ERROR: worker (--http) exited during startup. Log:" >&2; cat "$log" >&2; exit 1
fi
port=$(sed -n 's/.*PORT:\([0-9][0-9]*\).*/\1/p' "$log" 2>/dev/null | head -1)
[[ -n "$port" ]] && break
sleep 0.5
done
[[ -n "$port" ]] || { echo "ERROR: timed out waiting for PORT:<n>. Log:" >&2; cat "$log" >&2; exit 1; }
export VGI_MASK_WORKER="http://127.0.0.1:$port"
echo "HTTP worker ready on 127.0.0.1:$port (pid $SERVER_PID)"
;;
unix)
SOCK_PATH="${VGI_MASK_SOCK:-/tmp/mask.$$.sock}"
rm -f "$SOCK_PATH" 2>/dev/null || true
"$WORKER_BIN" --unix "$SOCK_PATH" >"$log" 2>&1 &
SERVER_PID=$!
local ready=""
for _ in $(seq 1 60); do
if ! kill -0 "$SERVER_PID" 2>/dev/null; then
echo "ERROR: worker (--unix) exited during startup. Log:" >&2; cat "$log" >&2; exit 1
fi
if grep -q "UNIX:$SOCK_PATH" "$log" 2>/dev/null && [[ -S "$SOCK_PATH" ]]; then
ready=1; break
fi
sleep 0.5
done
[[ -n "$ready" ]] || { echo "ERROR: timed out waiting for UNIX socket. Log:" >&2; cat "$log" >&2; exit 1; }
export VGI_MASK_WORKER="unix://$SOCK_PATH"
echo "Unix worker ready on $SOCK_PATH (pid $SERVER_PID)"
;;
esac
}

case "$TRANSPORT" in
subprocess)
# The binary itself is the stdio LOCATION DuckDB spawns. Honor an explicit
# VGI_MASK_WORKER (e.g. a bare command) if the caller set one.
export VGI_MASK_WORKER="${VGI_MASK_WORKER:-$WORKER_BIN}"
;;
http) start_server_and_set_location http ;;
unix) start_server_and_set_location unix ;;
*) echo "ERROR: unknown TRANSPORT '$TRANSPORT' (want subprocess|http|unix)" >&2; exit 1 ;;
esac

: "${VGI_MASK_WORKER:?worker LOCATION (stdio command, http:// URL, or unix:// socket)}"

echo "Staging preprocessed tests into $STAGE ..."
mkdir -p "$STAGE/test/sql"
# Pass the transport to the preprocessor: the http leg additionally needs DuckDB's
# `httpfs` extension loaded (the vgi extension's HTTP client is built on it), so
# the awk injects a signed INSTALL/LOAD httpfs after each `LOAD vgi;`. Without it
# the http ATTACH fails with a "HTTP"-containing error that the sqllogictest
# runner *silently auto-skips* (default ignore_error_messages), masking the gap.
for f in "$REPO"/test/sql/*.test; do
awk -f "$HERE/preprocess-require.awk" "$f" > "$STAGE/test/sql/$(basename "$f")"
awk -v transport="$TRANSPORT" -f "$HERE/preprocess-require.awk" "$f" > "$STAGE/test/sql/$(basename "$f")"
done

cd "$STAGE"
Expand All @@ -45,5 +142,22 @@ rm -f "$STAGE/test/_warm.test"

# Run the whole suite in one invocation, streaming the runner's native
# sqllogictest report. Any failed assertion exits non-zero and fails the job.
echo "Running suite (worker: $VGI_MASK_WORKER) ..."
"$HAYBARN_UNITTEST" "test/sql/*"
#
# Guard against the silent-skip trap: DuckDB's sqllogictest runner auto-skips
# any test whose error message contains "HTTP" (default ignore_error_messages),
# so a broken http leg can report "All tests were skipped" with exit 0 and look
# green. Tee the report and fail if NOTHING actually ran. (For subprocess/unix
# there is no skip path, so this only ever bites a genuinely broken http leg.)
echo "Running suite (transport: $TRANSPORT, worker: $VGI_MASK_WORKER) ..."
REPORT="$STAGE/.report.txt"
set +e
"$HAYBARN_UNITTEST" "test/sql/*" 2>&1 | tee "$REPORT"
status="${PIPESTATUS[0]}"
set -e
if grep -qiE "All tests were skipped|total skipped [1-9]" "$REPORT"; then
echo "ERROR: tests were SKIPPED — almost certainly an ATTACH/transport error whose" >&2
echo " message matched the runner's default ignore list (e.g. \"HTTP\"). A skip" >&2
echo " is NOT a pass. Transport=$TRANSPORT worker=$VGI_MASK_WORKER." >&2
exit 1
fi
exit "$status"
Loading