release: bring main up to develop (0.2.17 — release-readiness docs + eval pattern examples + transitive CVE patches) by constk · Pull Request #108 · constk/harness-python-react

constk · 2026-05-26T07:56:29Z

What ships in this release

| PR | Commit on develop | Theme |
|---|---|---|
| #83 | `ea6b8b1` | pin-freshness audit normalises sub-path actions before API call (carried over from prior session) |
| #103 | `d256e32` | Security: transitive-dep CVE patches — `idna 3.13 → 3.16` (CVE-2026-45409), `starlette 1.0.0 → 1.1.0` (PYSEC-2026-161) |
| #104 | `18b4d30` | Feature: eval pattern examples calling Azure OpenAI — 4 worked cases across the existing tolerance modes, new `src/eval/adapters/azure_openai.py` adapter, optional `[eval]` extra |
| #106 | `eb0136e` | Chore: align develop with main (backport #86's Beads guidance + scaffold updates that landed directly on main on 2026-05-25) |
| #101 | `722293d` | Docs: mark admin-merge policy as transitional solo-owner state |
| #99 | `59ad7f0` | Docs: reframe README opener around the human+agent audience |
| #100 | `7c84f18` | Docs: add concrete agent-failure example to "Why a harness" |
| #105 | `8938eb7` | Docs: replace Jaeger screenshot TODO with section scaffold |

Why a `release/0.2.17` branch (not direct `develop → main`)

#86 was merged directly to main on 2026-05-25, bypassing the standard develop → release flow. Develop later backported #86's content via #106 as a separate squash. The two paths give git no common ancestor on pyproject.toml / uv.lock, so the direct develop → main PR (#107, now closed) conflicted on the version line.

This PR is main + one merge commit pulling in develop with the conflict resolved (take develop's 0.2.17). All other files auto-merged cleanly. After this lands, main is a fast-forward of develop and the divergence is closed.

Version

0.2.11 → 0.2.17. Six PATCH bumps cascaded on develop as each in-flight PR rebased over the previous one — one bump per merge, as required by the version-bump gate.
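
The bump count can be sanity-checked in a few lines. This is a minimal sketch, not the project's version-bump gate; `parse_version` and `patch_bumps_between` are hypothetical helpers that assume plain MAJOR.MINOR.PATCH strings.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string into a tuple of ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def patch_bumps_between(old: str, new: str) -> int:
    """Count PATCH bumps from old to new; both must share MAJOR.MINOR."""
    old_v, new_v = parse_version(old), parse_version(new)
    if old_v[:2] != new_v[:2]:
        raise ValueError("versions differ in MAJOR or MINOR")
    if new_v[2] < old_v[2]:
        raise ValueError("new version is older than old version")
    return new_v[2] - old_v[2]


# Versions taken from this PR: main was at 0.2.11, develop's tip is 0.2.17.
print(patch_bumps_between("0.2.11", "0.2.17"))
```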

Highlights

- Open-source release readiness. The original release-blocker issues are all closed: #90 (docs: reframe README opener around the human+agent audience), #91 (docs: add a concrete agent-failure example to make the harness claim tangible), #92 (docs: replace Jaeger screenshot TODO in README observability section), #93 (docs: mark admin-merge policy as transitional solo-owner state), and #94 (test: strengthen eval slice — realistic cases or explicit scaffold framing).
- First real eval slice. Four worked-pattern cases across factual recall, numeric reasoning, definitional prose, and structured-output adherence, run against a real Azure OpenAI deployment. Live cases are gated on `AZURE_OPENAI_*` env vars; a stock `uv run pytest eval/` still exits 0.
- CVE-clean. `pip-audit` returns "No known vulnerabilities found".
- Admin-merge policy hardened. `CONTRIBUTING.md` explicitly frames the `--admin` workflow as transitional, with a numbered exit checklist; the `enforce_admins: true` flip is now required, not optional.
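
For readers unfamiliar with the eval slice, the three tolerance modes it exercises can be sketched roughly as below. The mode names (`exact_match`, `numeric_close`, `semantic_similar`) come from the #104 commit message; the function signatures, the 1% relative tolerance, the judge callable, and the 0.8 threshold are all assumptions for illustration, not the harness's actual code.

```python
import math
from typing import Callable


def exact_match(expected: str, actual: str) -> bool:
    """Pass only when the whitespace-stripped strings are identical."""
    return expected.strip() == actual.strip()


def numeric_close(expected: float, actual: str, rel_tol: float = 0.01) -> bool:
    """Pass when the answer parses as a number within rel_tol of expected."""
    try:
        value = float(actual.strip().replace(",", ""))
    except ValueError:
        return False  # a non-numeric answer fails rather than raising
    return math.isclose(value, expected, rel_tol=rel_tol)


def semantic_similar(
    expected: str,
    actual: str,
    judge: Callable[[str, str], float],
    threshold: float = 0.8,
) -> bool:
    """Pass when a judge (hypothetical: returns a 0..1 score) clears threshold."""
    return judge(expected, actual) >= threshold


# Illustrative inputs only; the real cases live in eval/golden_patterns.json.
print(exact_match("200", " 200\n"))
print(numeric_close(86400, "86,400"))
```

Keeping each mode a pure function of expected and actual output is one way to let the same cases run against any provider.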

Test plan

- Local merge of `origin/develop` into `release/0.2.17` resolved with no other conflicts
- `uv run --frozen pytest tests/ -q` → 215 passed on the merged tree
- `uv run --frozen mypy --strict src/ tests/` → clean on 44 source files
- `uv run --frozen ruff check .` → All checks passed
- `uv run --frozen lint-imports` → both contracts kept
- CI on this PR (verify after open)

Invariants affected

None new.

New deps / actions / external surface

- New optional Python extra: `[eval]` with `openai>=1.40.0`
- New external endpoint: Azure OpenAI (per-deployment URL). Only called from `eval/test_golden_patterns.py`, only when `AZURE_OPENAI_*` env vars are set.
- No new GitHub Actions; no new runtime deps in the default install.
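
The env-var gating on the new endpoint can be pictured with a small helper. This is a sketch under assumptions: the source only names the `AZURE_OPENAI_*` prefix, so the three variable names below and the `missing_azure_env` helper are hypothetical.

```python
import os
from typing import Mapping

# Hypothetical variable names; the PR only specifies the AZURE_OPENAI_* prefix.
REQUIRED_ENV = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT",
)


def missing_azure_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV if not env.get(name)]


# In a pytest module, a module-level marker could skip every live case:
#   pytestmark = pytest.mark.skipif(
#       bool(missing_azure_env()), reason="Azure OpenAI env vars not set"
#   )
print(missing_azure_env({}))
```

Treating an empty string as missing mirrors the "empty-string fallback" behaviour that the adapter unit tests are described as covering.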

Tagging note

Per `.github/workflows/release.yml`, the public release (GHCR image push, CycloneDX SBOM, GitHub Release page) is tag-triggered. Tag `v0.2.17` against the merge commit after this PR lands to publish. Per `CONTRIBUTING.md`, the merge command for `release:` PRs is `gh pr merge <N> --admin --merge --delete-branch` (preserves history, no squash).

Linked issues

Closes none directly (all linked issues already closed on develop). This PR fans the closures out to main.


pip-audit on develop is flagging two transitive-dep CVEs:

- idna 3.13: CVE-2026-45409 (fix in 3.15+)
- starlette 1.0.0: PYSEC-2026-161 (fix in 1.0.1+)

Both are surfaced via fastapi/httpx. Bumps via `uv lock --upgrade-package idna --upgrade-package starlette`.

Resolves to idna 3.16 (3.15 was the listed fix; 3.16 is a further patch with the same fix) and starlette 1.1.0 (minor bump; FastAPI is compatible with it). All 192 unit tests pass on the upgraded lock.

Bumps the project self-version 0.2.10 -> 0.2.11 per docs/DEVELOPMENT.md.

Unblocks the pip-audit CI gate on #99, #100, #101, #102 (and any other PRs currently sitting on develop), all of which inherit the flagged transitive CVEs from develop and cannot pass that gate until this lands.
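
As a quick check that the resolved versions clear the listed fix versions, a naive comparison is enough for these plain numeric versions. This is a sketch only; `version_tuple` and `includes_fix` are hypothetical helpers, and `packaging.version.Version` would be the robust choice for general PEP 440 strings.

```python
def version_tuple(text: str) -> tuple[int, ...]:
    """Turn a purely numeric dotted version into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def includes_fix(resolved: str, first_fixed: str) -> bool:
    """True when the resolved version is at or above the first fixed one."""
    return version_tuple(resolved) >= version_tuple(first_fixed)


# (resolved version, first fixed version) as stated in the commit message.
checks = {"idna": ("3.16", "3.15"), "starlette": ("1.1.0", "1.0.1")}
for name, (resolved, first_fixed) in checks.items():
    print(name, includes_fix(resolved, first_fixed))
```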

* feat: eval pattern examples calling Azure OpenAI (#94)

The eval slice previously shipped one toy case (echo-hello) and a disabled-by-default nightly. A reader expecting an LLM-eval story found the infrastructure without conviction.

Adds four worked-pattern cases that exercise the existing three tolerance modes against a real Azure OpenAI deployment. These are not benchmarks; they demonstrate what an eval case *looks like* for the four LLM-eval patterns you most often need to write:

- factual-http-200: exact_match, format-constrained recall
- numeric-seconds-per-day: numeric_close, numeric reasoning + tolerance
- definitional-fastapi-depends: semantic_similar, free-form judge-scored prose
- structured-json-status: exact_match, structured-output adherence

When the template is forked for a real project, replace these four with cases that exercise the project's own prompts; the patterns transfer regardless of what product is bolted on.

Provider choice (Azure OpenAI via the openai SDK with the AzureOpenAI client) is intentionally distinct from the rest of the harness (which uses Claude via Claude Code). Demonstrates that the LLMClient Protocol in src/eval/judge.py does its job: the eval core never imports openai, vendor lock-in lives only in the adapter.

Changes:

- src/eval/adapters/azure_openai.py: implements LLMClient via the openai.AzureOpenAI SDK. Reads endpoint/key/deployment/api-version from env. Lazy-imports the SDK so the module is importable without the optional extra installed; the adapter raises a clear AzureOpenAIConfigError if the env or SDK is missing.
- eval/golden_patterns.json: the four cases with notes explaining which pattern each demonstrates.
- eval/test_golden_patterns.py: separate test file gated on the Azure env vars via pytestmark. Skipped on a stock checkout, so `uv run pytest eval/` always exits 0. The toy test_golden_qa.py keeps running as before.
- pyproject.toml: new optional [project.optional-dependencies] eval extra (just `openai>=1.40.0`), mypy override for openai.* matching the existing opentelemetry.* pattern, and a 0.2.10 -> 0.2.11 self-version bump.
- .github/workflows/eval-nightly.yml: env vars renamed from the placeholder LLM_* set to AZURE_OPENAI_*. Header comment updated with the Azure setup recipe. uv sync now passes --extra eval.
- docs/EVAL_HARNESS.md: new "Worked patterns" section with the table mapping case -> tolerance -> pattern, the local setup recipe, and a "Swapping providers" note documenting the Protocol-based extension path.

Local gates: mypy --strict clean on 42 source files (was 31), ruff clean, ruff format clean, import-linter both contracts kept, 192 unit tests pass, eval/ runs 1 passed + 4 skipped without LLM env.

Closes #94

* test: add adapter unit tests + adapters README (#94 review fixes)

Addresses two gate failures on #104 surfaced by code review:

1. "Tests required" gate: the feat: prefix declared a behaviour change but tests/ had no test for the new adapter (the eval/-side test only runs with live Azure credentials). Adds tests/test_eval_azure_openai_adapter.py: 13 fully-offline cases covering _resolve_config (defaults, override, empty-string fallback, missing-env error listing), the constructor (env wiring, explicit API version, missing-env, missing-SDK), and the two SDK call paths (complete_json structured-output mode, complete user-message dispatch, null-content returns "" / "{}"). The SDK is mocked at sys.modules level so the test never hits the network and never requires the openai extra to be installed.
2. "src/ README audit" gate: every src/ package needs a README.md per CLAUDE.md. Adds src/eval/adapters/README.md documenting the layer's purpose, the current adapter, a 7-step "adding a new adapter" recipe, and why the layer lives at the top of the import order.

Also applies the reviewer's non-blocking sentinel-string suggestion: the magic "azure-deployment" string passed as judge_model in eval/test_golden_patterns.py is now the named constant _AZURE_DEPLOYMENT_SENTINEL with a comment explaining why the runner threads it through but the Azure adapter discards it.

Local gates: 205 unit tests pass (was 192, +13 new), mypy clean on 43 source files, ruff/format/import-linter all green.

Refs #94

* docs: add Key interfaces section to adapters README (#94 review)

The src/ README audit gate looks for a `## Key interfaces` (or `## Public surface`) anchor. The existing README had purpose / table / extension recipe / layering rationale, but no exported-names section.

Adds a `## Key interfaces` section listing the two exported names:

- AzureOpenAIClient: the LLMClient implementation, with notes on complete() vs complete_json() and the discarded `model` arg (Azure dispatches by deployment, not model).
- AzureOpenAIConfigError: the construction-time error type, noting that it batches every missing env var into a single message instead of failing-and-retrying.

Both are already documented in the adapter docstrings; this section hoists them to the README anchor the audit gate enforces.

Refs #94

* chore: bump version to 0.2.12 (rebase onto develop after #103)
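
The adapter layering described in the #104 commits can be sketched as follows. This is an illustration under assumptions, not the code in src/eval/judge.py or src/eval/adapters/azure_openai.py: the method signatures are guessed from the commit text, and `ConfigError`, `load_sdk`, `EchoClient`, and `run_case` are hypothetical names.

```python
import importlib
from types import ModuleType
from typing import Protocol


class LLMClient(Protocol):
    """Assumed shape of the Protocol; signatures are not confirmed by the source."""

    def complete(self, prompt: str, model: str) -> str: ...

    def complete_json(self, prompt: str, model: str) -> str: ...


class ConfigError(RuntimeError):
    """Hypothetical stand-in for a construction-time configuration error."""


def load_sdk(module_name: str) -> ModuleType:
    """Import an optional SDK lazily so the adapter module itself stays importable."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ConfigError(
            f"optional dependency {module_name!r} is not installed"
        ) from exc


class EchoClient:
    """Toy provider: satisfies LLMClient structurally, needs no vendor SDK."""

    def complete(self, prompt: str, model: str) -> str:
        return prompt  # `model` is ignored, as the Azure adapter is said to do

    def complete_json(self, prompt: str, model: str) -> str:
        return "{}"


def run_case(client: LLMClient, prompt: str) -> str:
    """The eval core depends only on the Protocol, never on a vendor import."""
    return client.complete(prompt, model="unused-sentinel")


print(run_case(EchoClient(), "hello"))
```

Because `Protocol` uses structural typing, a new provider only has to supply the two methods; it does not need to inherit from anything in the eval core.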

* chore: add optional Beads issue queue guidance

* chore: address PR-86 review feedback (BEADS doc + template + CI-script compile gate)

Applies the actionable items from the PR-86 review:

- docs/BEADS.md: lead with a one-sentence "what Beads is" + upstream link; state the stance explicitly (optional/additive, recommended for agent-driven flows, GitHub remains authoritative); add a YAML example block under Recommended Bead fields; replace the duplicated Closure checklist with a Bead-specific narrowing that cites the PR template + CONTRIBUTING; call out that .beads/ is wiped by git clean -fdx.
- .github/pull_request_template.md: collapse the "Local Beads" section into an HTML-commented opt-in block so it is invisible in the rendered preview until a Beads-using team uncomments it.
- CONTRIBUTING.md: document the one-shot git renormalisation step for Windows clones after the .gitattributes change lands.
- tests/test_scripts_compile.py: regression gate that py_compiles every .github/scripts/*.py. The "scripts unparseable" review finding was based on an older local Python; PEP 758 (3.14) makes the unparenthesised except clauses valid, so the scripts ARE fine on the project pin. The test guards against an actual syntax error landing in future.

* chore: bump version to 0.2.11

---------

Co-authored-by: jakelindsay87 <jacob.b.lindsay@gmail.com>
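
A compile gate of the kind described for tests/test_scripts_compile.py might look roughly like this. It is a sketch, not the test that shipped: `compile_failures` is a hypothetical helper, and the directory is passed in rather than hard-coded.

```python
import py_compile
from pathlib import Path


def compile_failures(scripts_dir: Path) -> dict[str, str]:
    """Byte-compile every *.py in scripts_dir; map failing file names to errors."""
    failures: dict[str, str] = {}
    for script in sorted(scripts_dir.glob("*.py")):
        try:
            # doraise=True turns a syntax error into PyCompileError
            # instead of printing it to stderr and carrying on.
            py_compile.compile(str(script), doraise=True)
        except py_compile.PyCompileError as exc:
            failures[script.name] = str(exc)
    return failures


# A test could then assert that the mapping is empty, for example:
#   assert compile_failures(Path(".github/scripts")) == {}
```

Whether a given script parses depends on the interpreter running the check, which is why the result on an older local Python differed from the project pin.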

* docs: mark admin-merge policy as transitional solo-owner state (#93)

The existing "Solo-owner merge policy" section accurately documented how merges work today, but read as standing policy. From an external contributor's perspective it could look like the maintainer routinely bypasses their own gates.

Adds a leading "Transitional" blockquote framing this as a single-owner workaround, not standing policy, and replaces the closing sentence with a numbered exit checklist (drop --admin, remove the subsection, update CODEOWNERS, optionally flip enforce_admins to true). All four changes land together when a second collaborator is onboarded.

Mechanics of the merge command itself are unchanged.

Closes #93

* chore: bump version to 0.2.11

* docs: make enforce_admins flip required in exit checklist (#93 review)

Code review on #101 pushed back on step 4 of the "When the exemption ends" checklist: "Optionally flip enforce_admins to true". Leaving it false in a 2-person setup keeps the admin-bypass door open even after the single-owner workaround is no longer needed, which defeats the point of having an exit checklist.

Drops "Optionally" and adds a one-line rationale so a future reader understands why the flip is non-optional.

Refs #93

* docs: reframe README opener around the human+agent audience (#90)

The previous opener led with what the harness is (a coding harness for Python+React) and folded the audience into a trailing clause. The new opener leads with who it's for (teams pairing AI agents with human engineers) and keeps the mechanism punchline ("every gate enforced mechanically in CI, not by discipline") that makes the harness story distinctive.

Wording matches the repo's GitHub description for consistency between the two surfaces.

Closes #90

* docs: tighten README opener — harness vocab + 0.2.11 bump (#90)

Review feedback on #99:

- "Production-grade SDLC scaffold" -> "Production-grade SDLC harness". Everywhere else (package name, docs/HARNESS.md, CLAUDE.md) calls it a harness; "scaffold" was an unintentional vocabulary drift.
- "regardless of who's at the keyboard" -> "regardless of who shipped the code". Agents don't have keyboards; the original metaphor leaked. The new phrasing covers humans and agents without forcing the human-only mental model.
- README opener now also mirrors the GitHub repo description verbatim ("human-LLM coding collaborations"), so the two surfaces stay aligned.

Also bumps the project version 0.2.10 -> 0.2.11 (docs change -> PATCH per docs/DEVELOPMENT.md) in pyproject.toml and the self-version line in uv.lock, unblocking the "Version bump check" CI gate that flagged the original commit.

The "enforced mechanically in CI, not by discipline" punchline is preserved verbatim.

Refs #90

* docs: add concrete agent-failure example to "Why a harness" (#91)

The "harness IS the product" claim reads abstract without a worked example. Adds a blockquoted, 3-line sidebar inside the "Why a harness" section showing one realistic failure mode: an agent reaches for a reverse import (src.models → src.tools), import-linter blocks it in CI against the "src.models depends on nothing in src/" contract, and the agent's next iteration routes around it via docs/BOUNDARIES.md.

Names a real gate, cites the real contract, links the real doc, so the example is verifiable, not theatre.

Closes #91

* chore: bump version to 0.2.11
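
To make the failure mode in #91 concrete, the check can be approximated with the standard library. This is not how import-linter is implemented and not the repo's contract configuration; `forbidden_imports` is a hypothetical helper that only inspects absolute `import x` and `from x import y` statements, so it would miss `from src import tools`.

```python
import ast


def forbidden_imports(source: str, forbidden_prefixes: tuple[str, ...]) -> list[str]:
    """Return imported module names in source that fall under a forbidden prefix."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            if any(name == p or name.startswith(p + ".") for p in forbidden_prefixes):
                found.append(name)
    return found


# A module under src/models reaching for src.tools would be flagged.
models_source = "from src.tools import search\nimport os\n"
print(forbidden_imports(models_source, ("src.tools",)))
```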

* docs: replace Jaeger screenshot TODO with section scaffold (#92)

The observability story in README has one visible loose end: a TODO block where the Jaeger trace screenshot should go. The rest of the section reads cleanly, so the TODO sticks out.

Promotes the placeholder to a real subsection ("Jaeger trace") with the explanatory caption already written: what boots the stack, what endpoint produces the trace, where to view it, and that span attributes use only the constant-defined semconv keys from src/observability/spans.py.

The image itself still needs to be captured. The original capture recipe is preserved as an HTML comment so it remains discoverable, and the comment includes the exact one-line markdown to paste in once docs/images/jaeger-trace.png lands. Hiding the placeholder inside an HTML comment (rather than a broken-image ref) keeps the rendered README clean while the PNG is outstanding.

The image-capture step itself is a follow-up: it needs the maintainer to run docker compose locally and take the screenshot.

Closes #92 (capture step tracked separately as a single-line README edit when the PNG is committed).

* chore: bump version to 0.2.11

Merges develop's 8 commits ahead of main:

- #83 fix: pin-freshness audit normalises sub-path actions
- #103 fix: idna 3.16 + starlette 1.1.0 CVE patches
- #104 feat: eval pattern examples calling Azure OpenAI
- #106 chore: align develop with main (backport #86 content)
- #101 docs: mark admin-merge policy as transitional
- #99 docs: reframe README opener around the human+agent audience
- #100 docs: add concrete agent-failure example to README
- #105 docs: replace Jaeger screenshot TODO with section scaffold

Version: 0.2.11 (main) -> 0.2.17 (develop tip). Conflicts on the version line in pyproject.toml + uv.lock resolved in favour of develop's 0.2.17.

Why a dedicated release branch rather than develop -> main directly: main carries #86's squash commit (merged on 2026-05-25, bypassing develop). Develop later backported #86's content via #106 as a separate squash. The two paths give git no common ancestor on pyproject.toml / uv.lock, so a direct develop -> main PR conflicts. This branch resolves the merge once on top of main and is what main actually fast-forwards onto.

# Conflicts:
#	pyproject.toml
#	uv.lock

constk and others added 9 commits May 3, 2026 13:56

fix: pin-freshness audit normalises sub-path actions before API call (#…

ea6b8b1

…83)

constk merged commit e542435 into main May 26, 2026
22 checks passed

constk merged 9 commits into main from release/0.2.17
