
release: bring main up to develop (0.2.17 — release-readiness docs + eval pattern examples + transitive CVE patches)#108

Merged
constk merged 9 commits into main from release/0.2.17 on May 26, 2026

@constk (Owner) commented May 26, 2026

What ships in this release

| PR | Commit on develop | Theme |
|---|---|---|
| #83 | ea6b8b1 | Fix: pin-freshness audit normalises sub-path actions before the API call (carried over from prior session) |
| #103 | d256e32 | Security: transitive-dep CVE patches (idna 3.13 → 3.16 for CVE-2026-45409; starlette 1.0.0 → 1.1.0 for PYSEC-2026-161) |
| #104 | 18b4d30 | Feature: eval pattern examples calling Azure OpenAI (4 worked cases across the existing tolerance modes, new src/eval/adapters/azure_openai.py adapter, optional [eval] extra) |
| #106 | eb0136e | Chore: align develop with main (backport #86's Beads guidance + scaffold updates that landed directly on main on 2026-05-25) |
| #101 | 722293d | Docs: mark admin-merge policy as transitional solo-owner state |
| #99 | 59ad7f0 | Docs: reframe README opener around the human+agent audience |
| #100 | 7c84f18 | Docs: add concrete agent-failure example to "Why a harness" |
| #105 | 8938eb7 | Docs: replace Jaeger screenshot TODO with section scaffold |

Why a release/0.2.17 branch (not direct develop → main)

#86 was merged directly to main on 2026-05-25, bypassing the standard develop → release flow. Develop later backported #86's content via #106 as a separate squash. Since their merge base, both histories have changed the version line in pyproject.toml / uv.lock to different values, so git cannot auto-merge that line and the direct develop → main PR (#107, now closed) conflicted on it.

This PR is main plus one merge commit that pulls in develop, with the conflict resolved by taking develop's 0.2.17. All other files auto-merged cleanly. After this lands, develop's tip is an ancestor of main, so the divergence is closed.

Version

0.2.11 → 0.2.17. Six PATCH bumps cascaded on develop as each in-flight PR rebased over the previous one — one bump per merge, as required by the version-bump gate.
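The cascade arithmetic can be sketched as follows. This is a minimal illustration that assumes plain MAJOR.MINOR.PATCH strings. `parse_version`, `patch_bump` and `passes_bump_gate` are hypothetical names, not the project's actual version-bump gate.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into comparable integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def patch_bump(version: str) -> str:
    """Return the next PATCH version, e.g. 0.2.11 -> 0.2.12."""
    major, minor, patch = parse_version(version)
    return f"{major}.{minor}.{patch + 1}"


def passes_bump_gate(base: str, head: str) -> bool:
    """Hypothetical gate rule: a PR's version must be strictly newer than its base."""
    return parse_version(head) > parse_version(base)


# Each in-flight PR rebased onto a develop that had just been bumped, so it bumped once more.
version = "0.2.11"
for _ in range(6):
    bumped = patch_bump(version)
    assert passes_bump_gate(version, bumped)
    version = bumped

print(version)  # 0.2.17
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.2.10" sorts before "0.2.9".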

Highlights

Test plan

  • Local merge of origin/develop into release/0.2.17: no conflicts beyond the pyproject.toml / uv.lock version line
  • uv run --frozen pytest tests/ -q → 215 passed on the merged tree
  • uv run --frozen mypy --strict src/ tests/ → clean on 44 source files
  • uv run --frozen ruff check . → All checks passed
  • uv run --frozen lint-imports → both contracts kept
  • CI on this PR (verify after open)

Invariants affected

None new.

New deps / actions / external surface

  • New optional Python extra: [eval] with openai>=1.40.0
  • New external endpoint: Azure OpenAI (per-deployment URL). Only called from eval/test_golden_patterns.py, only when AZURE_OPENAI_* env vars are set.
  • No new GitHub Actions; no new runtime deps in the default install.

Tagging note

Per .github/workflows/release.yml, the public release (GHCR image push, CycloneDX SBOM, GitHub Release page) is tag-triggered. To publish, tag v0.2.17 against the merge commit after this PR lands. Per CONTRIBUTING.md, release: PRs are merged with gh pr merge <N> --admin --merge --delete-branch (a merge commit that preserves history; no squash).

Linked issues

Closes none directly (all linked issues already closed on develop). This PR fans the closures out to main.

constk and others added 9 commits May 3, 2026 13:56
pip-audit on develop is flagging two transitive-dep CVEs:

- idna 3.13            CVE-2026-45409   (fix in 3.15+)
- starlette 1.0.0      PYSEC-2026-161   (fix in 1.0.1+)

Both are surfaced via fastapi/httpx. Bumps via:

    uv lock --upgrade-package idna --upgrade-package starlette

Resolves to idna 3.16 (3.15 was the listed fix; 3.16 is a further
patch with the same fix) and starlette 1.1.0 (minor bump; FastAPI is
compatible with it). All 192 unit tests pass on the upgraded lock.

Bumps the project self-version 0.2.10 -> 0.2.11 per
docs/DEVELOPMENT.md.

Unblocks the pip-audit CI gate on #99, #100, #101, #102 (and any
other PRs currently sitting on develop), all of which inherit the
flagged transitive CVEs from develop and cannot pass that gate until
this lands.
* feat: eval pattern examples calling Azure OpenAI (#94)

The eval slice previously shipped one toy case (echo-hello) and a
disabled-by-default nightly. A reader expecting an LLM-eval story
found the infrastructure but no worked example of it in use.

Adds four worked-pattern cases that exercise the existing three
tolerance modes against a real Azure OpenAI deployment. These are
not benchmarks — they demonstrate what an eval case *looks like* for
the four LLM-eval patterns you most often need to write:

  - factual-http-200             exact_match       format-constrained recall
  - numeric-seconds-per-day      numeric_close     numeric reasoning + tolerance
  - definitional-fastapi-depends semantic_similar  free-form judge-scored prose
  - structured-json-status       exact_match       structured-output adherence
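For orientation, the three tolerance modes might reduce to checks like the ones below. This is only a sketch: the function names mirror the mode names, but the signatures, the relative tolerance and the judge threshold are assumptions. The real scorers in src/eval may differ.

```python
import math
from collections.abc import Callable


def exact_match(expected: str, actual: str) -> bool:
    """Format-constrained recall: the answer must equal the expected string."""
    return actual.strip() == expected.strip()


def numeric_close(expected: float, actual: str, rel_tol: float = 1e-3) -> bool:
    """Numeric reasoning: parse the answer and accept it within a relative tolerance."""
    try:
        value = float(actual.strip().replace(",", ""))
    except ValueError:
        return False  # a non-numeric answer fails the case rather than crashing the run
    return math.isclose(value, expected, rel_tol=rel_tol)


def semantic_similar(
    expected: str,
    actual: str,
    judge: Callable[[str, str], float],
    threshold: float = 0.8,
) -> bool:
    """Free-form prose: defer to a judge that returns a 0..1 similarity score."""
    return judge(expected, actual) >= threshold
```

With these definitions, `numeric_close(86_400, "86,400")` accepts the seconds-per-day answer. `semantic_similar` stays vendor-neutral because the judge is injected as a plain callable.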

When the template is forked for a real project, replace these four
with cases that exercise the project's own prompts; the patterns
transfer regardless of what product is bolted on.

Provider choice — Azure OpenAI via the openai SDK with AzureOpenAI
client — is intentionally distinct from the rest of the harness
(which uses Claude via Claude Code). Demonstrates that the LLMClient
Protocol in src/eval/judge.py does its job: the eval core never
imports openai, vendor lock-in lives only in the adapter.
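Here is a minimal sketch of that boundary, assuming string-in/string-out methods. `complete` and `complete_json` are the method names the adapters README mentions, but the parameter lists here are guesses. `EchoClient` and `run_case` are hypothetical. The point is that the runner type-checks against the Protocol and never imports a vendor SDK.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Structural interface the eval core depends on (signatures assumed)."""

    def complete(self, prompt: str, model: str) -> str: ...

    def complete_json(self, prompt: str, model: str) -> str: ...


class EchoClient:
    """Offline stand-in: satisfies LLMClient structurally without subclassing it."""

    def complete(self, prompt: str, model: str) -> str:
        return prompt

    def complete_json(self, prompt: str, model: str) -> str:
        return "{}"


def run_case(client: LLMClient, prompt: str, model: str) -> str:
    # The core only sees the Protocol; which vendor sits behind it is the adapter's concern.
    return client.complete(prompt, model)


print(run_case(EchoClient(), "hello", model="any"))  # hello
```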

Changes:

  - src/eval/adapters/azure_openai.py — implements LLMClient via the
    openai.AzureOpenAI SDK. Reads endpoint/key/deployment/api-version
    from env. Lazy-imports the SDK so the module is importable without
    the optional extra installed; the adapter raises a clear
    AzureOpenAIConfigError if the env or SDK is missing.

  - eval/golden_patterns.json — the four cases with notes explaining
    which pattern each demonstrates.

  - eval/test_golden_patterns.py — separate test file gated on the
    Azure env vars via pytestmark. Skipped on a stock checkout, so
    `uv run pytest eval/` always exits 0. The toy test_golden_qa.py
    keeps running as before.

  - pyproject.toml — new optional [project.optional-dependencies] eval
    extra (just `openai>=1.40.0`), mypy override for openai.* matching
    the existing opentelemetry.* pattern, and a 0.2.10 -> 0.2.11
    self-version bump.

  - .github/workflows/eval-nightly.yml — env vars renamed from the
    placeholder LLM_* set to AZURE_OPENAI_*. Header comment updated
    with the Azure setup recipe. uv sync now passes --extra eval.

  - docs/EVAL_HARNESS.md — new "Worked patterns" section with the
    table mapping case -> tolerance -> pattern, the local setup
    recipe, and a "Swapping providers" note documenting the
    Protocol-based extension path.
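The lazy-import behaviour described in the first bullet can be sketched like this. The sketch assumes the adapter defers `import openai` until first use. `load_openai_sdk` is a hypothetical helper name, and the error text is illustrative.

```python
import importlib
from types import ModuleType


class AzureOpenAIConfigError(RuntimeError):
    """Raised when the adapter cannot be constructed (missing env or SDK)."""


def load_openai_sdk() -> ModuleType:
    """Import the openai SDK on first use, not when this module is imported.

    Deferring the import keeps the adapter module importable on a default
    install that lacks the optional [eval] extra.
    """
    try:
        return importlib.import_module("openai")
    except ImportError as exc:
        raise AzureOpenAIConfigError(
            "openai SDK not installed; run `uv sync --extra eval`"
        ) from exc
```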

Local gates: mypy --strict clean on 42 source files (was 31), ruff
clean, ruff format clean, import-linter both contracts kept, 192
unit tests pass, eval/ runs 1 passed + 4 skipped without LLM env.

Closes #94

* test: add adapter unit tests + adapters README (#94 review fixes)

Addresses two gate failures on #104 surfaced by code review:

1. "Tests required" gate — feat: prefix declared a behaviour change
   but tests/ had no test for the new adapter (the eval/-side test
   only runs with live Azure credentials). Adds
   tests/test_eval_azure_openai_adapter.py: 13 fully-offline cases
   covering _resolve_config (defaults, override, empty-string
   fallback, missing-env error listing), the constructor (env
   wiring, explicit API version, missing-env, missing-SDK), and the
   two SDK call paths (complete_json structured-output mode,
   complete user-message dispatch, null-content returns "" / "{}").

   The SDK is mocked at sys.modules level so the test never hits the
   network and never requires the openai extra to be installed.

2. "src/ README audit" gate — every src/ package needs a README.md
   per CLAUDE.md. Adds src/eval/adapters/README.md documenting the
   layer's purpose, the current adapter, a 7-step "adding a new
   adapter" recipe, and why the layer lives at the top of the import
   order.
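The sys.modules technique in item 1 can be sketched as follows. Registering a `MagicMock` under the name `openai` makes a lazy import resolve to the mock, so no network call is possible and the real SDK need not be installed. `make_client` is a hypothetical stand-in for the adapter's constructor. The keyword arguments follow the `openai.AzureOpenAI` constructor, and the endpoint, key and API-version values are placeholders.

```python
import importlib
import sys
from typing import Any
from unittest import mock


def make_client(endpoint: str, api_key: str, api_version: str) -> Any:
    """Hypothetical adapter fragment: lazily import the SDK, then build a client."""
    openai = importlib.import_module("openai")
    return openai.AzureOpenAI(
        azure_endpoint=endpoint, api_key=api_key, api_version=api_version
    )


fake_openai = mock.MagicMock(name="openai")
# patch.dict restores sys.modules on exit, so the fake never leaks into other tests.
with mock.patch.dict(sys.modules, {"openai": fake_openai}):
    client = make_client("https://example.invalid", "test-key", "2024-06-01")

fake_openai.AzureOpenAI.assert_called_once_with(
    azure_endpoint="https://example.invalid",
    api_key="test-key",
    api_version="2024-06-01",
)
assert client is fake_openai.AzureOpenAI.return_value
```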

Also applies the reviewer's non-blocking sentinel-string suggestion:
the magic "azure-deployment" string passed as judge_model in
eval/test_golden_patterns.py is now the named constant
_AZURE_DEPLOYMENT_SENTINEL with a comment explaining why the runner
threads it through but the Azure adapter discards it.

Local gates: 205 unit tests pass (was 192, +13 new), mypy clean on
43 source files, ruff/format/import-linter all green.

Refs #94

* docs: add Key interfaces section to adapters README (#94 review)

src/ README audit gate looks for a `## Key interfaces` (or `## Public
surface`) anchor — the existing README had purpose / table /
extension recipe / layering rationale, but no exported-names section.

Adds a `## Key interfaces` section listing the two exported names:

  - AzureOpenAIClient — the LLMClient implementation with notes on
    complete() vs complete_json() and the discarded `model` arg
    (Azure dispatches by deployment, not model).
  - AzureOpenAIConfigError — the construction-time error type,
    noting that it batches every missing env var into a single
    message instead of failing-and-retrying.

Both already documented in the adapter docstrings; this section
hoists them to the README anchor the audit gate enforces.

Refs #94

* chore: bump version to 0.2.12 (rebase onto develop after #103)
* chore: add optional Beads issue queue guidance

* chore: address PR-86 review feedback (BEADS doc + template + CI-script compile gate)

Applies the actionable items from the PR-86 review:

- docs/BEADS.md: lead with a one-sentence "what Beads is" + upstream link;
  state the stance explicitly (optional/additive, recommended for agent-driven
  flows, GitHub remains authoritative); add a YAML example block under
  Recommended Bead fields; replace the duplicated Closure checklist with a
  Bead-specific narrowing that cites the PR template + CONTRIBUTING; call out
  that .beads/ is wiped by git clean -fdx.
- .github/pull_request_template.md: collapse the "Local Beads" section into
  an HTML-commented opt-in block so it is invisible in the rendered preview
  until a Beads-using team uncomments it.
- CONTRIBUTING.md: document the one-shot git renormalisation step for
  Windows clones after the .gitattributes change lands.
- tests/test_scripts_compile.py: regression gate that py_compiles every
  .github/scripts/*.py. The "scripts unparseable" review finding was based on
  an older local Python — PEP 758 (3.14) makes the unparenthesised except
  clauses valid, so the scripts ARE fine on the project pin. The test
  guards against an actual syntax error landing in future.
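The compile gate in the last bullet can be sketched with the standard-library py_compile module. `compile_errors` is a hypothetical helper, not the project's test. It writes bytecode to a temporary directory so the check leaves no `__pycache__` behind. The result depends on the running interpreter, so syntax such as PEP 758's unparenthesised except clauses only passes on 3.14+.

```python
import py_compile
import tempfile
from pathlib import Path


def compile_errors(scripts_dir: Path) -> list[str]:
    """Return one message per *.py file in scripts_dir that fails to compile."""
    errors: list[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        for script in sorted(scripts_dir.glob("*.py")):
            try:
                # doraise=True raises PyCompileError instead of printing to stderr.
                py_compile.compile(
                    str(script),
                    cfile=str(Path(tmp) / f"{script.stem}.pyc"),
                    doraise=True,
                )
            except py_compile.PyCompileError as exc:
                errors.append(f"{script.name}: {exc.msg}")
    return errors


# In a pytest test this might read: assert compile_errors(Path(".github/scripts")) == []
```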

* chore: bump version to 0.2.11

---------

Co-authored-by: jakelindsay87 <jacob.b.lindsay@gmail.com>
* docs: mark admin-merge policy as transitional solo-owner state (#93)

The existing "Solo-owner merge policy" section accurately documented
how merges work today, but read as standing policy. From an external
contributor's perspective it could look like the maintainer routinely
bypasses their own gates.

Adds a leading "Transitional" blockquote framing this as a single-owner
workaround, not standing policy, and replaces the closing sentence with
a numbered exit checklist (drop --admin, remove the subsection, update
CODEOWNERS, optionally flip enforce_admins to true). All four changes
land together when a second collaborator is onboarded.

Mechanics of the merge command itself are unchanged.

Closes #93

* chore: bump version to 0.2.11

* docs: make enforce_admins flip required in exit checklist (#93 review)

Code review on #101 pushed back on step 4 of the "When the exemption
ends" checklist: "Optionally flip enforce_admins to true". Leaving it
false in a 2-person setup keeps the admin-bypass door open even after
the single-owner workaround is no longer needed — which defeats the
point of having an exit checklist.

Drops "Optionally" and adds a one-line rationale so a future reader
understands why the flip is non-optional.

Refs #93
* docs: reframe README opener around the human+agent audience (#90)

The previous opener led with what the harness is (a coding harness for
Python+React) and folded the audience into a trailing clause. The new
opener leads with who it's for — teams pairing AI agents with human
engineers — and keeps the mechanism punchline ("every gate enforced
mechanically in CI, not by discipline") that makes the harness story
distinctive.

Wording matches the repo's GitHub description for consistency between
the two surfaces.

Closes #90

* docs: tighten README opener — harness vocab + 0.2.11 bump (#90)

Review feedback on #99:

- "Production-grade SDLC scaffold" -> "Production-grade SDLC harness".
  Everywhere else (package name, docs/HARNESS.md, CLAUDE.md) calls it
  a harness; "scaffold" was an unintentional vocabulary drift.
- "regardless of who's at the keyboard" -> "regardless of who shipped
  the code". Agents don't have keyboards; the original metaphor leaked.
  The new phrasing covers humans and agents without forcing the
  human-only mental model.
- README opener now also mirrors the GitHub repo description verbatim
  ("human-LLM coding collaborations"), so the two surfaces stay
  aligned.

Also bumps the project version 0.2.10 -> 0.2.11 (docs change -> PATCH
per docs/DEVELOPMENT.md) in pyproject.toml and the self-version line
in uv.lock, unblocking the "Version bump check" CI gate that flagged
the original commit.

The "enforced mechanically in CI, not by discipline" punchline is
preserved verbatim.

Refs #90
* docs: add concrete agent-failure example to "Why a harness" (#91)

The "harness IS the product" claim reads abstract without a worked
example. Adds a blockquoted, 3-line sidebar inside the "Why a harness"
section showing one realistic failure mode: an agent reaches for a
reverse import (src.models → src.tools), import-linter blocks it in CI
against the "src.models depends on nothing in src/" contract, the
agent's next iteration routes around it via docs/BOUNDARIES.md.
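The contract in that example can be approximated with the standard-library ast module. This only illustrates what the rule forbids: import-linter enforces the real contract from the project's configuration. `forbidden_imports` is a hypothetical helper, and it ignores relative imports.

```python
import ast


def forbidden_imports(source: str, own: str = "src.models", root: str = "src") -> list[str]:
    """List absolute imports in `source` that reach into `root` outside `own`."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # relative imports and non-import nodes are out of scope here
        for name in names:
            in_root = name == root or name.startswith(root + ".")
            in_own = name == own or name.startswith(own + ".")
            if in_root and not in_own:
                found.append(name)
    return found


print(forbidden_imports("from src.tools import x\nimport src.models.base\nimport os\n"))
# ['src.tools']
```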

Names a real gate, cites the real contract, links the real doc — so
the example is verifiable, not theatre.

Closes #91

* chore: bump version to 0.2.11
* docs: replace Jaeger screenshot TODO with section scaffold (#92)

The observability story in README has one visible loose end: a TODO
block where the Jaeger trace screenshot should go. The rest of the
section reads cleanly, so the TODO sticks out.

Promotes the placeholder to a real subsection ("Jaeger trace") with
the explanatory caption already written: what boots the stack, what
endpoint produces the trace, where to view it, and that span
attributes use only the constant-defined semconv keys from
src/observability/spans.py.

The image itself still needs to be captured. The original capture
recipe is preserved as an HTML comment so it remains discoverable,
and the comment includes the exact one-line markdown to paste in once
docs/images/jaeger-trace.png lands. Hiding the placeholder inside an
HTML comment (rather than a broken-image ref) keeps the rendered
README clean while the PNG is outstanding.

The image-capture step itself is a follow-up — needs the maintainer
to run docker compose locally and take the screenshot.

Closes #92 (capture step tracked separately as a single-line README
edit when the PNG is committed).

* chore: bump version to 0.2.11
Merges develop's 8 commits ahead of main:

- #83  fix: pin-freshness audit normalises sub-path actions
- #103 fix: idna 3.16 + starlette 1.1.0 CVE patches
- #104 feat: eval pattern examples calling Azure OpenAI
- #106 chore: align develop with main (backport #86 content)
- #101 docs: mark admin-merge policy as transitional
- #99  docs: reframe README opener around the human+agent audience
- #100 docs: add concrete agent-failure example to README
- #105 docs: replace Jaeger screenshot TODO with section scaffold

Version: 0.2.11 (main) -> 0.2.17 (develop tip). Conflicts on the
version line in pyproject.toml + uv.lock resolved in favour of
develop's 0.2.17.

Why a dedicated release branch rather than develop -> main directly:
main carries #86's squash commit (merged on 2026-05-25, bypassing
develop). Develop later backported #86's content via #106 as a
separate squash. Since their merge base, both histories change the
version line in pyproject.toml / uv.lock to different values, so a
direct develop -> main PR conflicts. This branch resolves that
conflict once, on top of main, so the PR into main merges cleanly.

# Conflicts:
#	pyproject.toml
#	uv.lock
constk merged commit e542435 into main on May 26, 2026
22 checks passed