fix: bump idna + starlette to patched versions#103
Merged
Conversation
pip-audit on develop is flagging two transitive-dep CVEs: - idna 3.13 CVE-2026-45409 (fix in 3.15+) - starlette 1.0.0 PYSEC-2026-161 (fix in 1.0.1+) Both are surfaced via fastapi/httpx. Bumps via: uv lock --upgrade-package idna --upgrade-package starlette Resolves to idna 3.16 (3.15 was the listed fix; 3.16 is a further patch with the same fix) and starlette 1.1.0 (minor bump; FastAPI is compatible with it). All 192 unit tests pass on the upgraded lock. Bumps the project self-version 0.2.10 -> 0.2.11 per docs/DEVELOPMENT.md. Unblocks the pip-audit CI gate on #99, #100, #101, #102 (and any other PRs currently sitting on develop), all of which inherit the flagged transitive CVEs from develop and cannot pass that gate until this lands.
5 tasks
constk
added a commit
that referenced
this pull request
May 26, 2026
constk
added a commit
that referenced
this pull request
May 26, 2026
* feat: eval pattern examples calling Azure OpenAI (#94) The eval slice previously shipped one toy case (echo-hello) and a disabled-by-default nightly. A reader expecting an LLM-eval story found the infrastructure without conviction. Adds four worked-pattern cases that exercise the existing three tolerance modes against a real Azure OpenAI deployment. These are not benchmarks — they demonstrate what an eval case *looks like* for the four LLM-eval patterns you most often need to write: - factual-http-200 exact_match format-constrained recall - numeric-seconds-per-day numeric_close numeric reasoning + tolerance - definitional-fastapi-depends semantic_similar free-form judge-scored prose - structured-json-status exact_match structured-output adherence When the template is forked for a real project, replace these four with cases that exercise the project's own prompts; the patterns transfer regardless of what product is bolted on. Provider choice — Azure OpenAI via the openai SDK with AzureOpenAI client — is intentionally distinct from the rest of the harness (which uses Claude via Claude Code). Demonstrates that the LLMClient Protocol in src/eval/judge.py does its job: the eval core never imports openai, vendor lock-in lives only in the adapter. Changes: - src/eval/adapters/azure_openai.py — implements LLMClient via the openai.AzureOpenAI SDK. Reads endpoint/key/deployment/api-version from env. Lazy-imports the SDK so the module is importable without the optional extra installed; the adapter raises a clear AzureOpenAIConfigError if the env or SDK is missing. - eval/golden_patterns.json — the four cases with notes explaining which pattern each demonstrates. - eval/test_golden_patterns.py — separate test file gated on the Azure env vars via pytestmark. Skipped on a stock checkout, so `uv run pytest eval/` always exits 0. The toy test_golden_qa.py keeps running as before. - pyproject.toml — new optional [project.optional-dependencies] eval extra (just `openai>=1.40.0`), mypy override for openai.* matching the existing opentelemetry.* pattern, and a 0.2.10 -> 0.2.11 self-version bump. - .github/workflows/eval-nightly.yml — env vars renamed from the placeholder LLM_* set to AZURE_OPENAI_*. Header comment updated with the Azure setup recipe. uv sync now passes --extra eval. - docs/EVAL_HARNESS.md — new "Worked patterns" section with the table mapping case -> tolerance -> pattern, the local setup recipe, and a "Swapping providers" note documenting the Protocol-based extension path. Local gates: mypy --strict clean on 42 source files (was 31), ruff clean, ruff format clean, import-linter both contracts kept, 192 unit tests pass, eval/ runs 1 passed + 4 skipped without LLM env. Closes #94 * test: add adapter unit tests + adapters README (#94 review fixes) Addresses two gate failures on #104 surfaced by code review: 1. "Tests required" gate — feat: prefix declared a behaviour change but tests/ had no test for the new adapter (the eval/-side test only runs with live Azure credentials). Adds tests/test_eval_azure_openai_adapter.py: 13 fully-offline cases covering _resolve_config (defaults, override, empty-string fallback, missing-env error listing), the constructor (env wiring, explicit API version, missing-env, missing-SDK), and the two SDK call paths (complete_json structured-output mode, complete user-message dispatch, null-content returns "" / "{}"). The SDK is mocked at sys.modules level so the test never hits the network and never requires the openai extra to be installed. 2. "src/ README audit" gate — every src/ package needs a README.md per CLAUDE.md. Adds src/eval/adapters/README.md documenting the layer's purpose, the current adapter, a 7-step "adding a new adapter" recipe, and why the layer lives at the top of the import order. Also applies the reviewer's non-blocking sentinel-string suggestion: the magic "azure-deployment" string passed as judge_model in eval/test_golden_patterns.py is now the named constant _AZURE_DEPLOYMENT_SENTINEL with a comment explaining why the runner threads it through but the Azure adapter discards it. Local gates: 205 unit tests pass (was 192, +13 new), mypy clean on 43 source files, ruff/format/import-linter all green. Refs #94 * docs: add Key interfaces section to adapters README (#94 review) src/ README audit gate looks for a `## Key interfaces` (or `## Public surface`) anchor — the existing README had purpose / table / extension recipe / layering rationale, but no exported-names section. Adds a `## Key interfaces` section listing the two exported names: - AzureOpenAIClient — the LLMClient implementation with notes on complete() vs complete_json() and the discarded `model` arg (Azure dispatches by deployment, not model). - AzureOpenAIConfigError — the construction-time error type, noting that it batches every missing env var into a single message instead of failing-and-retrying. Both already documented in the adapter docstrings; this section hoists them to the README anchor the audit gate enforces. Refs #94 * chore: bump version to 0.2.12 (rebase onto develop after #103)
constk
added a commit
that referenced
this pull request
May 26, 2026
…sed post-#103/#104) main moved ahead of develop on 2026-05-25 when PR #86 was merged directly to main rather than via develop -> release flow. The divergence is one squash commit (eff5b1c) carrying: - docs/BEADS.md (optional Beads issue-queue guidance) - .github/pull_request_template.md (Beads PR-template block) - .github/scripts/check_aspirational_tickets.py (PEP 758 reformat) - .github/scripts/check_pin_freshness.py / check_tests_present.py / check_version_bump.py (touch-ups) - .gitattributes / .gitignore (.beads/ ignore, Windows renormalise) - CONTRIBUTING.md (line-ending normalisation) - tests/test_scripts_compile.py (new CI-script compile gate) - docs/DEVELOPMENT.md / docs/HARNESS.md / docs/HARNESS_PRIMER.md cross-refs - pyproject.toml + uv.lock self-version 0.2.10 -> 0.2.11 This PR was rebased after #103 (CVE fix, develop -> 0.2.11) and #104 (eval pattern examples, develop -> 0.2.12) merged. The version on main (0.2.11) is now behind develop (0.2.12); the conflict is resolved by bumping develop -> 0.2.13. After this lands, develop is at 0.2.13 and contains everything main has. Remaining in-flight PRs (#99, #100, #101, #105) need to rebase to bump 0.2.13 -> 0.2.14 (and onward sequentially as they merge). No behaviour change beyond what #86 already added to main. # Conflicts: # pyproject.toml # uv.lock
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What & why
pip-audit on
developflags two transitive-dep CVEs (surfaced viafastapi/httpx):idnastarletteBumped via:
Resolves to
idna 3.16(3.15 was the listed fix; 3.16 is a further patch on the same line) andstarlette 1.1.0(minor bump). The currently pinnedfastapi 0.136.1accepts Starlette 1.1.x — confirmed locally by running the unit suite. All 192 unit tests pass on the upgraded lock;pip-auditthen returns "No known vulnerabilities found".Also bumps the project self-version
0.2.10 → 0.2.11per docs/DEVELOPMENT.md.Why this is its own PR. Code review on #99 (the README opener PR) recommended landing the CVE bumps separately so the dep-bump risk doesn't pile onto a docs-only change. PRs #99, #100, #101, #102/#105 all currently inherit these CVEs from
developand cannot passpip-audituntil this PR lands — so this is the unblocking commit for the whole release-blockers stack.Test plan
uv lock --upgrade-package idna --upgrade-package starletteregenerates the lock cleanlyuv sync --frozen --extra devsucceeds with the new lockuv run pytest tests/ -q→ 192 passed (fastapi 0.136.1 + starlette 1.1.0 compatible)uv run pip-audit→ "No known vulnerabilities found"Invariants affected
None.
New deps / actions / external surface
No new direct deps; two transitive deps bumped.
starlettejumps minor (1.0.0 → 1.1.0) — verified compatible with the pinnedfastapi 0.136.1.Linked issue
None — surfaced by the #99 code review.