feat: BDD E2E contract tests for service behavioral spec #11
Draft
onmete wants to merge 1 commit into
Introduce a pytest-bdd test suite under tests/e2e/ that verifies the service's behavioral contract across all providers. Features are written in Gherkin and map 1:1 to the behavioral specs in .ai/spec/:
- schema_compliance.feature → query-api.md (analyze, execute, verify)
- chat_sse.feature → chat-api.md (SSE event ordering, conversation continuity)
- skill_invocation.feature → provider-contract.md (skill discovery, tool execution)

Infrastructure changes:
- Extract container lifecycle into scripts/start-containers.sh (shared by make eval and make e2e)
- Add tests/e2e/config.env for cheap model defaults (haiku, flash, 4.1-mini)
- Add make e2e target, exclude e2e from make test
- Add pytest-bdd and httpx-sse to pyproject.toml e2e extra
- Add Verification sections to .ai/spec/ files cross-referencing features
- Update CLAUDE.md testing conventions

Made-with: Cursor
PR needs rebase.
Summary
- `pytest-bdd` test suite under `tests/e2e/` that verifies the service's behavioral contract across all 6 providers
- `.feature` files map 1:1 to the behavioral specs in `.ai/spec/`: the scenario is both the requirement and the proof
- `scripts/start-containers.sh` used by both `make eval` and `make e2e`

What's in the suite
| Feature file | Spec |
|---|---|
| `schema_compliance.feature` | `query-api.md` |
| `chat_sse.feature` | `chat-api.md` |
| `skill_invocation.feature` | `provider-contract.md` |

8 BDD tests total, parametrized across all 6 providers when containers are running.
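To make the SSE event-ordering check in `chat_sse.feature` concrete, here is a minimal standard-library sketch of what such a structural assertion might look like. It is not the suite's actual code: the event names `start`, `token`, and `done` are hypothetical placeholders, and the real ordering rules are whatever `chat-api.md` specifies.

```python
from typing import Iterator, List, Tuple


def parse_sse(raw: str) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from a raw text/event-stream body."""
    event, data_lines = "message", []
    for line in raw.splitlines():
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                yield event, "\n".join(data_lines)
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line, ignored
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Drop the single optional space after the colon.
            data_lines.append(value[1:] if value.startswith(" ") else value)


def ordering_violations(names: List[str], first: str, last: str) -> List[str]:
    """Return human-readable ordering problems; an empty list means the order is valid."""
    if not names:
        return ["stream contained no events"]
    problems = []
    if names[0] != first:
        problems.append(f"first event was {names[0]!r}, expected {first!r}")
    if names[-1] != last:
        problems.append(f"last event was {names[-1]!r}, expected {last!r}")
    if names.count(last) > 1:
        problems.append(f"terminal event {last!r} appeared more than once")
    return problems


# Hypothetical stream; the event names are assumptions, not taken from chat-api.md.
raw_stream = (
    "event: start\ndata: {}\n\n"
    ": keep-alive\n\n"
    "event: token\ndata: hi\n\n"
    "event: done\ndata: {}\n\n"
)
event_names = [name for name, _ in parse_sse(raw_stream)]
```

A check like this inspects only the shape of the stream, never the generated text, which is why it can run against cheap models.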
Design decisions for team discussion
BDD format — follows the ASDLC verification layer convention (Gherkin Given/When/Then). Scenarios are human-readable and usable as acceptance criteria directly.
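As an illustration of how a Given/When/Then scenario can double as an executable check, the sketch below writes the three steps as plain Python functions. In a pytest-bdd suite these would typically be bound to the Gherkin text with the `given`, `when`, and `then` decorators from `pytest_bdd`; they are left undecorated here so the example runs with the standard library alone. The field names and the canned response are hypothetical, since the real schema is defined in `query-api.md`.

```python
from typing import Any, Dict, List

# Scenario (hypothetical wording):
#   Given an analyze response from a provider
#   When the response is validated against the schema
#   Then no schema violations are reported

# Hypothetical required fields; the real ones live in query-api.md.
REQUIRED_FIELDS = {"summary": str, "findings": list}


def given_analyze_response() -> Dict[str, Any]:
    """Given: a canned response standing in for a live provider call."""
    return {"summary": "example summary", "findings": ["example finding"]}


def when_validated(response: Dict[str, Any]) -> List[str]:
    """When: compare the response structure against the required fields."""
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in response:
            errors.append(f"missing field: {name}")
        elif not isinstance(response[name], expected):
            actual = type(response[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    return errors


def then_no_violations(errors: List[str]) -> None:
    """Then: the scenario passes only if validation found nothing."""
    assert not errors, errors


# Run the three steps in scenario order.
then_no_violations(when_validated(given_analyze_response()))
```

Keeping each step a small function means the same validation step can be reused across scenarios and across providers.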
Separate from evals: `tests/e2e/` is standalone with no imports from `evals/`. The two suites test different things:
- `evals/` = model quality (non-deterministic, expensive)
- `tests/e2e/` = service contract (structural, cheap)

Cheap model defaults: `tests/e2e/config.env` sets haiku/flash/4.1-mini as defaults. Contract tests validate structure, not quality, so there is no need for expensive models.

Shared container lifecycle: `scripts/start-containers.sh` replaces the container management from `evals/run.sh`. Both `make eval` and `make e2e` use it.

How to run
Test plan
- `make e2e` against at least one provider to validate
- `make eval` still works with the refactored container script

Made with Cursor