feat: BDD E2E contract tests for service behavioral spec #11

Draft
onmete wants to merge 1 commit into openshift:main from onmete:e2e-bdd-tests

onmete commented Apr 28, 2026

Summary

  • Adds a pytest-bdd test suite under tests/e2e/ that verifies the service's behavioral contract across all 6 providers
  • Gherkin .feature files map 1:1 to the behavioral specs in .ai/spec/ — the scenario is both the requirement and the proof
  • Extracts container lifecycle into a shared scripts/start-containers.sh used by both make eval and make e2e

What's in the suite

| Feature | Verifies spec | Scenarios |
|---|---|---|
| schema_compliance.feature | query-api.md | /analyze, /execute, /verify schema conformance + response wrapping |
| chat_sse.feature | chat-api.md | SSE event ordering, conversation continuity, tool call fields |
| skill_invocation.feature | provider-contract.md | Skill discovery, tool execution, token verification |

8 BDD tests total, parametrized across all 6 providers when containers are running.
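
As an illustration of what an "SSE event ordering" check can look like, here is a minimal standard-library sketch. It is not the suite's code (the PR uses pytest-bdd and httpx-sse); the event names `start`, `token`, and `done` and the helper names are hypothetical, chosen only to show the idea of asserting on stream structure rather than content.

```python
from typing import Iterable, List, Tuple


def _field_value(line: str, prefix: str) -> str:
    """Return the field value, dropping one optional leading space."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse raw SSE lines into (event, data) pairs.

    Handles only the `event:` and `data:` fields; a blank line ends an event.
    """
    events: List[Tuple[str, str]] = []
    event_name = "message"  # SSE default when no `event:` field is sent
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event if it carried data.
            if data_parts:
                events.append((event_name, "\n".join(data_parts)))
            event_name, data_parts = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event_name = _field_value(line, "event:")
        elif line.startswith("data:"):
            data_parts.append(_field_value(line, "data:"))
    return events


def has_expected_order(events: List[Tuple[str, str]], first: str, last: str) -> bool:
    """True if the stream opens with `first` and closes with `last`."""
    names = [name for name, _ in events]
    return len(names) >= 2 and names[0] == first and names[-1] == last


# Hypothetical stream; the real event names come from chat-api.md.
SAMPLE_STREAM = [
    "event: start\n", 'data: {"conversation_id": "abc"}\n', "\n",
    ": keep-alive\n",
    "event: token\n", "data: hello\n", "\n",
    "event: done\n", "data: {}\n", "\n",
]

events = parse_sse(SAMPLE_STREAM)
assert has_expected_order(events, first="start", last="done")
```

The check never inspects what the model said in the `token` events, which is what keeps it deterministic across providers.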

Design decisions for team discussion

  1. BDD format: follows the ASDLC verification layer convention (Gherkin Given/When/Then). Scenarios are human-readable and usable as acceptance criteria directly.

  2. Separate from evals: tests/e2e/ is standalone with no imports from evals/. The two suites test different things:

    • evals/ = model quality (non-deterministic, expensive)
    • tests/e2e/ = service contract (structural, cheap)
  3. Cheap model defaults: tests/e2e/config.env sets haiku/flash/4.1-mini as defaults. Contract tests validate structure, not quality, so expensive models are unnecessary.

  4. Shared container lifecycle: scripts/start-containers.sh replaces the container management from evals/run.sh. Both make eval and make e2e use it.
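
To make the "structure, not quality" distinction in decisions 2 and 3 concrete, the sketch below checks only that a response has the expected fields with the expected types. The field names (`summary`, `findings`) and the `ANALYZE_SHAPE` mapping are hypothetical; the real shapes are whatever query-api.md defines, and the suite may well use a proper schema validator instead.

```python
from typing import Any, Dict, List

# Hypothetical response shape: field name -> expected Python type.
ANALYZE_SHAPE: Dict[str, type] = {"summary": str, "findings": list}


def shape_errors(payload: Dict[str, Any], shape: Dict[str, type]) -> List[str]:
    """Return structural problems; an empty list means the payload conforms."""
    errors: List[str] = []
    for field, expected in shape.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors


# Any text passes as long as the structure is right, so a cheap model suffices.
assert shape_errors({"summary": "any wording at all", "findings": []}, ANALYZE_SHAPE) == []
```

Because the assertion does not depend on the wording of `summary`, the result should be the same whichever model produced it; judging the wording is left to evals/.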

How to run

make e2e                              # all providers
make e2e E2E_ARGS="-k claude"         # single provider
ANTHROPIC_MODEL=claude-opus-4-6 make e2e  # override model
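
The override in the last command works if defaults from tests/e2e/config.env are applied only where the environment has not already set a value. The following is a sketch of that precedence rule in Python, assuming config.env holds plain `KEY=VALUE` lines; how `make e2e` actually loads the file is not shown in this PR description, and the placeholder values and `EXAMPLE_OTHER_MODEL` are invented for illustration.

```python
from typing import Dict, Iterable, MutableMapping


def parse_env_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    defaults: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        defaults[key.strip()] = value.strip().strip('"').strip("'")
    return defaults


def apply_defaults(defaults: Dict[str, str], env: MutableMapping[str, str]) -> None:
    """Fill in defaults without overwriting values the caller already set."""
    for key, value in defaults.items():
        env.setdefault(key, value)


# Placeholder file contents; the real defaults live in tests/e2e/config.env.
SAMPLE_CONFIG = [
    "# cheap defaults (placeholder values)\n",
    "ANTHROPIC_MODEL=cheap-model-placeholder\n",
    "EXAMPLE_OTHER_MODEL=other-placeholder\n",
]

# Simulates `ANTHROPIC_MODEL=claude-opus-4-6 make e2e`.
env = {"ANTHROPIC_MODEL": "claude-opus-4-6"}
apply_defaults(parse_env_lines(SAMPLE_CONFIG), env)
assert env["ANTHROPIC_MODEL"] == "claude-opus-4-6"  # explicit override wins
```

In a real run the mapping passed in would be `os.environ`, which supports `setdefault` in the same way.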

Test plan

  • Team agrees on BDD approach for E2E verification
  • Run make e2e against at least one provider to validate
  • Verify make eval still works with the refactored container script
  • Review feature file scenarios for completeness

Made with Cursor

Introduce a pytest-bdd test suite under tests/e2e/ that verifies the
service's behavioral contract across all providers. Features are written
in Gherkin and map 1:1 to the behavioral specs in .ai/spec/:

- schema_compliance.feature → query-api.md (analyze, execute, verify)
- chat_sse.feature → chat-api.md (SSE event ordering, conversation continuity)
- skill_invocation.feature → provider-contract.md (skill discovery, tool execution)

Infrastructure changes:
- Extract container lifecycle into scripts/start-containers.sh (shared by make eval and make e2e)
- Add tests/e2e/config.env for cheap model defaults (haiku, flash, 4.1-mini)
- Add make e2e target, exclude e2e from make test
- Add pytest-bdd and httpx-sse to pyproject.toml e2e extra
- Add Verification sections to .ai/spec/ files cross-referencing features
- Update CLAUDE.md testing conventions

Made-with: Cursor
openshift-ci Bot added labels Apr 28, 2026:

  • do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
  • needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.

openshift-ci Bot commented Apr 30, 2026

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
