ci(proofs): gate the Lean 4 verification corpus (verification/proofs/lean4) by hyperpolymath · Pull Request #275 · hyperpolymath/echidna

hyperpolymath · 2026-06-21T14:22:39Z

What

Adds CI coverage for the Lean 4 trust-pipeline property proofs under
verification/proofs/lean4/ — they previously fell through every proof-CI filter.

New verif-lean job in dogfood-proofs-ci.yml, a structural clone of the existing
lean job (elan shim + pinned-toolchain pre-install with retry/backoff, then just),
reading the pin from verification/proofs/lean4/lean-toolchain and running a new
just proofs-verif-lean recipe.
New Justfile recipe proofs-verif-lean = cd verification/proofs/lean4 && lake build,
also added to the proofs roll-up so just proofs checks it locally too.
Path triggers extended to verification/proofs/lean4/** (both push and pull_request).

Why

dogfood-proofs-ci gated only proofs/{coq,lean,agda} + src/idris; the
idris2-abi-ci and agda-meta-checker workflows are path-filtered to src/abi/**
and meta-checker/**. So the five property proofs here
(ConfidenceLattice, ParetoMaximality, ParetoStrongMaximality,
IntegrityVerification, PortfolioCompleteness) had no gate — regressions would
have landed silently.

Verification (run locally, not eyeballed)

On Lean 4.13.0 (installed via the project's pinned tarball, the same version the
corpus pins):

Fresh lake build with .lake/build cleared first → all 5 modules compile,
Build completed successfully, exit 0 (~24 s).
just proofs-verif-lean → exit 0.
grep -nE 'sorry|admit|^\s*axiom |native_decide' across the corpus → none —
these are genuine, hole-free proofs.

No mathlib dependency (lake-manifest packages: []), so the job builds well under the
20-minute timeout.

Notes

No source/proof files changed — CI wiring only.
Next in the same sweep (separate PRs): gate verification/proofs/idris2 (Idris2,
idris2 --check per file), and a weekly cron for the heavier Isabelle/Mizar corpora.

🤖 Generated with Claude Code

Generated by Claude Code

Two pre-existing failures left red on main after #220 merged: - agda-meta-checker.yml: `cabal install Agda-2.7.0.1` aborts on a warm ~/.cabal cache with "Path '.../bin/agda' already exists"; add --overwrite-policy=always so the install is idempotent. Until this lands the Agda meta-checker never actually runs in CI. - contractile.just defined a `heal` recipe that collides with the root Justfile's `heal` across `import?`, so `just` aborts with a duplicate-recipe error — breaking every `just` call incl. MVP Smoke's `just mvp-env`. Drop the contractile copy; keep the richer root recipe (and contractile's heal-provers). https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv

ProverKindInjectivity.idr: - Replace all 107 LTESucc towers with lteLit calls (eliminates off-by-one bug class permanently; Nunchaku was -1 close and Faial was +1) - Add `import NatLte` for the machine-checked LTE helper - Fix docstring range [0, 104] -> [0, 106] to match actual max discriminant DispatchCorrectness.idr: - Remove `import EchidnaABI.Provers` (pulls in a conflicting `proverLogic` from InteractiveAssistants; file only needs EchidnaABI.Types) - Allows file to type-check with the echidnaabi package installed proofs/agda/Basic.agda: - Hoist `data ℕ` and `data _≡_` to module scope (were only in `where` blocks, causing scope errors when used in type signatures) - Remove duplicate `data _≡_` from CombinatorLaws/CurryUncurryLaws (they now inherit from module scope) - Instantiate SKK's second const with {B = A} to resolve unsolved meta All 11 verification/proofs/idris2/ files now pass `idris2 --check`. 12/13 proofs/agda/ files pass (DispatchOrdering needs stdlib 2.1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Coq: - proofs/coq/nat.v: fix Nat.lt_irrefl/Nat.lt_trans usage for stdlib compat - proofs/coq/list.v: fix in_filter backward direction — case-split on (f h) first so congruence can close the x=h/f h=false/f x=true contradiction - proofs/coq/algebra/GroupTheory.v: complete rewrite; make G implicit in GroupAxioms Record so projections unify; remove Section (explicit params per theorem); fix rewrite direction in left_cancel (forward) vs right_cancel (backward); fix inv_mul associativity rewrite chain Lean4: - proofs/lean/Nat.lean, List.lean, Propositional.lean: fix imports and theorem statements for Lean 4.x stdlib API - proofs/lean/algebra/GroupTheory.lean: port group theory proofs to Lean4 - proofs/lean/ECHIDNA.lean: add root module required by lakefile roots Agda: - proofs/agda/DispatchOrdering.agda: replace z<s/s<s (not in installed stdlib) with s≤s z≤n; Fin._<_ = ℕ._<_ on toℕ = suc · ≤ · suc Idris2: - src/idris/Validator.idr: rename bound var `proof` → `pterm` throughout; `proof` is a reserved keyword in Idris2 0.8.0 (Idris1 tactic holdover) Build: - .gitignore: add proofs/agda/**/dist-newstyle/, proofs/lean/**/.lake/, .lia.cache, .nia.cache - proofs/lean/lake-manifest.json: add empty Lake lock file (no ext deps) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…idator) The proof corpus had zero CI coverage: agda-meta-checker.yml only triggers on meta-checker/**, idris2-abi-ci.yml only on src/abi/**, and there was no Coq or Lean workflow at all. So proofs/{coq,lean,agda} and src/idris were never type-checked in CI — a green PR proved nothing about the proofs and regressions would land silently. New dogfood-proofs-ci.yml with three jobs (Coq via apt, Lean 4 via elan, Agda 2.7.0.1 + stdlib v2.1), each calling a just recipe so local and CI run identical commands (RSR-H14). idris2-abi-ci.yml extended to also type-check the src/idris validator (reuses its Idris2 install; adds src/idris/** to the path trigger). Closing the gap surfaced two latent bugs the absent CI had hidden: - proofs/lean/lakefile.lean rooted only `ECHIDNA`, whose umbrella imported 4 of 6 files; `lake build` silently skipped algebra/GroupTheory.lean and mvp_basic.lean, and in fact failed outright ("unknown module prefix 'Basic'") because the top-level modules were not in the lib's module space. Now every proof file is a root and ECHIDNA.lean imports the whole corpus. - src/idris/echidna-validator.ipkg declares an executable (main = Validator) but Validator.idr had no `main`, so `idris2 --build` failed with "Undefined name Validator.main". Added `main = testValidation`. New Justfile recipes: proofs, proofs-coq, proofs-lean, proofs-agda, proofs-idris (proofs-agda exports LC_ALL=C.UTF-8 — Agda aborts under a non-UTF-8 locale). Verified locally against real toolchains: coqc 8.18.0 (5/5), Lean v4.13.0 lake build (6 modules + umbrella), agda (13/13), idris2 0.8.0 --typecheck. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Re-triggers the workflow's first real run (the initial PR run hit a startup_failure during workflow registration) and keeps the header comments ASCII for parser portability. No behavioural change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Temporary: reduce to one trivial job to localise the GitHub-parser rejection (passes yq + actionlint but startup_failures twice). Jobs restored in the next commit once the skeleton is confirmed to start. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…actions taiki-e/install-action reproducibly triggers a startup_failure in this workflow's context (bisected: header + checkout + run steps start fine; adding the action fails before any job runs, even though mvp-smoke.yml uses the same pin on main). Switch all toolchain installs to plain run steps -- apt for Coq and Agda, curl tarballs for elan and just -- matching the idris2/chapel workflows, which use only actions/checkout plus run steps. Agda now installs via apt (agda + agda-stdlib), which reproduces the local verification environment exactly. just is installed from its release tarball. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Agda job failed with "Library 'standard-library' not found / Installed libraries: (none)": apt-get --fix-missing silently skips agda-stdlib when it can't be fetched, exiting 0, so the registration step found no .agda-lib and wrote an empty ~/.agda/libraries. Install only the agda binary via apt (2.6.3 on ubuntu-24.04) and fetch the matching stdlib v1.7.3 from GitHub, registering its explicit path -- the same deterministic pattern agda-meta-checker.yml uses. Verified locally: all 13 corpus files type-check against a fresh GitHub v1.7.3 in an isolated AGDA_DIR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The trust-pipeline property proofs under verification/proofs/lean4 (ConfidenceLattice, ParetoMaximality, ParetoStrongMaximality, IntegrityVerification, PortfolioCompleteness) had no CI coverage: dogfood-proofs-ci gated only proofs/{coq,lean,agda} + src/idris, and the idris2-abi / agda-meta-checker workflows are path-filtered to src/abi/** and meta-checker/** respectively, so this corpus fell through every filter. Add a `verif-lean` job mirroring the existing `lean` job (elan shim + pinned-toolchain pre-install with retry/backoff, then `just`), reading the pin from verification/proofs/lean4/lean-toolchain and running a new `just proofs-verif-lean` recipe (cd verification/proofs/lean4 && lake build). No mathlib dependency (lake-manifest packages: []), so it builds well under the 20-minute timeout. Extend the push/PR path triggers to verification/proofs/lean4/**. Verified locally on Lean 4.13.0: a fresh `lake build` (cleared .lake/build first) compiles all five modules, exit 0, with zero sorry/admit/axiom/native_decide across the corpus. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv

…idris2) (#276) ## What Adds CI coverage for the **Idris2 trust-pipeline property proofs** under `verification/proofs/idris2/` (11 modules) — the last self-proof corpus with no gate. Follows #275 (which gated `verification/proofs/lean4`). - **New `echidna-verif-proofs.ipkg`** listing all 11 modules with `depends = base, echidnaabi`. - **New Justfile recipe `proofs-verif-idris`**: installs the `echidnaabi` package (`--install` builds it on the way), then `idris2 --typecheck echidna-verif-proofs.ipkg`. Added to the `proofs` roll-up. - **New `verif-idris` job** in `dogfood-proofs-ci.yml` (apt idris2 + source-bootstrap fallback, mirroring `idris2-abi-ci.yml`) + `verification/proofs/idris2/**` path triggers. ## Why an ipkg, not a per-file `idris2 --check` loop Two real problems surfaced while building this (caught by *running* it, not eyeballing): 1. **`idris2 --check` returns exit 0 even when a module fails to resolve an import.** Verified: a bare `idris2 --check DispatchCorrectness.idr` prints `Error: Module EchidnaABI.Types not found` but **exits 0** — a per-file loop would pass CI while broken. (Genuine *type* errors do exit 1; only module-resolution errors don't.) 2. **`DispatchCorrectness.idr` imports `EchidnaABI.Types`** — the `echidnaabi` package in `src/abi/`, unresolvable standalone. Building through the ipkg fixes both: `depends = echidnaabi` resolves the import, and the package build gives a reliable aggregate exit code. The recipe additionally fails on any `^Error` line as a belt-and-suspenders guard. ## Verification (run, not eyeballed) On **idris2 0.8.0**, from a clean state (`src/abi/build`, the installed `echidnaabi` package, and `verification/proofs/idris2/build` all removed first): - `just proofs-verif-idris` → type-checks all 11 modules (incl. DispatchCorrectness), **exit 0**. - **Negative test**: injecting `n = S n` proved by `Refl` into a module makes the ipkg typecheck **exit 1** — the gate actually gates. - Corpus is **axiom-clean**: no `believe_me` / `assert_total` / `postulate` (the two grep hits are string literals in a `patternString` data table, not real escapes); fully total. ## Notes - No proof sources changed — packaging + CI wiring only. - The `verif-idris` job mirrors `idris2-abi-ci`'s install (apt-first, source-fallback) and uses `timeout-minutes: 30` to cover the source-bootstrap fallback. - Remaining in the proof-CI sweep: a **weekly cron** for the heavier Isabelle/Mizar corpora (next). 🤖 Generated with [Claude Code](https://claude.com/claude-code) --- _Generated by [Claude Code](https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv)_ --------- Co-authored-by: Claude <noreply@anthropic.com>

#277) ## What Adds a **weekly cron** gating the two remaining ungated self-proof corpora — **Isabelle/HOL** (`proofs/isabelle`, 5 `.thy`) and **Mizar** (`proofs/mizar`, 4 `.miz`). Completes the proof-CI sweep after #275 (Lean) and #276 (Idris2). - **`.github/workflows/verification-proofs-cron.yml`** — `isabelle` + `mizar` jobs on a weekly schedule (`Mon 04:17 UTC`) + `workflow_dispatch`. Isabelle reuses `live-provers.yml`'s proven Isabelle2024 install; Mizar discovers the newest linux tarball from `mizar.org` at runtime. - **`proofs/isabelle/ROOT`** — two sessions: `Echidna_Isabelle` (base HOL: Basic, List, Nat, Propositional) and `Echidna_Isabelle_Algebra` (GroupTheory, on HOL-Algebra). - **`just proofs-isabelle`** (`isabelle build`) and **`just proofs-mizar`** (`makeenv` + `verifier` per file, failing on a non-empty `.err` since the verifier's exit code is unreliable). Kept **out** of the per-PR `proofs` rollup — these toolchains are too heavy for per-PR. ## Why a weekly cron (not per-PR) Both toolchains are large, non-apt downloads (Isabelle ~500MB + the HOL-Algebra image build; Mizar's system + full MML). Mizar is additionally **Tier-4 "mock-only"** in this project. Gating them per-PR would bloat every PR; a weekly schedule (mirroring `container-ci.yml`) is the right cadence. ## Verification status — please read `isabelle.in.tum.de` **and** `mizar.org` both return `403 host_not_allowed` from the dev sandbox, and neither tool is in apt or has a GitHub mirror — so these recipes **cannot be run locally here**. They are being validated by **dispatching this workflow on GitHub's runners** (where `live-provers.yml` already downloads Isabelle successfully) and iterating on the real run output. I will report the dispatch results on this PR. Known risk areas the first runs will confirm: - Isabelle theory-name vs HOL-name handling (`List.thy`/`Nat.thy` share names with HOL theories). - The exact Mizar archive layout / `MIZFILES` + `makeenv`/`verifier` invocation. Holding as **draft** until the dispatched runs are green. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --- _Generated by [Claude Code](https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv)_ --------- Co-authored-by: Claude <noreply@anthropic.com>

…cron (#278) ## What Repairs the broken Isabelle/Mizar self-proof corpora so the weekly `verification-proofs-cron` (added in #277) can actually go green. Follows the decision to **repair** rather than gate-the-subset. This PR (first repair round): **Isabelle Nat & List**. - Renamed `Nat.thy` → `EchidnaNat.thy`, `List.thy` → `EchidnaList.thy`. A theory named `Nat`/`List` collides with HOL's own theories of those names — the CI run reported `Duplicate fact declaration "Nat.le_refl"` and `"List.append_assoc"`. - Annotated the three Nat addition lemmas (`0+n=n`, `n+0=n`, `m+n=n+m`) `:: nat`; as stated they were polymorphic over a bare `{zero,plus}` type that `simp` can't discharge. All other proofs are unchanged (they passed already). - Updated `proofs/isabelle/ROOT` to the new theory names. ## Status / still to do (tracked, will push to this PR) - **GroupTheory.thy**: still excluded — it has genuine math errors (lemmas omit `assumes "group G"` then use `assms`; `center_closed` uses `group.m_comm`, which needs `comm_group`). Repair next. - **Mizar**: the install now fully works (resolves + downloads `mizar-8.1.15`, binaries on PATH). `makeenv basic.miz` reports 1 error; the recipe is currently **diagnostic** (dumps each `.err` + offending source line) so this run reveals all four files' real errors. Repair follows once diagnosed. ## Verification note `isabelle.in.tum.de` / `mizar.org` are `403`-blocked from the dev sandbox, so these can't be run locally — they're verified by this PR's `verification-proofs-cron` run on GitHub's runners (Isabelle2025-2), iterating on real output. Lean/Idris2/Agda/Coq gates are already merged (#275/#276/#234). 🤖 Generated with [Claude Code](https://claude.com/claude-code) --- _Generated by [Claude Code](https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv)_ --------- Co-authored-by: Claude <noreply@anthropic.com>

claude added 13 commits June 5, 2026 00:45

ci(proofs): bisect — restore Coq job only

7f95385

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ci(proofs): bisect — isolate taiki install-action step

12321b3

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ci(proofs): bisect — curl-install just instead of marketplace action

eac87fc

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

merge: bring loving-cannon up to origin/main (absorb merged #256-#274)

5710743

hyperpolymath marked this pull request as ready for review June 21, 2026 14:28

hyperpolymath merged commit 969b73e into main Jun 21, 2026
51 of 53 checks passed

hyperpolymath deleted the claude/loving-cannon-AtUSm branch June 21, 2026 14:28

hyperpolymath mentioned this pull request Jun 21, 2026

ci(proofs): gate the Idris2 verification corpus (verification/proofs/idris2) #276

Merged

hyperpolymath mentioned this pull request Jun 21, 2026

ci(proofs): weekly cron gating the Isabelle + Mizar self-proof corpora #277

Merged

hyperpolymath mentioned this pull request Jun 21, 2026

fix(proofs): repair Isabelle/Mizar self-proof corpora for the weekly cron #278

Merged

hyperpolymath mentioned this pull request Jun 21, 2026

Repair + gate the Isabelle GroupTheory and Mizar self-proof corpora #280

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

ci(proofs): gate the Lean 4 verification corpus (verification/proofs/lean4)#275

ci(proofs): gate the Lean 4 verification corpus (verification/proofs/lean4)#275
hyperpolymath merged 13 commits into
mainfrom
claude/loving-cannon-AtUSm

hyperpolymath commented Jun 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

hyperpolymath commented Jun 21, 2026

What

Why

Verification (run locally, not eyeballed)

Notes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants