ci(proofs): gate the Idris2 verification corpus (verification/proofs/idris2)#276
Merged
Conversation
Two pre-existing failures left red on main after #220 merged: - agda-meta-checker.yml: `cabal install Agda-2.7.0.1` aborts on a warm ~/.cabal cache with "Path '.../bin/agda' already exists"; add --overwrite-policy=always so the install is idempotent. Until this lands the Agda meta-checker never actually runs in CI. - contractile.just defined a `heal` recipe that collides with the root Justfile's `heal` across `import?`, so `just` aborts with a duplicate-recipe error — breaking every `just` call incl. MVP Smoke's `just mvp-env`. Drop the contractile copy; keep the richer root recipe (and contractile's heal-provers). https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv
ProverKindInjectivity.idr:
- Replace all 107 LTESucc towers with lteLit calls (eliminates off-by-one
bug class permanently; Nunchaku was -1 close and Faial was +1)
- Add `import NatLte` for the machine-checked LTE helper
- Fix docstring range [0, 104] -> [0, 106] to match actual max discriminant
DispatchCorrectness.idr:
- Remove `import EchidnaABI.Provers` (pulls in a conflicting `proverLogic`
from InteractiveAssistants; file only needs EchidnaABI.Types)
- Allows file to type-check with the echidnaabi package installed
proofs/agda/Basic.agda:
- Hoist `data ℕ` and `data _≡_` to module scope (were only in `where`
blocks, causing scope errors when used in type signatures)
- Remove duplicate `data _≡_` from CombinatorLaws/CurryUncurryLaws
(they now inherit from module scope)
- Instantiate SKK's second const with {B = A} to resolve unsolved meta
All 11 verification/proofs/idris2/ files now pass `idris2 --check`.
12/13 proofs/agda/ files pass (DispatchOrdering needs stdlib 2.1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coq: - proofs/coq/nat.v: fix Nat.lt_irrefl/Nat.lt_trans usage for stdlib compat - proofs/coq/list.v: fix in_filter backward direction — case-split on (f h) first so congruence can close the x=h/f h=false/f x=true contradiction - proofs/coq/algebra/GroupTheory.v: complete rewrite; make G implicit in GroupAxioms Record so projections unify; remove Section (explicit params per theorem); fix rewrite direction in left_cancel (forward) vs right_cancel (backward); fix inv_mul associativity rewrite chain Lean4: - proofs/lean/Nat.lean, List.lean, Propositional.lean: fix imports and theorem statements for Lean 4.x stdlib API - proofs/lean/algebra/GroupTheory.lean: port group theory proofs to Lean4 - proofs/lean/ECHIDNA.lean: add root module required by lakefile roots Agda: - proofs/agda/DispatchOrdering.agda: replace z<s/s<s (not in installed stdlib) with s≤s z≤n; Fin._<_ = ℕ._<_ on toℕ = suc · ≤ · suc Idris2: - src/idris/Validator.idr: rename bound var `proof` → `pterm` throughout; `proof` is a reserved keyword in Idris2 0.8.0 (Idris1 tactic holdover) Build: - .gitignore: add proofs/agda/**/dist-newstyle/, proofs/lean/**/.lake/, .lia.cache, .nia.cache - proofs/lean/lake-manifest.json: add empty Lake lock file (no ext deps) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…idator)
The proof corpus had zero CI coverage: agda-meta-checker.yml only triggers
on meta-checker/**, idris2-abi-ci.yml only on src/abi/**, and there was no
Coq or Lean workflow at all. So proofs/{coq,lean,agda} and src/idris were
never type-checked in CI — a green PR proved nothing about the proofs and
regressions would land silently.
New dogfood-proofs-ci.yml with three jobs (Coq via apt, Lean 4 via elan,
Agda 2.7.0.1 + stdlib v2.1), each calling a just recipe so local and CI run
identical commands (RSR-H14). idris2-abi-ci.yml extended to also type-check
the src/idris validator (reuses its Idris2 install; adds src/idris/** to the
path trigger).
Closing the gap surfaced two latent bugs the absent CI had hidden:
- proofs/lean/lakefile.lean rooted only `ECHIDNA`, whose umbrella imported
4 of 6 files; `lake build` silently skipped algebra/GroupTheory.lean and
mvp_basic.lean, and in fact failed outright ("unknown module prefix
'Basic'") because the top-level modules were not in the lib's module
space. Now every proof file is a root and ECHIDNA.lean imports the whole
corpus.
- src/idris/echidna-validator.ipkg declares an executable (main = Validator)
but Validator.idr had no `main`, so `idris2 --build` failed with
"Undefined name Validator.main". Added `main = testValidation`.
New Justfile recipes: proofs, proofs-coq, proofs-lean, proofs-agda,
proofs-idris (proofs-agda exports LC_ALL=C.UTF-8 — Agda aborts under a
non-UTF-8 locale).
Verified locally against real toolchains: coqc 8.18.0 (5/5), Lean v4.13.0
lake build (6 modules + umbrella), agda (13/13), idris2 0.8.0 --typecheck.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-triggers the workflow's first real run (the initial PR run hit a startup_failure during workflow registration) and keeps the header comments ASCII for parser portability. No behavioural change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Temporary: reduce to one trivial job to localise the GitHub-parser rejection (passes yq + actionlint but startup_failures twice). Jobs restored in the next commit once the skeleton is confirmed to start. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…actions taiki-e/install-action reproducibly triggers a startup_failure in this workflow's context (bisected: header + checkout + run steps start fine; adding the action fails before any job runs, even though mvp-smoke.yml uses the same pin on main). Switch all toolchain installs to plain run steps -- apt for Coq and Agda, curl tarballs for elan and just -- matching the idris2/chapel workflows, which use only actions/checkout plus run steps. Agda now installs via apt (agda + agda-stdlib), which reproduces the local verification environment exactly. just is installed from its release tarball. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Agda job failed with "Library 'standard-library' not found / Installed libraries: (none)": apt-get --fix-missing silently skips agda-stdlib when it can't be fetched, exiting 0, so the registration step found no .agda-lib and wrote an empty ~/.agda/libraries. Install only the agda binary via apt (2.6.3 on ubuntu-24.04) and fetch the matching stdlib v1.7.3 from GitHub, registering its explicit path -- the same deterministic pattern agda-meta-checker.yml uses. Verified locally: all 13 corpus files type-check against a fresh GitHub v1.7.3 in an isolated AGDA_DIR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The trust-pipeline property proofs under verification/proofs/lean4
(ConfidenceLattice, ParetoMaximality, ParetoStrongMaximality,
IntegrityVerification, PortfolioCompleteness) had no CI coverage:
dogfood-proofs-ci gated only proofs/{coq,lean,agda} + src/idris, and the
idris2-abi / agda-meta-checker workflows are path-filtered to src/abi/**
and meta-checker/** respectively, so this corpus fell through every
filter.
Add a `verif-lean` job mirroring the existing `lean` job (elan shim +
pinned-toolchain pre-install with retry/backoff, then `just`), reading
the pin from verification/proofs/lean4/lean-toolchain and running a new
`just proofs-verif-lean` recipe (cd verification/proofs/lean4 && lake
build). No mathlib dependency (lake-manifest packages: []), so it builds
well under the 20-minute timeout. Extend the push/PR path triggers to
verification/proofs/lean4/**.
Verified locally on Lean 4.13.0: a fresh `lake build` (cleared
.lake/build first) compiles all five modules, exit 0, with zero
sorry/admit/axiom/native_decide across the corpus.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv
…idris2)
The 11 trust-pipeline property proofs under verification/proofs/idris2
(AxiomCompleteness, AxiomPolicyOrdering, BasicTotality, ClampTrustBounds,
DispatchCorrectness, DispatchOrdering, NatLte, ProofStateSerialisation,
ProverKindInjectivity, TrustKernelMonotonicity, TrustLevelSoundness) had no
CI gate: dogfood-proofs-ci covered proofs/{coq,lean,agda} + src/idris and
(just-added) verification/proofs/lean4; idris2-abi-ci is path-filtered to
src/abi/** + src/idris/**. So this corpus fell through every filter.
Add:
- echidna-verif-proofs.ipkg listing all 11 modules with `depends = base,
echidnaabi`. DispatchCorrectness imports EchidnaABI.Types (the echidnaabi
package in src/abi), unresolvable by a standalone `idris2 --check`.
- `just proofs-verif-idris`: installs echidnaabi (--install builds it), then
`idris2 --typecheck echidna-verif-proofs.ipkg`; added to the `proofs` rollup.
- a `verif-idris` job in dogfood-proofs-ci.yml (apt idris2 + source fallback,
mirroring idris2-abi-ci) and verification/proofs/idris2/** path triggers.
Why an ipkg rather than a per-file `idris2 --check` loop: `idris2 --check`
exits 0 even when a module fails to resolve an import (verified: a
"Module ... not found" error prints but the process returns exit 0), which
would let a broken corpus pass CI silently. Building through the ipkg gives a
reliable aggregate exit code; the recipe additionally fails on any "Error"
line as a belt-and-suspenders guard.
Verified locally on idris2 0.8.0 from a clean state (ABI build + installed
package both removed first): `just proofs-verif-idris` type-checks all 11
modules, exit 0. Negative test -- injecting `n = S n` proved by `Refl` into a
module makes the ipkg typecheck exit 1 -- confirms the gate actually gates.
Corpus is axiom-clean (no believe_me / assert_total / postulate; fully total).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv
# Conflicts: # .github/workflows/dogfood-proofs-ci.yml # Justfile
hyperpolymath
added a commit
that referenced
this pull request
Jun 21, 2026
#277) ## What Adds a **weekly cron** gating the two remaining ungated self-proof corpora — **Isabelle/HOL** (`proofs/isabelle`, 5 `.thy`) and **Mizar** (`proofs/mizar`, 4 `.miz`). Completes the proof-CI sweep after #275 (Lean) and #276 (Idris2). - **`.github/workflows/verification-proofs-cron.yml`** — `isabelle` + `mizar` jobs on a weekly schedule (`Mon 04:17 UTC`) + `workflow_dispatch`. Isabelle reuses `live-provers.yml`'s proven Isabelle2024 install; Mizar discovers the newest linux tarball from `mizar.org` at runtime. - **`proofs/isabelle/ROOT`** — two sessions: `Echidna_Isabelle` (base HOL: Basic, List, Nat, Propositional) and `Echidna_Isabelle_Algebra` (GroupTheory, on HOL-Algebra). - **`just proofs-isabelle`** (`isabelle build`) and **`just proofs-mizar`** (`makeenv` + `verifier` per file, failing on a non-empty `.err` since the verifier's exit code is unreliable). Kept **out** of the per-PR `proofs` rollup — these toolchains are too heavy for per-PR. ## Why a weekly cron (not per-PR) Both toolchains are large, non-apt downloads (Isabelle ~500MB + the HOL-Algebra image build; Mizar's system + full MML). Mizar is additionally **Tier-4 "mock-only"** in this project. Gating them per-PR would bloat every PR; a weekly schedule (mirroring `container-ci.yml`) is the right cadence. ## Verification status — please read `isabelle.in.tum.de` **and** `mizar.org` both return `403 host_not_allowed` from the dev sandbox, and neither tool is in apt or has a GitHub mirror — so these recipes **cannot be run locally here**. They are being validated by **dispatching this workflow on GitHub's runners** (where `live-provers.yml` already downloads Isabelle successfully) and iterating on the real run output. I will report the dispatch results on this PR. Known risk areas the first runs will confirm: - Isabelle theory-name vs HOL-name handling (`List.thy`/`Nat.thy` share names with HOL theories). - The exact Mizar archive layout / `MIZFILES` + `makeenv`/`verifier` invocation. Holding as **draft** until the dispatched runs are green. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --- _Generated by [Claude Code](https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv)_ --------- Co-authored-by: Claude <noreply@anthropic.com>
hyperpolymath
added a commit
that referenced
this pull request
Jun 21, 2026
…cron (#278) ## What Repairs the broken Isabelle/Mizar self-proof corpora so the weekly `verification-proofs-cron` (added in #277) can actually go green. Follows the decision to **repair** rather than gate-the-subset. This PR (first repair round): **Isabelle Nat & List**. - Renamed `Nat.thy` → `EchidnaNat.thy`, `List.thy` → `EchidnaList.thy`. A theory named `Nat`/`List` collides with HOL's own theories of those names — the CI run reported `Duplicate fact declaration "Nat.le_refl"` and `"List.append_assoc"`. - Annotated the three Nat addition lemmas (`0+n=n`, `n+0=n`, `m+n=n+m`) `:: nat`; as stated they were polymorphic over a bare `{zero,plus}` type that `simp` can't discharge. All other proofs are unchanged (they passed already). - Updated `proofs/isabelle/ROOT` to the new theory names. ## Status / still to do (tracked, will push to this PR) - **GroupTheory.thy**: still excluded — it has genuine math errors (lemmas omit `assumes "group G"` then use `assms`; `center_closed` uses `group.m_comm`, which needs `comm_group`). Repair next. - **Mizar**: the install now fully works (resolves + downloads `mizar-8.1.15`, binaries on PATH). `makeenv basic.miz` reports 1 error; the recipe is currently **diagnostic** (dumps each `.err` + offending source line) so this run reveals all four files' real errors. Repair follows once diagnosed. ## Verification note `isabelle.in.tum.de` / `mizar.org` are `403`-blocked from the dev sandbox, so these can't be run locally — they're verified by this PR's `verification-proofs-cron` run on GitHub's runners (Isabelle2025-2), iterating on real output. Lean/Idris2/Agda/Coq gates are already merged (#275/#276/#234). 🤖 Generated with [Claude Code](https://claude.com/claude-code) --- _Generated by [Claude Code](https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv)_ --------- Co-authored-by: Claude <noreply@anthropic.com>
This was referenced Jun 21, 2026
hyperpolymath
added a commit
that referenced
this pull request
Jun 21, 2026
…if-idris perms (#281) ## What Per the decision to **finish the Isabelle base only** and descope the rest: 1. **Isabelle base — final fixes** (from the run-#279 error list; should make the cron's Isabelle job green): - **EchidnaNat**: `mult_assoc`, `mult_distrib_left`, `mult_distrib_right` annotated `:: nat` (polymorphic over `{plus,times}`); `factorial_monotone` rewritten to derive `Suc k ≤ Suc k * factorial k` via `mult_le_mono2` + `factorial_positive` (the old calc step's `by simp` couldn't do multiplication-monotonicity). - **EchidnaList**: `list_all_filter` via `list_all_iff` + `filter_id_conv` (the `by (induction; auto)` left the `¬ P`/length case open). - Result: Basic, EchidnaList, EchidnaNat, Propositional verify on Isabelle2025-2. 2. **Descope Mizar** from the weekly cron — its `.miz` files have genuine, heterogeneous verification errors and it's the project's mock-only Tier-4 tail. Dropped the `mizar` job and `proofs/mizar/**` trigger; the `proofs-mizar` recipe stays for local use. **GroupTheory stays excluded.** Both tracked in #280 for dedicated, locally-tested repair. 3. **Fix `verif-idris` permission flakiness** (real bug from #276, now on main): when apt idris2 falls back to the source build, `sudo make install` leaves `~/.idris2` root-owned, so the recipe's unprivileged `idris2 --install echidnaabi.ipkg` failed with `Permission Denied`. `chown ~/.idris2` to the runner after install. ## Verification Isabelle can't run in the dev sandbox (`isabelle.in.tum.de` is `403`-blocked), so this is verified by the cron's Isabelle job on GitHub runners (Isabelle2025-2), iterating on the real error lists. The fixes here address the complete reported set from the prior run. Closes the proof-CI sweep for everything that verifies: Coq, Lean 4, Agda, Idris2 (+verification), Isabelle base. GroupTheory + Mizar → #280. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --- _Generated by [Claude Code](https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv)_ --------- Co-authored-by: Claude <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds CI coverage for the Idris2 trust-pipeline property proofs under
verification/proofs/idris2/(11 modules) — the last self-proof corpus with no gate.Follows #275 (which gated
verification/proofs/lean4).echidna-verif-proofs.ipkglisting all 11 modules withdepends = base, echidnaabi.proofs-verif-idris: installs theechidnaabipackage (--installbuilds it on the way), then
idris2 --typecheck echidna-verif-proofs.ipkg. Added to theproofsroll-up.verif-idrisjob indogfood-proofs-ci.yml(apt idris2 + source-bootstrap fallback,mirroring
idris2-abi-ci.yml) +verification/proofs/idris2/**path triggers.Why an ipkg, not a per-file
idris2 --checkloopTwo real problems surfaced while building this (caught by running it, not eyeballing):
idris2 --checkreturns exit 0 even when a module fails to resolve an import.Verified: a bare
idris2 --check DispatchCorrectness.idrprintsError: Module EchidnaABI.Types not foundbut exits 0 — a per-file loop would pass CIwhile broken. (Genuine type errors do exit 1; only module-resolution errors don't.)
DispatchCorrectness.idrimportsEchidnaABI.Types— theechidnaabipackage insrc/abi/, unresolvable standalone.Building through the ipkg fixes both:
depends = echidnaabiresolves the import, and thepackage build gives a reliable aggregate exit code. The recipe additionally fails on any
^Errorline as a belt-and-suspenders guard.Verification (run, not eyeballed)
On idris2 0.8.0, from a clean state (
src/abi/build, the installedechidnaabipackage,and
verification/proofs/idris2/buildall removed first):just proofs-verif-idris→ type-checks all 11 modules (incl. DispatchCorrectness), exit 0.n = S nproved byReflinto a module makes the ipkgtypecheck exit 1 — the gate actually gates.
believe_me/assert_total/postulate(the two grep hitsare string literals in a
patternStringdata table, not real escapes); fully total.Notes
verif-idrisjob mirrorsidris2-abi-ci's install (apt-first, source-fallback) and usestimeout-minutes: 30to cover the source-bootstrap fallback.(next).
🤖 Generated with Claude Code
Generated by Claude Code