Skip to content

ci(proofs): gate the Lean 4 verification corpus (verification/proofs/lean4)#275

Merged
hyperpolymath merged 13 commits into
mainfrom
claude/loving-cannon-AtUSm
Jun 21, 2026
Merged

ci(proofs): gate the Lean 4 verification corpus (verification/proofs/lean4)#275
hyperpolymath merged 13 commits into
mainfrom
claude/loving-cannon-AtUSm

Conversation

@hyperpolymath

Copy link
Copy Markdown
Owner

What

Adds CI coverage for the Lean 4 trust-pipeline property proofs under
verification/proofs/lean4/ — they previously fell through every proof-CI filter.

  • New verif-lean job in dogfood-proofs-ci.yml, a structural clone of the existing
    lean job (elan shim + pinned-toolchain pre-install with retry/backoff, then just),
    reading the pin from verification/proofs/lean4/lean-toolchain and running a new
    just proofs-verif-lean recipe.
  • New Justfile recipe proofs-verif-lean = cd verification/proofs/lean4 && lake build,
    also added to the proofs roll-up so just proofs checks it locally too.
  • Path triggers extended to verification/proofs/lean4/** (both push and pull_request).

Why

dogfood-proofs-ci gated only proofs/{coq,lean,agda} + src/idris; the
idris2-abi-ci and agda-meta-checker workflows are path-filtered to src/abi/**
and meta-checker/**. So the five property proofs here
(ConfidenceLattice, ParetoMaximality, ParetoStrongMaximality,
IntegrityVerification, PortfolioCompleteness) had no gate — regressions would
have landed silently.

Verification (run locally, not eyeballed)

On Lean 4.13.0 (installed via the project's pinned tarball, the same version the
corpus pins):

  • Fresh lake build with .lake/build cleared first → all 5 modules compile,
    Build completed successfully, exit 0 (~24 s).
  • just proofs-verif-lean → exit 0.
  • grep -nE 'sorry|admit|^\s*axiom |native_decide' across the corpus → none
    these are genuine, hole-free proofs.

No mathlib dependency (lake-manifest packages: []), so the job builds well under the
20-minute timeout.

Notes

  • No source/proof files changed — CI wiring only.
  • Next in the same sweep (separate PRs): gate verification/proofs/idris2 (Idris2,
    idris2 --check per file), and a weekly cron for the heavier Isabelle/Mizar corpora.

🤖 Generated with Claude Code


Generated by Claude Code

claude added 13 commits June 5, 2026 00:45
Two pre-existing failures left red on main after #220 merged:

- agda-meta-checker.yml: `cabal install Agda-2.7.0.1` aborts on a warm
  ~/.cabal cache with "Path '.../bin/agda' already exists"; add
  --overwrite-policy=always so the install is idempotent. Until this lands
  the Agda meta-checker never actually runs in CI.
- contractile.just defined a `heal` recipe that collides with the root
  Justfile's `heal` across `import?`, so `just` aborts with a
  duplicate-recipe error — breaking every `just` call incl. MVP Smoke's
  `just mvp-env`. Drop the contractile copy; keep the richer root recipe
  (and contractile's heal-provers).

https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv
ProverKindInjectivity.idr:
- Replace all 107 LTESucc towers with lteLit calls (eliminates off-by-one
  bug class permanently; Nunchaku was -1 close and Faial was +1)
- Add `import NatLte` for the machine-checked LTE helper
- Fix docstring range [0, 104] -> [0, 106] to match actual max discriminant

DispatchCorrectness.idr:
- Remove `import EchidnaABI.Provers` (pulls in a conflicting `proverLogic`
  from InteractiveAssistants; file only needs EchidnaABI.Types)
- Allows file to type-check with the echidnaabi package installed

proofs/agda/Basic.agda:
- Hoist `data ℕ` and `data _≡_` to module scope (were only in `where`
  blocks, causing scope errors when used in type signatures)
- Remove duplicate `data _≡_` from CombinatorLaws/CurryUncurryLaws
  (they now inherit from module scope)
- Instantiate SKK's second const with {B = A} to resolve unsolved meta

All 11 verification/proofs/idris2/ files now pass `idris2 --check`.
12/13 proofs/agda/ files pass (DispatchOrdering needs stdlib 2.1).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coq:
- proofs/coq/nat.v: fix Nat.lt_irrefl/Nat.lt_trans usage for stdlib compat
- proofs/coq/list.v: fix in_filter backward direction — case-split on (f h)
  first so congruence can close the x=h/f h=false/f x=true contradiction
- proofs/coq/algebra/GroupTheory.v: complete rewrite; make G implicit in
  GroupAxioms Record so projections unify; remove Section (explicit params
  per theorem); fix rewrite direction in left_cancel (forward) vs
  right_cancel (backward); fix inv_mul associativity rewrite chain

Lean4:
- proofs/lean/Nat.lean, List.lean, Propositional.lean: fix imports and
  theorem statements for Lean 4.x stdlib API
- proofs/lean/algebra/GroupTheory.lean: port group theory proofs to Lean4
- proofs/lean/ECHIDNA.lean: add root module required by lakefile roots

Agda:
- proofs/agda/DispatchOrdering.agda: replace z<s/s<s (not in installed
  stdlib) with s≤s z≤n; Fin._<_ = ℕ._<_ on toℕ = suc · ≤ · suc

Idris2:
- src/idris/Validator.idr: rename bound var `proof` → `pterm` throughout;
  `proof` is a reserved keyword in Idris2 0.8.0 (Idris1 tactic holdover)

Build:
- .gitignore: add proofs/agda/**/dist-newstyle/, proofs/lean/**/.lake/,
  .lia.cache, .nia.cache
- proofs/lean/lake-manifest.json: add empty Lake lock file (no ext deps)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…idator)

The proof corpus had zero CI coverage: agda-meta-checker.yml only triggers
on meta-checker/**, idris2-abi-ci.yml only on src/abi/**, and there was no
Coq or Lean workflow at all. So proofs/{coq,lean,agda} and src/idris were
never type-checked in CI — a green PR proved nothing about the proofs and
regressions would land silently.

New dogfood-proofs-ci.yml with three jobs (Coq via apt, Lean 4 via elan,
Agda 2.7.0.1 + stdlib v2.1), each calling a just recipe so local and CI run
identical commands (RSR-H14). idris2-abi-ci.yml extended to also type-check
the src/idris validator (reuses its Idris2 install; adds src/idris/** to the
path trigger).

Closing the gap surfaced two latent bugs the absent CI had hidden:

- proofs/lean/lakefile.lean rooted only `ECHIDNA`, whose umbrella imported
  4 of 6 files; `lake build` silently skipped algebra/GroupTheory.lean and
  mvp_basic.lean, and in fact failed outright ("unknown module prefix
  'Basic'") because the top-level modules were not in the lib's module
  space. Now every proof file is a root and ECHIDNA.lean imports the whole
  corpus.

- src/idris/echidna-validator.ipkg declares an executable (main = Validator)
  but Validator.idr had no `main`, so `idris2 --build` failed with
  "Undefined name Validator.main". Added `main = testValidation`.

New Justfile recipes: proofs, proofs-coq, proofs-lean, proofs-agda,
proofs-idris (proofs-agda exports LC_ALL=C.UTF-8 — Agda aborts under a
non-UTF-8 locale).

Verified locally against real toolchains: coqc 8.18.0 (5/5), Lean v4.13.0
lake build (6 modules + umbrella), agda (13/13), idris2 0.8.0 --typecheck.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-triggers the workflow's first real run (the initial PR run hit a
startup_failure during workflow registration) and keeps the header
comments ASCII for parser portability. No behavioural change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Temporary: reduce to one trivial job to localise the GitHub-parser
rejection (passes yq + actionlint but startup_failures twice). Jobs
restored in the next commit once the skeleton is confirmed to start.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…actions

taiki-e/install-action reproducibly triggers a startup_failure in this
workflow's context (bisected: header + checkout + run steps start fine; adding
the action fails before any job runs, even though mvp-smoke.yml uses the same
pin on main). Switch all toolchain installs to plain run steps -- apt for Coq
and Agda, curl tarballs for elan and just -- matching the idris2/chapel
workflows, which use only actions/checkout plus run steps.

Agda now installs via apt (agda + agda-stdlib), which reproduces the local
verification environment exactly. just is installed from its release tarball.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Agda job failed with "Library 'standard-library' not found / Installed
libraries: (none)": apt-get --fix-missing silently skips agda-stdlib when it
can't be fetched, exiting 0, so the registration step found no .agda-lib and
wrote an empty ~/.agda/libraries.

Install only the agda binary via apt (2.6.3 on ubuntu-24.04) and fetch the
matching stdlib v1.7.3 from GitHub, registering its explicit path -- the same
deterministic pattern agda-meta-checker.yml uses. Verified locally: all 13
corpus files type-check against a fresh GitHub v1.7.3 in an isolated AGDA_DIR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The trust-pipeline property proofs under verification/proofs/lean4
(ConfidenceLattice, ParetoMaximality, ParetoStrongMaximality,
IntegrityVerification, PortfolioCompleteness) had no CI coverage:
dogfood-proofs-ci gated only proofs/{coq,lean,agda} + src/idris, and the
idris2-abi / agda-meta-checker workflows are path-filtered to src/abi/**
and meta-checker/** respectively, so this corpus fell through every
filter.

Add a `verif-lean` job mirroring the existing `lean` job (elan shim +
pinned-toolchain pre-install with retry/backoff, then `just`), reading
the pin from verification/proofs/lean4/lean-toolchain and running a new
`just proofs-verif-lean` recipe (cd verification/proofs/lean4 && lake
build). No mathlib dependency (lake-manifest packages: []), so it builds
well under the 20-minute timeout. Extend the push/PR path triggers to
verification/proofs/lean4/**.

Verified locally on Lean 4.13.0: a fresh `lake build` (cleared
.lake/build first) compiles all five modules, exit 0, with zero
sorry/admit/axiom/native_decide across the corpus.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv
@hyperpolymath hyperpolymath marked this pull request as ready for review June 21, 2026 14:28
@hyperpolymath hyperpolymath merged commit 969b73e into main Jun 21, 2026
51 of 53 checks passed
@hyperpolymath hyperpolymath deleted the claude/loving-cannon-AtUSm branch June 21, 2026 14:28
hyperpolymath added a commit that referenced this pull request Jun 21, 2026
…idris2) (#276)

## What

Adds CI coverage for the **Idris2 trust-pipeline property proofs** under
`verification/proofs/idris2/` (11 modules) — the last self-proof corpus
with no gate.
Follows #275 (which gated `verification/proofs/lean4`).

- **New `echidna-verif-proofs.ipkg`** listing all 11 modules with
`depends = base, echidnaabi`.
- **New Justfile recipe `proofs-verif-idris`**: installs the
`echidnaabi` package (`--install`
builds it on the way), then `idris2 --typecheck
echidna-verif-proofs.ipkg`. Added to the
  `proofs` roll-up.
- **New `verif-idris` job** in `dogfood-proofs-ci.yml` (apt idris2 +
source-bootstrap fallback,
mirroring `idris2-abi-ci.yml`) + `verification/proofs/idris2/**` path
triggers.

## Why an ipkg, not a per-file `idris2 --check` loop

Two real problems surfaced while building this (caught by *running* it,
not eyeballing):

1. **`idris2 --check` returns exit 0 even when a module fails to resolve
an import.**
   Verified: a bare `idris2 --check DispatchCorrectness.idr` prints
`Error: Module EchidnaABI.Types not found` but **exits 0** — a per-file
loop would pass CI
while broken. (Genuine *type* errors do exit 1; only module-resolution
errors don't.)
2. **`DispatchCorrectness.idr` imports `EchidnaABI.Types`** — the
`echidnaabi` package in
   `src/abi/`, unresolvable standalone.

Building through the ipkg fixes both: `depends = echidnaabi` resolves
the import, and the
package build gives a reliable aggregate exit code. The recipe
additionally fails on any
`^Error` line as a belt-and-suspenders guard.

## Verification (run, not eyeballed)

On **idris2 0.8.0**, from a clean state (`src/abi/build`, the installed
`echidnaabi` package,
and `verification/proofs/idris2/build` all removed first):

- `just proofs-verif-idris` → type-checks all 11 modules (incl.
DispatchCorrectness), **exit 0**.
- **Negative test**: injecting `n = S n` proved by `Refl` into a module
makes the ipkg
  typecheck **exit 1** — the gate actually gates.
- Corpus is **axiom-clean**: no `believe_me` / `assert_total` /
`postulate` (the two grep hits
are string literals in a `patternString` data table, not real escapes);
fully total.

## Notes

- No proof sources changed — packaging + CI wiring only.
- The `verif-idris` job mirrors `idris2-abi-ci`'s install (apt-first,
source-fallback) and uses
  `timeout-minutes: 30` to cover the source-bootstrap fallback.
- Remaining in the proof-CI sweep: a **weekly cron** for the heavier
Isabelle/Mizar corpora
  (next).

🤖 Generated with [Claude Code](https://claude.com/claude-code)


---
_Generated by [Claude
Code](https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv)_

---------

Co-authored-by: Claude <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Jun 21, 2026
#277)

## What

Adds a **weekly cron** gating the two remaining ungated self-proof
corpora — **Isabelle/HOL**
(`proofs/isabelle`, 5 `.thy`) and **Mizar** (`proofs/mizar`, 4 `.miz`).
Completes the proof-CI
sweep after #275 (Lean) and #276 (Idris2).

- **`.github/workflows/verification-proofs-cron.yml`** — `isabelle` +
`mizar` jobs on a weekly
schedule (`Mon 04:17 UTC`) + `workflow_dispatch`. Isabelle reuses
`live-provers.yml`'s proven
Isabelle2024 install; Mizar discovers the newest linux tarball from
`mizar.org` at runtime.
- **`proofs/isabelle/ROOT`** — two sessions: `Echidna_Isabelle` (base
HOL: Basic, List, Nat,
Propositional) and `Echidna_Isabelle_Algebra` (GroupTheory, on
HOL-Algebra).
- **`just proofs-isabelle`** (`isabelle build`) and **`just
proofs-mizar`** (`makeenv` + `verifier`
per file, failing on a non-empty `.err` since the verifier's exit code
is unreliable). Kept
**out** of the per-PR `proofs` rollup — these toolchains are too heavy
for per-PR.

## Why a weekly cron (not per-PR)

Both toolchains are large, non-apt downloads (Isabelle ~500MB + the
HOL-Algebra image build;
Mizar's system + full MML). Mizar is additionally **Tier-4 "mock-only"**
in this project. Gating
them per-PR would bloat every PR; a weekly schedule (mirroring
`container-ci.yml`) is the right
cadence.

## Verification status — please read

`isabelle.in.tum.de` **and** `mizar.org` both return `403
host_not_allowed` from the dev sandbox,
and neither tool is in apt or has a GitHub mirror — so these recipes
**cannot be run locally
here**. They are being validated by **dispatching this workflow on
GitHub's runners** (where
`live-provers.yml` already downloads Isabelle successfully) and
iterating on the real run output.
I will report the dispatch results on this PR. Known risk areas the
first runs will confirm:

- Isabelle theory-name vs HOL-name handling (`List.thy`/`Nat.thy` share
names with HOL theories).
- The exact Mizar archive layout / `MIZFILES` + `makeenv`/`verifier`
invocation.

Holding as **draft** until the dispatched runs are green.

🤖 Generated with [Claude Code](https://claude.com/claude-code)


---
_Generated by [Claude
Code](https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv)_

---------

Co-authored-by: Claude <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request Jun 21, 2026
…cron (#278)

## What

Repairs the broken Isabelle/Mizar self-proof corpora so the weekly
`verification-proofs-cron` (added in #277) can actually go green.
Follows the
decision to **repair** rather than gate-the-subset.

This PR (first repair round): **Isabelle Nat & List**.

- Renamed `Nat.thy` → `EchidnaNat.thy`, `List.thy` → `EchidnaList.thy`.
A theory
named `Nat`/`List` collides with HOL's own theories of those names — the
CI run
reported `Duplicate fact declaration "Nat.le_refl"` and
`"List.append_assoc"`.
- Annotated the three Nat addition lemmas (`0+n=n`, `n+0=n`, `m+n=n+m`)
`:: nat`;
as stated they were polymorphic over a bare `{zero,plus}` type that
`simp` can't
  discharge. All other proofs are unchanged (they passed already).
- Updated `proofs/isabelle/ROOT` to the new theory names.

## Status / still to do (tracked, will push to this PR)

- **GroupTheory.thy**: still excluded — it has genuine math errors
(lemmas omit
`assumes "group G"` then use `assms`; `center_closed` uses
`group.m_comm`, which
  needs `comm_group`). Repair next.
- **Mizar**: the install now fully works (resolves + downloads
`mizar-8.1.15`,
binaries on PATH). `makeenv basic.miz` reports 1 error; the recipe is
currently
**diagnostic** (dumps each `.err` + offending source line) so this run
reveals
  all four files' real errors. Repair follows once diagnosed.

## Verification note

`isabelle.in.tum.de` / `mizar.org` are `403`-blocked from the dev
sandbox, so these
can't be run locally — they're verified by this PR's
`verification-proofs-cron` run
on GitHub's runners (Isabelle2025-2), iterating on real output.
Lean/Idris2/Agda/Coq
gates are already merged (#275/#276/#234).

🤖 Generated with [Claude Code](https://claude.com/claude-code)


---
_Generated by [Claude
Code](https://claude.ai/code/session_01UAqDQaMwpUqWHUSZekGZWv)_

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants