An agentic system for solving IMO and Putnam-level math problems. It generates solutions, iteratively self-improves them, and verifies correctness - all powered by your choice of LLM backend (Cohere, OpenAI, Anthropic, Gemini).
- Solve - the solver LLM generates an initial solution to the problem
- Self-improve - the solver critiques and refines its own solution
- Verify - a verifier LLM checks the solution for correctness
- Retry - if verification fails, the full cycle repeats (up to `--max-runs` attempts)
The solver and verifier can use different backends and models, letting you mix providers for better results.
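The cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the names `run_agent`, `solve`, `improve`, and `verify` are hypothetical, and the solver and verifier are passed in as plain callables so that each could be backed by a different provider.

```python
from typing import Callable, Optional


def run_agent(
    problem: str,
    solve: Callable[[str], str],
    improve: Callable[[str, str], str],
    verify: Callable[[str, str], bool],
    max_runs: int = 10,
) -> Optional[str]:
    """Return the first solution the verifier accepts, or None if every run fails."""
    for _ in range(max_runs):
        draft = solve(problem)             # Solve: initial attempt
        refined = improve(problem, draft)  # Self-improve: critique and refine
        if verify(problem, refined):       # Verify: independent check
            return refined
        # Retry: verification failed, so the full cycle starts over
    return None
```

Keeping the verifier as a separate callable is what makes it possible to check a solution with a different model from the one that wrote it.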
- Install uv:
  curl -LsSf https://astral.sh/uv/install.sh | sh
- Install dependencies:
  uv sync
- Environment variables: Create a `.env` file with API keys for the backends you want to use:
  COHERE_API_KEY=your_cohere_key
  OPENAI_API_KEY=your_openai_key
  ANTHROPIC_API_KEY=your_anthropic_key
  GOOGLE_API_KEY=your_google_key
  You only need keys for the backends you plan to use.
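As a quick sanity check before a run, a sketch like the following reports which backends have a key available. The variable names are the ones listed above; the helper name `available_backends` is hypothetical, and the sketch assumes the keys are already present in the process environment (it does not parse `.env` itself).

```python
import os
from typing import Mapping

# Environment variable expected for each backend, as listed above.
KEY_VARS = {
    "cohere": "COHERE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}


def available_backends(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the backends whose API key is set to a non-empty value."""
    return [name for name, var in KEY_VARS.items() if env.get(var)]


if __name__ == "__main__":
    print("Backends with keys:", ", ".join(available_backends()) or "none")
```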
uv run imo-agent problem.txt --backend cohere
Or via `python main.py`:
uv run python main.py problem.txt --backend cohere

| Flag | Default | Description |
|---|---|---|
| `-b, --backend` | `cohere` | LLM backend for solving: cohere, openai, anthropic, gemini |
| `-m, --model` | backend default | Model name for the solver |
| `--v-backend` | same as solver | LLM backend for verification |
| `--v-model` | same as solver | Model name for the verifier |
| `--max-runs` | `10` | Maximum number of full solve/verify attempts |
| `--output` | - | File path to save the final solution |
| `--solver-temp` | `0.7` | Sampling temperature for the solver |
| `--verifier-temp` | `0.1` | Sampling temperature for the verifier |
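For scripted or batch runs, the flags in the table can be assembled programmatically. The helper below is a hypothetical sketch (`build_command` is not part of the project); it only uses flags documented above and leaves out optional ones so that the CLI applies its own defaults.

```python
from typing import Optional


def build_command(
    problem_file: str,
    backend: str = "cohere",
    v_backend: Optional[str] = None,
    max_runs: int = 10,
    output: Optional[str] = None,
) -> list[str]:
    """Assemble an imo-agent invocation from the documented flags."""
    cmd = ["uv", "run", "imo-agent", problem_file,
           "--backend", backend, "--max-runs", str(max_runs)]
    if v_backend is not None:  # omitted: the verifier uses the solver's backend
        cmd += ["--v-backend", v_backend]
    if output is not None:     # omitted: no --output flag is passed
        cmd += ["--output", output]
    return cmd
```

Passing the returned list to `subprocess.run(..., check=True)` would launch the agent, for example `build_command("problem.txt", "openai", "anthropic", 3)` to solve with OpenAI, verify with Anthropic, and stop after three attempts.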
Use different models for solving and verification:
uv run imo-agent problem.txt -b cohere --v-model command-r-08-2024
Mix providers - solve with OpenAI, verify with Anthropic:
uv run imo-agent problem.txt -b openai --v-backend anthropic
Save the solution to a file:
uv run imo-agent problem.txt -b cohere --output solution.txt

src/imo_math_agent/
  cli.py - CLI entry point (Typer)
  agent.py - Core agent loop: solve -> self-improve -> verify
  config.py - Agent configuration (temperatures, etc.)
  prompts.py - Prompt templates for solving and verification
  prompting.py - Prompt construction utilities
  verification.py - Solution verification logic
  types.py - Shared type definitions
  utils.py - General utilities
  backends/
    base.py - Abstract backend interface
    cohere.py - Cohere backend
    openai.py - OpenAI backend
    anthropic.py - Anthropic backend
    gemini.py - Google Gemini backend
    registry.py - Backend discovery and instantiation
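To illustrate how an abstract backend interface and a registry typically fit together, here is a minimal sketch. Everything in it is an assumption made for illustration: the class `Backend`, its `generate` method, the `register` decorator, `get_backend`, and the `echo` stand-in are hypothetical, and the real `base.py` and `registry.py` may be organised differently.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class Backend(ABC):
    """Hypothetical minimal interface; the real one in base.py may differ."""

    @abstractmethod
    def generate(self, prompt: str, temperature: float) -> str:
        ...


_REGISTRY: Dict[str, Type[Backend]] = {}


def register(name: str) -> Callable[[Type[Backend]], Type[Backend]]:
    """Class decorator that records a backend under a CLI-facing name."""
    def decorator(cls: Type[Backend]) -> Type[Backend]:
        _REGISTRY[name] = cls
        return cls
    return decorator


def get_backend(name: str) -> Backend:
    """Instantiate a registered backend, failing loudly on unknown names."""
    if name not in _REGISTRY:
        raise ValueError(f"Unknown backend {name!r}; choose from {sorted(_REGISTRY)}")
    return _REGISTRY[name]()


@register("echo")
class EchoBackend(Backend):
    """Stand-in backend for illustration; it makes no API calls."""

    def generate(self, prompt: str, temperature: float) -> str:
        return prompt
```

With a layout like this, adding a provider means writing one subclass and registering it under the name the `--backend` flag should accept.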