AlphaEvo

LLM-Seeded Evolutionary Discovery of Robust Quantitative Trading Signals

AlphaEvo finds short-term stock return predictors — alpha factors — by having an LLM propose formulas that an evolutionary optimizer then breeds and stress-tests across multiple time windows. The LLM never sees returns; it only shapes the search space. The optimizer decides what survives.

Market data  ──►  Stage I: LLM seeds      ──►  Stage II: GP search     ──►  Stage III: EoH refine
                  regime-aware retrieval        20 generations                8 generations
                  10 candidate formulas         multi-window IC fitness       5 evolution operators
                  in safe expression DSL        train/test split              per-window IC feedback
                                                                              decorrelation filter
                                                                         ──►  Deployment bundle

Three design decisions that matter: (1) seeding adapts to the current market regime, not a fixed library; (2) every candidate is scored on four non-overlapping sub-windows, so overfitting to one period does not survive; (3) EoH operators mutate structure guided by which time windows are weakest — a feedback loop that deterministic GP cannot replicate.
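
The multi-window scoring in point (2) can be sketched as follows. This is a minimal illustration, not the repository's evaluator. It assumes IC is the Spearman rank correlation between a signal and the forward return, and that fitness is the IC of the weakest of the four sub-windows. The names `spearman_ic` and `multi_window_fitness` and the worst-window aggregation are hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple


def _ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no ranking information
    return cov / (vx * vy) ** 0.5


def spearman_ic(signal: Sequence[float], fwd_returns: Sequence[float]) -> float:
    """Assumed IC definition: rank correlation of signal vs. forward return."""
    return _pearson(_ranks(signal), _ranks(fwd_returns))


def multi_window_fitness(
    signal: Sequence[float], fwd_returns: Sequence[float], n_windows: int = 4
) -> Tuple[float, List[float], int]:
    """Return (fitness, per-window ICs, index of the weakest window)."""
    size = len(signal) // n_windows
    ics = []
    for w in range(n_windows):
        lo, hi = w * size, (w + 1) * size  # non-overlapping slices
        ics.append(spearman_ic(signal[lo:hi], fwd_returns[lo:hi]))
    weakest = min(range(n_windows), key=lambda w: ics[w])
    fitness = min(ics)  # assumption: a formula is only as good as its worst window
    return fitness, ics, weakest
```

The per-window list and the weakest-window index are the kind of feedback that point (3) describes passing to the refinement stage.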

Results

Per-asset IC benchmark — 120 instances (12 symbols × 10 quarters, Q1 2022 – Q2 2024)

Method	IC (test)	IC gap	Sharpe	Description
Buy & Hold	—	—	0.65	Passive benchmark
RSI Rule	0.05 ± 0.15	—	0.45	Standard momentum rule
Random GP	0.31 ± 0.09	0.10	0.58	GP without LLM seeding
LLM-Only	0.01 ± 0.15	—	0.41	LLM formula, no evolution
GP + Anchors	0.15 ± 0.10	0.12	0.56	GP with fixed seed library
AlphaEvo (ours)	0.52 ± 0.08	0.09	4.78	LLM seed + GP + EoH

IC gap = train IC − test IC. Lower is better. Paired t-test vs. Random GP: p < 0.01, n = 120.
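
As a rough illustration of the footnote, the sketch below computes the IC gap and the paired t-statistic over per-instance test ICs of two methods. The function names are hypothetical and this is not the repository's evaluation code. It stops at the t-statistic and degrees of freedom; turning that into a p-value needs a t-distribution, for example via `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def ic_gap(train_ic: float, test_ic: float) -> float:
    """Train IC minus test IC; smaller means less overfitting."""
    return train_ic - test_ic


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t-statistic and degrees of freedom for matched samples a and b.

    Each position must refer to the same benchmark instance in both lists.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t_stat = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1
```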

Large-scale cross-sectional validation — 503 S&P 500 stocks, Jan 2010 – Dec 2024

Method	Daily IC (mean)	ICIR	L/S Sharpe	L/S MDD	Hit rate
RSI Rule	0.008	0.14	0.38	−28.7%	51.9%
Random GP	0.031	0.49	1.24	−18.4%	54.2%
LLM-Only	0.019	0.31	0.72	−24.1%	52.5%
GP + Anchors	0.044	0.68	1.71	−14.3%	56.8%
AlphaEvo (ours)	0.052	0.82	1.89	−13.1%	58.7%
SPY (passive)	—	—	0.68	−33.9%	—

L/S = equal-weighted long top decile, short bottom decile, daily rebalance. 3,780 trading days.
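
The columns in this table can be computed along the lines below. This is a sketch under stated assumptions rather than the code in `scripts/run_large_scale_experiment.py`: ICIR is taken as mean daily IC divided by its standard deviation, hit rate as the share of days with positive IC, and Sharpe is annualized over 252 trading days with a zero risk-free rate. All function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def long_short_return(
    signal: Sequence[float], next_returns: Sequence[float], quantile: float = 0.1
) -> float:
    """One day's L/S return: long the top decile, short the bottom, equal-weighted."""
    n = len(signal)
    k = max(1, int(n * quantile))
    order = sorted(range(n), key=lambda i: signal[i])  # ascending by signal
    short_leg, long_leg = order[:k], order[-k:]
    long_ret = mean(next_returns[i] for i in long_leg)
    short_ret = mean(next_returns[i] for i in short_leg)
    return long_ret - short_ret


def annualized_sharpe(daily_returns: Sequence[float], periods: int = 252) -> float:
    sd = stdev(daily_returns)
    return 0.0 if sd == 0 else mean(daily_returns) / sd * math.sqrt(periods)


def max_drawdown(daily_returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def ic_summary(daily_ics: Sequence[float]) -> Dict[str, float]:
    sd = stdev(daily_ics)
    return {
        "mean_ic": mean(daily_ics),
        "icir": 0.0 if sd == 0 else mean(daily_ics) / sd,
        "hit_rate": sum(ic > 0 for ic in daily_ics) / len(daily_ics),
    }
```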

Ablation — what each component contributes

Configuration	IC (test)	Sharpe	IC gap
AlphaEvo full	0.524	4.78	0.092
− EoH (GP only, same budget)	0.483	4.42	0.112
− LLM seeding (anchors only)	0.422	4.04	0.096
− multi-window fitness	0.507	4.65	0.144
− decorrelation filter	0.602	4.61	0.285

Removing decorrelation raises IC but more than triples IC gap — the filter is essential for robustness.
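
One common form of a decorrelation filter is a greedy pass over candidates ranked by score, sketched below. Whether `alphaevo/selection.py` works this way is not stated here, so treat the Pearson measure, the 0.7 threshold, and the names `pearson` and `decorrelate` as assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[str, float, Sequence[float]]  # (formula, score, signal values)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def decorrelate(candidates: List[Candidate], max_abs_corr: float = 0.7) -> List[Candidate]:
    """Keep the best-scoring candidates whose signals are not near-duplicates."""
    kept: List[Candidate] = []
    for cand in sorted(candidates, key=lambda c: c[1], reverse=True):
        values = cand[2]
        # Accept only if the signal is sufficiently different from everything kept so far.
        if all(abs(pearson(values, k[2])) < max_abs_corr for k in kept):
            kept.append(cand)
    return kept
```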

LLM provider sensitivity

All three free-tier models (two on Groq, one on Google) land within 3% of each other on final IC (0.521 to 0.535). No paid API is required.

Provider	Model	Final IC	Seed IC	Lift
Groq (default)	llama-3.1-8b-instant	0.521	0.336	+0.185
Groq	llama-3.3-70b-versatile	0.535	0.293	+0.243
Google	gemini-2.0-flash	0.528	0.294	+0.234
None	deterministic anchors	0.452	0.260	+0.192

Quick start

git clone https://github.com/Zangir/AlphaEvo.git
cd AlphaEvo
python3.11 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

No API key needed. Reproduce the per-asset IC benchmark (E1) deterministically:

python scripts/run_paper_experiments.py --experiments E1 --out-dir paper/results/generated

One instance takes ~10 seconds on a laptop. The full 120-instance E1 run takes ~4 CPU hours.

With a free Groq key (500k tokens/day — enough for dozens of sessions):

cp .env.example .env          # add GROQ_API_KEY=your_key
python scripts/run_paper_experiments.py --experiments E1,E2,E3 --out-dir paper/results/generated

Regenerate all paper figures:

python scripts/generate_paper_figures.py \
  --results-dir paper/results \
  --out-dir paper/figures/generated

Use the pipeline in your own code

from alphaevo import AlphaEvoPipeline
from trading.data import fetch_historical, row_to_market_data

# Fetch 6 months of AAPL data (free, no API key)
df = fetch_historical("AAPL", "2024-01-01", "2024-06-30")
history = [row_to_market_data("AAPL", row, ts) for ts, row in df.iterrows()]

# Deterministic mode — no LLM calls, fully reproducible
pipeline = AlphaEvoPipeline(use_llm=False, use_eoh=True)
result = pipeline.run_cycle(history, symbol="AAPL", position_state="flat")

print(result.selected_alpha.formula)
# e.g.: ts_zscore_scale(cwise_mul(ts_delta(close, 3), ts_zscore_scale(volume_ratio_10, 10)), 15)
print(f"Test IC: {result.selected_alpha.metrics.test_ic:.3f}")
print(f"Sharpe:  {result.selected_alpha.metrics.sharpe_ratio:.2f}")

The PipelineResult exposes the full search trace: every candidate evaluated, the GP search history, EoH iteration summaries, the analyst review, and a deployment bundle — all as Pydantic models.

Repository layout

alphaevo/
├── alphaevo/
│   ├── dsl.py           # Safe expression DSL — 44 operators, sandboxed eval
│   ├── evaluator.py     # IC / Sharpe / fitness computation
│   ├── catalog.py       # Field + operator specs, seed examples
│   ├── search.py        # GP search engine (deterministic, seeded RNG)
│   ├── eoh.py           # EoH refinement (E1, E2, M1, M2, M3 operators)
│   ├── pipeline.py      # Full 3-stage LangGraph workflow
│   ├── knowledge.py     # Regime-aware knowledge compiler
│   ├── selection.py     # Qualified-alpha filtering + decorrelation
│   └── prompts/         # System + user prompt templates
├── evaluation/
│   └── metrics.py       # Sharpe, max drawdown, total return
├── trading/
│   └── data.py          # yfinance OHLCV + RSI / MA20 / MA50
├── scripts/
│   ├── run_paper_experiments.py      # Reproduce E1–E9 (Table 1, ablation, …)
│   ├── run_large_scale_experiment.py # Reproduce E10 (503 stocks, 15 years)
│   └── generate_paper_figures.py     # Regenerate all paper figures
├── data/
│   └── sp500_universe.csv   # 503-stock universe snapshot (as of 2024-12)
├── paper/
│   ├── results/             # Included paper artifact JSONs (E1–E10)
│   └── figures/             # Included paper figures (PDF)
├── tests/                   # Smoke tests — run with pytest
├── requirements.txt
├── pyproject.toml
└── .env.example

Extending AlphaEvo

Add a new DSL operator: define a function in alphaevo/dsl.py, add it to FUNCTION_ENV and the allowed-node whitelist, then add an OperatorSpec entry in alphaevo/catalog.py. The search engine and EoH optimizer pick it up automatically.
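
To make the registration steps concrete, here is a toy whitelist-based evaluator built on Python's `ast` module. It is a stand-in, not the contents of `alphaevo/dsl.py`: the `FUNCTION_ENV` dict below only mirrors the name used above, `ALLOWED_NODES`, `safe_eval`, and the new operator `ts_max` are hypothetical, and the real `OperatorSpec` entry in `catalog.py` is not shown because its fields are not documented here.

```python
import ast
from typing import Callable, Dict, List

Series = List[float]


def ts_delta(x: Series, n: int) -> Series:
    """x[t] - x[t-n]; the first n entries are 0.0 for lack of history."""
    return [0.0] * n + [x[i] - x[i - n] for i in range(n, len(x))]


def ts_max(x: Series, n: int) -> Series:
    """New operator for this sketch: rolling max over the trailing n values."""
    return [max(x[max(0, i - n + 1): i + 1]) for i in range(len(x))]


# Toy registry; adding the new operator is a single dict entry.
FUNCTION_ENV: Dict[str, Callable] = {"ts_delta": ts_delta}
FUNCTION_ENV["ts_max"] = ts_max

# Only plain calls, names, and constants may appear in a formula.
ALLOWED_NODES = (ast.Expression, ast.Call, ast.Name, ast.Load, ast.Constant)


def _eval(node: ast.AST, fields: Dict[str, Series]):
    if isinstance(node, ast.Constant):
        if not isinstance(node.value, (int, float)):
            raise ValueError("only numeric constants are allowed")
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in fields:
            raise ValueError(f"unknown field: {node.id}")
        return fields[node.id]
    # Remaining whitelisted case: a call to a registered operator.
    args = [_eval(arg, fields) for arg in node.args]
    return FUNCTION_ENV[node.func.id](*args)


def safe_eval(formula: str, fields: Dict[str, Series]):
    """Validate every AST node against the whitelist, then evaluate."""
    tree = ast.parse(formula, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Call):
            if not isinstance(node.func, ast.Name) or node.func.id not in FUNCTION_ENV:
                raise ValueError("unknown operator")
    return _eval(tree.body, fields)
```

With this layout, `safe_eval("ts_max(close, 2)", {"close": prices})` works as soon as the dict entry exists, while attribute access, keyword arguments, and unregistered calls such as `__import__` are rejected before anything runs.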

Add a new EoH evolution operator: subclass or extend EoHRefinementEngine in alphaevo/eoh.py. The operator receives the current population and should return a mutated formula string.
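
The contract described above (population in, mutated formula string out) could look like the sketch below. It shows a simple lookback-perturbation mutation and is purely illustrative: `mutate_lookback` is a hypothetical standalone function, the population is assumed to be a list of `(formula, fitness)` pairs, and nothing here reflects the actual `EoHRefinementEngine` interface or its E1/E2/M1/M2/M3 operators.

```python
import random
import re
from typing import List, Tuple

# Integer arguments only; digits inside identifiers such as volume_ratio_10 are skipped.
_INT_ARG = re.compile(r"(?<![\w.])\d+(?![\w.])")


def mutate_lookback(population: List[Tuple[str, float]], rng: random.Random) -> str:
    """Pick the fittest formula and nudge one integer lookback by 1 or 2."""
    parent = max(population, key=lambda p: p[1])[0]
    matches = list(_INT_ARG.finditer(parent))
    if not matches:
        return parent  # nothing to perturb
    m = rng.choice(matches)
    old = int(m.group())
    new = max(2, old + rng.choice([-2, -1, 1, 2]))  # keep windows at least 2 long
    if new == old:
        new = old + 1
    return parent[: m.start()] + str(new) + parent[m.end():]
```

Passing a seeded `random.Random` keeps the mutation reproducible, in line with the deterministic mode used elsewhere in this README.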

Use a different LLM provider: set LLM_PROVIDER in .env. All providers listed below work with the existing prompt templates.

Provider	Free?	`LLM_PROVIDER=`	Key env var
Groq — Llama 3.1 8B (paper default)	✅ 500k tokens/day	`groq`	`GROQ_API_KEY`
Google Gemini — Gemini 2.0 Flash	✅ 1,500 req/day	`gemini`	`GEMINI_API_KEY`
OpenRouter — many free models	✅ free tier	`openrouter`	`OPENROUTER_API_KEY`
Ollama — runs locally, offline	✅ unlimited	`ollama`	none
OpenAI GPT-4o	💳 paid	`openai`	`OPENAI_API_KEY`
Anthropic Claude	💳 paid	`anthropic`	`ANTHROPIC_API_KEY`
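
A quick way to catch a misconfigured `.env` before a long run is to check the provider and key pairing from the table. The helper below is a hypothetical convenience, not part of the repository; it assumes `groq` is used when `LLM_PROVIDER` is unset, based on the table marking Groq as the paper default.

```python
import os
from typing import Mapping, Optional

# Provider -> required key variable, copied from the table above (None = no key).
PROVIDER_KEY_VARS: Mapping[str, Optional[str]] = {
    "groq": "GROQ_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "ollama": None,
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def check_provider_env(env: Mapping[str, str] = os.environ) -> str:
    """Return the selected provider, or raise if it is unknown or its key is missing."""
    provider = env.get("LLM_PROVIDER", "groq").strip().lower()  # default is an assumption
    if provider not in PROVIDER_KEY_VARS:
        raise ValueError(f"unknown LLM_PROVIDER: {provider!r}")
    key_var = PROVIDER_KEY_VARS[provider]
    if key_var is not None and not env.get(key_var):
        raise ValueError(f"{key_var} must be set when LLM_PROVIDER={provider}")
    return provider
```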

Run the test suite:

pip install pytest
pytest tests/ -v

Reproducing all paper experiments

ID	Description	Command
E1	Per-asset IC benchmark (Table 1)	`--experiments E1`
E2	Ablation study	`--experiments E2`
E3	Regime analysis	`--experiments E3`
E4	LLM provider sensitivity	`--experiments E4`
E5	Operator analysis	`--experiments E5`
E6	Diversity / efficiency curves	`--experiments E6`
E7	Qualitative alpha examples	`--experiments E7`
E8	Per-symbol breakdown	`--experiments E8`
E9	IC gap distribution	`--experiments E9`
E10	Large-scale cross-sectional	`python scripts/run_large_scale_experiment.py`

E1–E9 are all deterministic (no LLM calls). E10 fetches 503 tickers from yfinance; plan for ~2–4 hours on a 4-core machine.

Fresh reruns write to paper/results/generated/ — the included paper/results/*.json files are the fixed paper artifacts and should not be overwritten.

Citation

If you use AlphaEvo in your research, please cite:

@inproceedings{alphaevo2026,
  title     = {{AlphaEvo}: {LLM}-Seeded Evolutionary Discovery of Robust Quantitative Trading Signals},
  author    = {Machavariani, Temiko and Tsourekas, Kyriakos and Rotte, Matvei and Tsulaia, Luka and Tazhibaev, Iskhak and Nwadike, Munachiso Samuel and Lahlou, Salem and Jamwal, Prashant Kumar and Inui, Kentaro and Tak{\'a}{\v{c}}, Martin and Iklassov, Zangir},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2026},
  note      = {Preprint. Under review.}
}

License

Code: MIT. Market data sourced from Yahoo Finance under their terms of use. This repository is a research artifact — not financial advice. Past backtest performance does not guarantee future results.
