AlphaEvo

LLM-Seeded Evolutionary Discovery of Robust Quantitative Trading Signals

Python 3.11+ | License: MIT | NeurIPS 2026 | CI | Groq Free


AlphaEvo finds short-term stock return predictors — alpha factors — by having an LLM propose formulas that an evolutionary optimizer then breeds and stress-tests across multiple time windows. The LLM never sees returns; it only shapes the search space. The optimizer decides what survives.

Market data  ──►  Stage I: LLM seeds      ──►  Stage II: GP search     ──►  Stage III: EoH refine
                  regime-aware retrieval        20 generations                8 generations
                  10 candidate formulas         multi-window IC fitness       5 evolution operators
                  in safe expression DSL        train/test split              per-window IC feedback
                                                                              decorrelation filter
                                                                         ──►  Deployment bundle

Three design decisions that matter: (1) seeding adapts to the current market regime, not a fixed library; (2) every candidate is scored on four non-overlapping sub-windows, so overfitting to one period does not survive; (3) EoH operators mutate structure guided by which time windows are weakest — a feedback loop that deterministic GP cannot replicate.
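
Design decision (2) can be illustrated with a minimal sketch. It assumes that IC means the Spearman rank correlation between a signal and forward returns, splits the sample into four contiguous windows, and scores a candidate as mean IC minus a dispersion penalty. The aggregation rule, the penalty weight, and all function names are assumptions made for illustration; they are not taken from alphaevo/evaluator.py.

```python
from statistics import mean, pstdev


def _ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def _pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def rank_ic(signal, forward_returns):
    """Spearman rank correlation between a signal and forward returns."""
    return _pearson(_ranks(signal), _ranks(forward_returns))


def multi_window_ic(signal, forward_returns, n_windows=4):
    """IC on each of n contiguous, non-overlapping windows.

    Trailing samples that do not fill a whole window are dropped.
    """
    size = len(signal) // n_windows
    return [
        rank_ic(signal[w * size:(w + 1) * size],
                forward_returns[w * size:(w + 1) * size])
        for w in range(n_windows)
    ]


def fitness(window_ics, penalty=1.0):
    """Hypothetical score: reward high average IC, punish uneven windows."""
    return mean(window_ics) - penalty * pstdev(window_ics)


def weakest_window(window_ics):
    """Index of the window with the lowest IC, e.g. to target a refinement step."""
    return min(range(len(window_ics)), key=lambda w: window_ics[w])
```

A formula that predicts well in three windows and fails in the fourth gets a high mean IC but a low score under this rule, which is the behaviour the multi-window design is meant to enforce.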


Results

Per-asset IC benchmark — 120 instances (12 symbols × 10 quarters, Q1 2022 – Q2 2024)

| Method | IC (test) | IC gap | Sharpe | Description |
|---|---|---|---|---|
| Buy & Hold | | | 0.65 | Passive benchmark |
| RSI Rule | 0.05 ± 0.15 | | 0.45 | Standard momentum rule |
| Random GP | 0.31 ± 0.09 | 0.10 | 0.58 | GP without LLM seeding |
| LLM-Only | 0.01 ± 0.15 | | 0.41 | LLM formula, no evolution |
| GP + Anchors | 0.15 ± 0.10 | 0.12 | 0.56 | GP with fixed seed library |
| AlphaEvo (ours) | 0.52 ± 0.08 | 0.09 | 4.78 | LLM seed + GP + EoH |

IC gap = train IC − test IC. Lower is better. Paired t-test vs. Random GP: p < 0.01, n = 120.
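
The two quantities in the note above can be written out as a short sketch. It assumes the paired t-test is the standard one on per-instance differences (the same 120 instances scored by both methods). The function names are hypothetical, and only the t statistic is computed here, not the p-value.

```python
from math import sqrt
from statistics import mean, stdev


def ic_gap(train_ic, test_ic):
    """Overfitting measure from the table note: train IC minus test IC."""
    return train_ic - test_ic


def paired_t_statistic(method_a, method_b):
    """t statistic of the per-instance differences between two methods.

    Both lists must score the same instances in the same order.
    The statistic has len(method_a) - 1 degrees of freedom.
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
```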

Large-scale cross-sectional validation — 503 S&P 500 stocks, Jan 2010 – Dec 2024

| Method | Daily IC (mean) | ICIR | L/S Sharpe | L/S MDD | Hit rate |
|---|---|---|---|---|---|
| RSI Rule | 0.008 | 0.14 | 0.38 | −28.7% | 51.9% |
| Random GP | 0.031 | 0.49 | 1.24 | −18.4% | 54.2% |
| LLM-Only | 0.019 | 0.31 | 0.72 | −24.1% | 52.5% |
| GP + Anchors | 0.044 | 0.68 | 1.71 | −14.3% | 56.8% |
| AlphaEvo (ours) | 0.052 | 0.82 | 1.89 | −13.1% | 58.7% |
| SPY (passive) | | | 0.68 | −33.9% | |

L/S = equal-weighted long top decile, short bottom decile, daily rebalance. 3,780 trading days.
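
As a rough illustration of the L/S construction described above, the sketch below computes one day's long-short return from a cross-section of signal values. It assumes the portfolio return is the mean return of the top decile minus the mean return of the bottom decile. The function name and the dictionary inputs are hypothetical, and transaction costs are ignored.

```python
def long_short_return(signal_by_stock, next_return_by_stock, quantile=0.1):
    """One period's return: equal-weighted long top quantile, short bottom quantile."""
    # Rank tickers from lowest to highest signal value.
    ranked = sorted(signal_by_stock, key=signal_by_stock.get)
    # Number of names per leg; at least one so small universes still work.
    n = max(1, int(len(ranked) * quantile))
    shorts, longs = ranked[:n], ranked[-n:]
    long_leg = sum(next_return_by_stock[s] for s in longs) / n
    short_leg = sum(next_return_by_stock[s] for s in shorts) / n
    return long_leg - short_leg
```

Calling this once per trading day with that day's signals and the following day's returns gives the daily series from which a Sharpe ratio and a maximum drawdown can be computed.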

Ablation — what each component contributes

| Configuration | IC (test) | Sharpe | IC gap |
|---|---|---|---|
| AlphaEvo full | 0.524 | 4.78 | 0.092 |
| − EoH (GP only, same budget) | 0.483 | 4.42 | 0.112 |
| − LLM seeding (anchors only) | 0.422 | 4.04 | 0.096 |
| − multi-window fitness | 0.507 | 4.65 | 0.144 |
| − decorrelation filter | 0.602 | 4.61 | 0.285 |

Removing decorrelation raises IC but more than triples IC gap — the filter is essential for robustness.
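
The README does not spell out how the decorrelation filter works. One common way to build such a filter is a greedy pass over candidates sorted by fitness, keeping a candidate only if its signal is weakly correlated with every signal already kept. The sketch below shows that generic technique; the 0.7 threshold, the tuple layout, and the function names are assumptions, not the logic in alphaevo/selection.py.

```python
from statistics import mean


def correlation(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def decorrelate(candidates, max_abs_corr=0.7):
    """Greedy filter over (name, fitness, signal_values) tuples.

    Candidates are visited from best to worst fitness. A candidate is kept
    only if its absolute correlation with every kept signal is at or below
    the threshold.
    """
    kept = []
    for name, fit, values in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(abs(correlation(values, kv)) <= max_abs_corr for _, _, kv in kept):
            kept.append((name, fit, values))
    return kept
```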

LLM provider sensitivity

All three free-tier configurations (two Groq models and one Gemini model) reach a final IC within 3% of each other. No paid API is required.

| Provider | Model | Final IC | Seed IC | Lift |
|---|---|---|---|---|
| Groq (default) | llama-3.1-8b-instant | 0.521 | 0.336 | +0.185 |
| Groq | llama-3.3-70b-versatile | 0.535 | 0.293 | +0.243 |
| Google | gemini-2.0-flash | 0.528 | 0.294 | +0.234 |
| None | deterministic anchors | 0.452 | 0.260 | +0.192 |

Quick start

git clone https://github.com/Zangir/AlphaEvo.git
cd AlphaEvo
python3.11 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

No API key is needed for the deterministic experiments. For example, to reproduce E1:

python scripts/run_paper_experiments.py --experiments E1 --out-dir paper/results/generated

One instance takes ~10 seconds on a laptop. The full 120-instance E1 run takes ~4 CPU hours.

With a free Groq key (500k tokens/day — enough for dozens of sessions):

cp .env.example .env          # add GROQ_API_KEY=your_key
python scripts/run_paper_experiments.py --experiments E1,E2,E3 --out-dir paper/results/generated

Regenerate all paper figures:

python scripts/generate_paper_figures.py \
  --results-dir paper/results \
  --out-dir paper/figures/generated

Use the pipeline in your own code

from alphaevo import AlphaEvoPipeline
from trading.data import fetch_historical, row_to_market_data

# Fetch 6 months of AAPL data (free, no API key)
df = fetch_historical("AAPL", "2024-01-01", "2024-06-30")
history = [row_to_market_data("AAPL", row, ts) for ts, row in df.iterrows()]

# Deterministic mode — no LLM calls, fully reproducible
pipeline = AlphaEvoPipeline(use_llm=False, use_eoh=True)
result = pipeline.run_cycle(history, symbol="AAPL", position_state="flat")

print(result.selected_alpha.formula)
# e.g.: ts_zscore_scale(cwise_mul(ts_delta(close, 3), ts_zscore_scale(volume_ratio_10, 10)), 15)
print(f"Test IC: {result.selected_alpha.metrics.test_ic:.3f}")
print(f"Sharpe:  {result.selected_alpha.metrics.sharpe_ratio:.2f}")

The PipelineResult exposes the full search trace: every candidate evaluated, the GP search history, EoH iteration summaries, the analyst review, and a deployment bundle — all as Pydantic models.


Repository layout

alphaevo/
├── alphaevo/
│   ├── dsl.py           # Safe expression DSL — 44 operators, sandboxed eval
│   ├── evaluator.py     # IC / Sharpe / fitness computation
│   ├── catalog.py       # Field + operator specs, seed examples
│   ├── search.py        # GP search engine (deterministic, seeded RNG)
│   ├── eoh.py           # EoH refinement (E1, E2, M1, M2, M3 operators)
│   ├── pipeline.py      # Full 3-stage LangGraph workflow
│   ├── knowledge.py     # Regime-aware knowledge compiler
│   ├── selection.py     # Qualified-alpha filtering + decorrelation
│   └── prompts/         # System + user prompt templates
├── evaluation/
│   └── metrics.py       # Sharpe, max drawdown, total return
├── trading/
│   └── data.py          # yfinance OHLCV + RSI / MA20 / MA50
├── scripts/
│   ├── run_paper_experiments.py      # Reproduce E1–E9 (Table 1, ablation, …)
│   ├── run_large_scale_experiment.py # Reproduce E10 (503 stocks, 15 years)
│   └── generate_paper_figures.py     # Regenerate all paper figures
├── data/
│   └── sp500_universe.csv   # 503-stock universe snapshot (as of 2024-12)
├── paper/
│   ├── results/             # Included paper artifact JSONs (E1–E10)
│   └── figures/             # Included paper figures (PDF)
├── tests/                   # Smoke tests — run with pytest
├── requirements.txt
├── pyproject.toml
└── .env.example

Extending AlphaEvo

Add a new DSL operator: define a function in alphaevo/dsl.py, add it to FUNCTION_ENV and the allowed-node whitelist, then add an OperatorSpec entry in alphaevo/catalog.py. The search engine and EoH optimizer pick it up automatically.
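
To make the registration steps above concrete, here is a self-contained sketch of a whitelist-based expression evaluator built on Python's `ast` module. It is not the code in alphaevo/dsl.py: the names FUNCTION_ENV and ts_delta are borrowed from this README, but their structure and semantics here, along with ALLOWED_NODES, evaluate, and ts_abs, are assumptions made for illustration.

```python
import ast


def ts_delta(series, lag):
    """Value minus the value `lag` steps earlier (0.0 where undefined)."""
    return [0.0 if i < lag else series[i] - series[i - lag]
            for i in range(len(series))]


def ts_abs(series):
    """The new operator being added in this example: element-wise absolute value."""
    return [abs(v) for v in series]


# Hypothetical stand-ins for the operator registry and the node whitelist.
FUNCTION_ENV = {"ts_delta": ts_delta}
ALLOWED_NODES = (ast.Expression, ast.Call, ast.Name, ast.Load, ast.Constant)


def evaluate(formula, fields):
    """Parse a formula, reject any syntax outside the whitelist, then evaluate."""
    tree = ast.parse(formula, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
    return _eval(tree.body, fields)


def _eval(node, fields):
    if isinstance(node, ast.Constant):
        if isinstance(node.value, bool) or not isinstance(node.value, (int, float)):
            raise ValueError("only numeric constants are allowed")
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in fields:
            raise ValueError(f"unknown field: {node.id}")
        return fields[node.id]
    # The only remaining whitelisted expression node is ast.Call.
    if not isinstance(node.func, ast.Name) or node.func.id not in FUNCTION_ENV:
        raise ValueError("call to an unregistered operator")
    args = [_eval(arg, fields) for arg in node.args]
    return FUNCTION_ENV[node.func.id](*args)


# Registering the new operator makes it usable in formulas.
FUNCTION_ENV["ts_abs"] = ts_abs
```

Because names are resolved only through the registry and the field dictionary, an expression such as `__import__('os')` or `close.__class__` is rejected before anything runs. In the real repository the operator would also need its OperatorSpec entry in alphaevo/catalog.py, as described above.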

Add a new EoH evolution operator: subclass or extend EoHRefinementEngine in alphaevo/eoh.py. The operator receives the current population and should return a mutated formula string.

Use a different LLM provider: set LLM_PROVIDER in .env. All providers listed below work with the existing prompt templates.

| Provider | Free? | LLM_PROVIDER= | Key env var |
|---|---|---|---|
| Groq (Llama 3.1 8B, paper default) | ✅ 500k tokens/day | groq | GROQ_API_KEY |
| Google Gemini (Gemini 2.0 Flash) | ✅ 1,500 req/day | gemini | GEMINI_API_KEY |
| OpenRouter (many free models) | ✅ free tier | openrouter | OPENROUTER_API_KEY |
| Ollama (runs locally, offline) | ✅ unlimited | ollama | none |
| OpenAI GPT-4o | 💳 paid | openai | OPENAI_API_KEY |
| Anthropic Claude | 💳 paid | anthropic | ANTHROPIC_API_KEY |
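
The mapping in the provider table can be checked before a run with a small helper like the one below. This is a sketch under stated assumptions: resolve_provider is a hypothetical name, the fallback to groq when LLM_PROVIDER is unset follows the "paper default" label rather than any confirmed behaviour, and the repository's own configuration loading may differ.

```python
import os

# Copied from the provider table; None means no API key is required.
KEY_ENV_VAR = {
    "groq": "GROQ_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "ollama": None,
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def resolve_provider(env=None):
    """Return (provider, api_key) or raise if the configuration is incomplete."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "groq").strip().lower()
    if provider not in KEY_ENV_VAR:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    key_var = KEY_ENV_VAR[provider]
    if key_var is None:
        return provider, None
    api_key = env.get(key_var)
    if not api_key:
        raise RuntimeError(f"{key_var} must be set for provider {provider!r}")
    return provider, api_key
```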

Run the test suite:

pip install pytest
pytest tests/ -v

Reproducing all paper experiments

| ID | Description | Command |
|---|---|---|
| E1 | Per-asset IC benchmark (Table 1) | --experiments E1 |
| E2 | Ablation study | --experiments E2 |
| E3 | Regime analysis | --experiments E3 |
| E4 | LLM provider sensitivity | --experiments E4 |
| E5 | Operator analysis | --experiments E5 |
| E6 | Diversity / efficiency curves | --experiments E6 |
| E7 | Qualitative alpha examples | --experiments E7 |
| E8 | Per-symbol breakdown | --experiments E8 |
| E9 | IC gap distribution | --experiments E9 |
| E10 | Large-scale cross-sectional | python scripts/run_large_scale_experiment.py |

All E1–E9 are deterministic (no LLM calls) and are passed as the --experiments argument to scripts/run_paper_experiments.py. E10 fetches 503 tickers from yfinance; plan for ~2–4 hours on a 4-core machine.

Fresh reruns write to paper/results/generated/ — the included paper/results/*.json files are the fixed paper artifacts and should not be overwritten.


Citation

If you use AlphaEvo in your research, please cite:

@inproceedings{alphaevo2026,
  title     = {{AlphaEvo}: {LLM}-Seeded Evolutionary Discovery of Robust Quantitative Trading Signals},
  author    = {Machavariani, Temiko and Tsourekas, Kyriakos and Rotte, Matvei and Tsulaia, Luka and Tazhibaev, Iskhak and Nwadike, Munachiso Samuel and Lahlou, Salem and Jamwal, Prashant Kumar and Inui, Kentaro and Tak{\'a}{\v{c}}, Martin and Iklassov, Zangir},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2026},
  note      = {Preprint. Under review.}
}

License

Code: MIT. Market data sourced from Yahoo Finance under their terms of use. This repository is a research artifact — not financial advice. Past backtest performance does not guarantee future results.
