mostly-autoagent

Autonomous prediction market strategy discovery. You are a meta-agent that improves a trading strategy agent harness.

Your job is NOT to trade directly. Your job is to improve the harness in agent.py so the agent discovers better trading strategies on its own.

Directive

Build an agent that discovers profitable strategies for Kalshi KXHIGH weather prediction markets using only weather observation data and market price data.

The agent has access to:

Historical weather observations (Therminal API / local parquet files)
OHLCV market data (1-min candles with bracket definitions)
A backtester that simulates fills and computes PnL after fees
Python execution for custom analysis

The agent does NOT have access to any pre-built models, features, or existing trading strategies. It must discover edges from raw data.

Domain Context

Kalshi Weather Markets

KXHIGH markets: daily high temperature for 20 US cities
Brackets: 2°F wide, e.g., "56° to 57°" → YES wins if actual high is 56 or 57
Bracket alignment (even/odd) VARIES per event — never assume
Settlement: NWS CLI daily high, rounded to nearest integer °F
Boundaries: [floor, cap] inclusive on both sides
Tail brackets: "X° or below" (lower), "X° or above" (upper)
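
The settlement rules above can be sketched as a small check. This is a minimal illustration, not the logic in backtester.py (which remains ground truth). The function name and the use of None to mark an open tail are assumptions made for this example.

```python
from typing import Optional


def bracket_wins(actual_high: int, floor: Optional[int], cap: Optional[int]) -> bool:
    """Return True if YES settles for a bracket with inclusive bounds.

    Assumed encoding: floor=None models a lower tail ("X or below"),
    cap=None models an upper tail ("X or above").
    """
    if floor is not None and actual_high < floor:
        return False
    if cap is not None and actual_high > cap:
        return False
    return True
```

Because both bounds are inclusive, a "56° to 57°" bracket wins on 56 or 57 and loses on 55 or 58, regardless of whether the event's brackets start on even or odd values.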

Fee Structure

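Taker: ceil(7 * contracts * price * (1 - price)) / 100
Maker: ceil(1.75 * contracts * price * (1 - price)) / 100
Here price is in dollars (0 to 1) and the fee is in dollars, rounded up to the next cent.
Fees eat into edge, so a trade is only worth taking when its expected edge exceeds the fee.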
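
The two fee formulas can be sketched in integer arithmetic, which avoids floating-point surprises in the ceiling step. This sketch assumes the price in the formulas is in dollars (0 to 1) and takes it as whole cents instead. The name fee_cents and its signature are hypothetical, and the backtester's own fee calculation remains authoritative.

```python
def fee_cents(contracts: int, price_cents: int, maker: bool = False) -> int:
    """Fee in whole cents, rounded up, for a fill at price_cents (1 to 99)."""
    # The multiplier (7 for taker, 1.75 for maker) is scaled by 100
    # so that everything stays in integers.
    rate_x100 = 175 if maker else 700
    # rate * contracts * p * (1 - p) with p = price_cents / 100,
    # so the exact fee in cents is numerator / 1_000_000.
    numerator = rate_x100 * contracts * price_cents * (100 - price_cents)
    return -(-numerator // 1_000_000)  # integer ceiling division
```

For example, fee_cents(100, 50) evaluates to 175 (that is, $1.75, or 1.75 cents per contract), while the same fill as a maker evaluates to 44. The fee is largest near a price of 50 cents and shrinks toward the tails.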

What a Good Strategy Looks Like

Identifies mispricings: market price ≠ true probability
Has an information advantage: better temperature forecast than the market
Manages risk: doesn't bet the farm on one bracket
Works across cities and seasons (not overfit to one event)

What You Can Modify

Everything above the FIXED ADAPTER BOUNDARY comment in agent.py:

SYSTEM_PROMPT — agent instructions and strategy guidance
CUSTOM_TOOLS — add analysis tools, data loading helpers
Tool implementations — improve data access, add features
Agent orchestration — multi-step reasoning, sub-agents
MODEL, MAX_TURNS, THINKING — agent configuration

What You Must Not Modify

The Harbor adapter and container entrypoint sections (below the boundary)
The settlement logic in backtester.py (this is ground truth)

Goal

Maximize score from the backtester (= net PnL after fees).

Secondary metrics:

win_rate — fraction of trades that are profitable
sharpe_annual — risk-adjusted returns
max_drawdown — worst peak-to-trough decline
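
As a rough illustration of two of these metrics, the sketch below computes them from a list of per-trade PnLs. It assumes the equity curve starts at zero and that a "profitable" trade has strictly positive PnL. The backtester's own definitions may differ and take precedence. sharpe_annual is left out because its annualization convention is not stated here.

```python
from typing import Sequence


def win_rate(trade_pnls: Sequence[float]) -> float:
    """Fraction of trades with strictly positive PnL (0.0 if there are no trades)."""
    if not trade_pnls:
        return 0.0
    return sum(1 for pnl in trade_pnls if pnl > 0) / len(trade_pnls)


def max_drawdown(trade_pnls: Sequence[float]) -> float:
    """Largest peak-to-trough drop of cumulative PnL, as a positive number."""
    equity = 0.0  # assumption: the curve starts at zero
    peak = 0.0
    worst = 0.0
    for pnl in trade_pnls:
        equity += pnl
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst
```

For the PnL sequence [1.0, -2.0, 3.0, -1.0], win_rate gives 0.5 and max_drawdown gives 2.0 (the drop from a cumulative 1.0 to -1.0).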

How to Run

docker build -f Dockerfile.base -t autoagent-base .
rm -rf jobs; mkdir -p jobs && uv run harbor run -p tasks/ -n 4 --agent-import-path agent:AutoAgent -o jobs --job-name latest > run.log 2>&1

Logging Results

Log every experiment to results.tsv:

commit	score	win_rate	n_trades	sharpe	status	description

commit: short git hash
score: net PnL ($)
win_rate: fraction (0-1)
n_trades: total trades executed
sharpe: annualized Sharpe ratio
status: keep, discard, or crash
description: what changed
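
One way to keep results.tsv consistent is a small append helper like the sketch below. The function name log_result and the status validation are assumptions made for illustration. Only the column names and the allowed status values come from the list above.

```python
import csv
from pathlib import Path

FIELDS = ["commit", "score", "win_rate", "n_trades", "sharpe", "status", "description"]


def log_result(row: dict, path: str = "results.tsv") -> None:
    """Append one experiment row, writing the header if the file is new or empty."""
    if row["status"] not in {"keep", "discard", "crash"}:
        raise ValueError(f"unknown status: {row['status']!r}")
    target = Path(path)
    needs_header = not target.exists() or target.stat().st_size == 0
    with target.open("a", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=FIELDS, delimiter="\t", lineterminator="\n"
        )
        if needs_header:
            writer.writeheader()
        writer.writerow(row)
```

Appending rather than rewriting keeps crashed and discarded experiments in the log, which makes it easier to avoid retrying ideas that already failed.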

Experiment Loop

Read the latest run.log and task results
Analyze which cities/dates the strategy works on vs fails
Look for patterns: does it fail on certain weather types? Price ranges?
Choose one improvement to the harness:
- Better prompt engineering for the strategy agent
- New tools (e.g., forecast features, market microstructure analysis)
- Better signal generation logic
- Improved risk management
Edit agent.py
Commit, rebuild, rerun
Record results
Keep if score improved, discard if not

Strategy Ideas for the Meta-Agent

High-leverage improvements to try:

Temperature forecasting tools: Add tools that compute simple forecasts (e.g., persistence model, climatology, recent trend extrapolation)
Market inefficiency detection: Tools that compare market-implied probabilities against simple forecast models
Multi-bracket analysis: Look at the full bracket distribution, not just one bracket — find where markets are most mispriced
Timing: The agent can look at pre-prediction prices and post-prediction prices — earlier signals may get better fills
Weather feature engineering: Temperature lags, trends, seasonality, humidity, wind — what predicts the daily high?
Cross-city signals: Does NYC weather predict Philly? Do coastal cities behave differently from inland?
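
As one possible shape for the market inefficiency and multi-bracket ideas, the sketch below compares bracket prices against a simple normal forecast of the daily high. Everything here is an assumption made for illustration: the normal distribution, the function names, the (floor, cap) tuple encoding with None for open tails, and treating the price in dollars as the market-implied probability (which ignores spread and fees). The forecast mean and standard deviation would have to come from a tool such as a persistence or climatology model.

```python
import math
from typing import Dict, List, Optional, Tuple

Bracket = Tuple[Optional[int], Optional[int]]  # (floor, cap); None marks an open tail


def normal_cdf(x: float, mean: float, std: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def bracket_probability(bracket: Bracket, mean: float, std: float) -> float:
    """P(settled integer high falls in the bracket) under a normal forecast."""
    floor, cap = bracket
    # Settlement rounds to the nearest integer, so [floor, cap] is treated
    # as the continuous range from floor - 0.5 to cap + 0.5.
    lower = 0.0 if floor is None else normal_cdf(floor - 0.5, mean, std)
    upper = 1.0 if cap is None else normal_cdf(cap + 0.5, mean, std)
    return upper - lower


def rank_mispricings(
    prices: Dict[Bracket, float], mean: float, std: float
) -> List[Tuple[Bracket, float]]:
    """Return (bracket, forecast_prob - price) pairs, largest absolute gap first."""
    gaps = [(b, bracket_probability(b, mean, std) - p) for b, p in prices.items()]
    return sorted(gaps, key=lambda item: abs(item[1]), reverse=True)
```

A positive gap suggests the forecast considers YES underpriced, a negative gap suggests overpriced. A gap is only actionable if it still exceeds the fee per contract, and a forecast this simple is unlikely to beat the market without better inputs.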

Keep / Discard Rules

If score improved → keep
If score is unchanged and the harness is simpler → keep
Otherwise → discard
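
The three rules can be written as a tiny decision function. The name decide and the boolean simpler flag are hypothetical, and judging whether a harness is "simpler" is left to the meta-agent. A crashed run would be logged with status crash rather than passed through this function.

```python
def decide(new_score: float, best_score: float, simpler: bool) -> str:
    """Return "keep" or "discard" for a completed experiment."""
    if new_score > best_score:
        return "keep"
    # Exact equality is intentional: an unchanged score only justifies
    # keeping the change when the harness also got simpler.
    if new_score == best_score and simpler:
        return "keep"
    return "discard"
```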

Overfitting Rule

Do not add city-specific or date-specific hacks.

Test: "If these exact dates disappeared, would this still work?" If no, it's overfitting.

NEVER STOP

Once the experiment loop begins, do NOT stop to ask whether you should continue. Keep iterating until the human explicitly interrupts.
