Production-ready Python implementation of the Free-MAD algorithm from the paper "Free-MAD: Consensus-Free Multi-Agent Debate".
FREE-MAD ships two runtimes:
- `debate`: the original consensus-free answer-selection runtime
- `autonomous`: a persistent task runtime for `plan` and `code` tasks with quorum review, resumable state, and task inspection surfaces
The autonomous runtime is intentionally a first milestone. It supports persisted tasks, staged execution, structured research provenance, policy-bound writes and local commands, human clarification feedback on resume, parallel execution for disjoint work items, CLI task commands, and background dashboard task execution with live event streaming.
- `debate` mode: implemented and unchanged
- `autonomous` mode: implemented as a first milestone for `plan` and `code` tasks
- `docs/autonomous-mode.md`: contributor-facing overview of the shipped autonomous quorum runtime and its current limits
- `docs/plans/2026-03-31-autonomous-quorum-runtime-spec.md`: detailed runtime specification, updated to the implemented first milestone
- `docs/plans/2026-03-31-autonomous-quorum-runtime-implementation-plan.md`: the test-first rollout plan used to land the initial implementation
Free-MAD is an approach to multi-agent AI systems that removes the need for consensus among agents while achieving better accuracy and efficiency than traditional debate methods.
When you have multiple AI agents working on the same problem, traditional approaches (MAD - Multi-Agent Debate) work like this:
- Agents debate until they agree (reach consensus)
- The final answer is chosen by majority vote
This has serious problems:
- Conformity bias: Agents with the right answer get pressured by the majority into changing their minds (like peer pressure)
- High cost: Multiple debate rounds are needed to reach agreement
- Majority tyranny: The right answer can lose if fewer agents picked it—truth doesn't always win by popularity
Free-MAD takes a fundamentally different approach:
- No consensus required - Agents can disagree throughout the entire debate
- Score the journey, not just the destination - Instead of only looking at final votes, Free-MAD evaluates the quality of reasoning across ALL debate rounds
- Quality beats quantity - A single agent with strong reasoning can win, even if all others disagree
Think of it like judges scoring a debate competition: they don't wait to see who "wins" by convincing everyone else. Instead, they score the quality of each debater's arguments throughout the entire debate. The best-argued position wins, regardless of whether it convinced the majority.
The Algorithm:
- Round 0 (Generation): All agents independently propose solutions
- Round 1+ (Critique): Agents debate in two modes:
- Conformity mode: Present arguments supporting their answer
- Anti-conformity mode: Find flaws in other agents' answers
- Scoring: Track the entire debate trajectory and score based on:
- Quality of arguments
- Valid criticisms found
- How positions evolved over time
- Decision: Select the answer with the highest score (not the most votes)
Example:
Round 1:
Agent 1: Answer A (with strong reasoning)
Agent 2: Answer B
Agent 3: Answer B
Round 2:
Agent 1: Stays with A, points out flaws in B
Agent 2: Stays with B (unconvinced by Agent 1's arguments)
Agent 3: Stays with B
Traditional MAD: B wins (2 votes)
Free-MAD: A wins (higher score due to quality of reasoning)
This means a single agent with the right answer and strong reasoning can win, even if the majority disagrees—something impossible with traditional consensus-based approaches.
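The contrast between the two decision rules can be sketched as a toy comparison (the vote and score values are illustrative, not produced by the real scorer):

```python
from collections import Counter

# One final answer per agent, matching the example above.
final_votes = ["A", "B", "B"]
# Hypothetical trajectory-quality scores per answer.
quality_scores = {"A": 85.5, "B": 72.3}

# Traditional MAD: majority vote on final answers.
mad_winner = Counter(final_votes).most_common(1)[0][0]
# Free-MAD: highest trajectory score wins, regardless of vote count.
freemad_winner = max(quality_scores, key=quality_scores.get)

print(mad_winner)      # B — the majority wins
print(freemad_winner)  # A — the best-scored reasoning wins
```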
This section covers both shipped runtimes.
# With Poetry (recommended)
poetry install
poetry run freemad --version
# With pip
pip install -e .
freemad --version

# Using YAML configuration
poetry run freemad "Write a function that returns Fibonacci(n)." \
--rounds 2 \
--config config_examples/multi_agent.yaml
# Using JSON configuration
poetry run freemad "Write a function that returns Fibonacci(n)." \
--rounds 2 \
  --config config_examples/multi_agent.json

Both YAML and JSON formats are supported. See config_examples/multi_agent.yaml or config_examples/multi_agent.json for complete configuration examples.
Autonomous mode uses a persistent task store and role-aware agents. The minimal entry point is:
poetry run python -m freemad.cli task start \
--config path/to/autonomous-config.yaml \
--task-type plan \
--workspace-root "$PWD" \
  "Critique this architecture until the agents approve an implementation-ready plan."

Useful follow-up commands:
poetry run python -m freemad.cli task status <task_id> --config path/to/autonomous-config.yaml
poetry run python -m freemad.cli task inspect <task_id> --config path/to/autonomous-config.yaml
poetry run python -m freemad.cli task resume <task_id> --config path/to/autonomous-config.yaml
poetry run python -m freemad.cli task answer <task_id> "Use SQLite." --config path/to/autonomous-config.yaml
poetry run python -m freemad.cli task approve <task_id> plan_review --config path/to/autonomous-config.yaml
poetry run python -m freemad.cli task pause <task_id> --config path/to/autonomous-config.yaml

The first milestone currently supports:
- `plan` tasks that research, draft, review, arbitrate, and finalize plans
- `code` tasks that execute work items, run code review, run verification, and finalize
See docs/autonomous-mode.md for role requirements, persistence layout, dashboard routes, and current limitations.
Free-MAD is configured via YAML or JSON files. Here's a minimal example:
agents:
  - id: claude-sonnet
    type: claude_code
    cli_command: "claude"
    cli_args: {model: "sonnet"}
    timeout: 600
  - id: gpt-5
    type: openai_codex
    cli_command: "codex exec"
    cli_args: {--model: "gpt-5.1"}
    cli_flags: ["--skip-git-repo-check"]
    cli_positional: ["-"]
    timeout: 600

topology:
  type: all_to_all        # all agents review all others
  seed: 427               # deterministic peer assignment

deadlines:
  soft_timeout_ms: 15000  # quorum wait
  hard_timeout_ms: 30000  # hard stop
  min_agents: 2           # quorum size

scoring:
  weights: [20.0, 25.0, 30.0, 20.0]  # [initial, change-penalty, change-bonus, keep]
  normalize: true                    # contributor-based normalization
  tie_break: deterministic           # or 'random'

security:
  cli_allowed_commands: ["claude", "codex"]
  cli_use_shell: false
  max_requirement_size: 20000
  max_solution_size: 400000

output:
  save_transcript: true
  transcript_dir: transcripts
  format: json

Complete configuration examples:
- YAML: config_examples/multi_agent.yaml
- JSON: config_examples/multi_agent.json
- All available options: config_examples/ALL_KEYS.yaml
Define the AI agents participating in the debate:
- `id`: Unique identifier
- `type`: Adapter type (`claude_code`, `openai_codex`)
- `cli_command`: Command to invoke the agent
- `cli_args`: Key-value arguments passed to the CLI
- `cli_flags`: Boolean flags (e.g., `["--verbose"]`)
- `cli_positional`: Positional arguments (e.g., `["-"]` for stdin)
- `timeout`: Per-call timeout in seconds
- `config.temperature`: Model temperature (0.0-1.0)
- `config.max_tokens`: Max output tokens (null = unlimited)
- `roles`: Optional autonomous-task roles such as `researcher`, `planner`, `reviewer`, `implementer`, `verifier`, `arbiter`
- `capabilities`: Optional autonomous action kinds such as `research`, `plan`, `review`, `implement`, `verify`
Control how agents review each other's work:
- `all_to_all`: Every agent reviews all others (full debate)
- `k_reviewers`: Each agent reviews k random peers
- `ring`: Agents review in a circular pattern
- `star`: All agents review a central hub agent
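A minimal sketch of how these peer-assignment patterns could be generated (illustrative only; `assign_peers` is not this repo's internal API):

```python
import random

def assign_peers(agent_ids, topology, k=2, seed=427):
    """Return {agent_id: [peers it reviews]} for each topology type (sketch)."""
    rng = random.Random(seed)  # seeded for deterministic assignment
    n = len(agent_ids)
    if topology == "all_to_all":
        return {a: [b for b in agent_ids if b != a] for a in agent_ids}
    if topology == "k_reviewers":
        return {a: rng.sample([b for b in agent_ids if b != a], k)
                for a in agent_ids}
    if topology == "ring":
        return {agent_ids[i]: [agent_ids[(i + 1) % n]] for i in range(n)}
    if topology == "star":
        hub = agent_ids[0]  # assume the first agent is the hub
        return {a: [hub] for a in agent_ids if a != hub}
    raise ValueError(f"unknown topology: {topology}")

agents = ["a1", "a2", "a3"]
print(assign_peers(agents, "ring"))  # {'a1': ['a2'], 'a2': ['a3'], 'a3': ['a1']}
```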
Configure the Free-MAD scoring algorithm:
- `weights`: `[initial, change_penalty, change_bonus, keep]` - Weights for different scoring components
- `normalize`: Divide by contributor count to prevent score inflation
- `tie_break`: `deterministic` (first in list) or `random`
- `random_seed`: Seed for random tie-breaking
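One way to read the weight vector is as per-transition credit across each agent's answer trajectory. The sketch below is a simplified interpretation under that assumption; the paper's and repo's exact formulas may differ:

```python
def trajectory_score(answers_per_round, candidate, weights, normalize=True):
    """Score one candidate answer from per-agent answer trajectories.

    weights = [initial, change_penalty, change_bonus, keep] — illustrative
    reading of the config: credit for proposing the candidate in round 0,
    a penalty for abandoning it, a bonus for switching to it, and credit
    for keeping it between rounds.
    """
    w_init, w_pen, w_bonus, w_keep = weights
    score, contributors = 0.0, 0
    for traj in answers_per_round:          # one answer list per agent
        if traj[0] == candidate:
            score += w_init
            contributors += 1
        for prev, cur in zip(traj, traj[1:]):
            if prev == candidate and cur != candidate:
                score -= w_pen              # abandoned the candidate
            elif prev != candidate and cur == candidate:
                score += w_bonus            # switched to the candidate
                contributors += 1
            elif prev == cur == candidate:
                score += w_keep             # stayed with the candidate
    if normalize and contributors:
        score /= contributors               # contributor-based normalization
    return score

trajs = [["A", "A"], ["B", "B"], ["B", "B"]]
print(trajectory_score(trajs, "A", [20.0, 25.0, 30.0, 20.0]))  # 40.0
```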
Control debate round timing:
- `soft_timeout_ms`: Wait for quorum before proceeding
- `hard_timeout_ms`: Absolute deadline (accept late arrivals until this)
- `min_agents`: Quorum size at soft deadline
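The two-phase deadline semantics can be sketched as follows (a simplified illustration of the soft/hard timeout behavior described above, not the repo's scheduler):

```python
import time

def wait_for_quorum(poll_replies, total_agents, soft_ms, hard_ms, min_agents):
    """poll_replies() returns the replies received so far (sketch).

    Proceed as soon as every agent answers; otherwise, once the soft
    deadline passes, proceed if the quorum (min_agents) is met; keep
    accepting late arrivals until the hard deadline, then take what we have.
    """
    start = time.monotonic()
    while True:
        replies = poll_replies()
        elapsed_ms = (time.monotonic() - start) * 1000
        if len(replies) == total_agents:
            return replies        # everyone answered early
        if elapsed_ms >= soft_ms and len(replies) >= min_agents:
            return replies        # quorum met at the soft deadline
        if elapsed_ms >= hard_ms:
            return replies        # hard stop: accept whatever arrived
        time.sleep(0.001)

# Simulated round: two of three agents reply, one never does.
print(len(wait_for_quorum(lambda: ["r1", "r2"], 3, 50, 100, 2)))  # 2
```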
- `cli_allowed_commands`: Whitelist of allowed executables
- `cli_use_shell`: Must be `false` for security
- `max_requirement_size`: Input size cap (chars)
- `max_solution_size`: Output size cap (chars)
- `redact_patterns`: Regex patterns to redact from logs
- `max_total_time_sec`: Overall wall time budget
- `max_round_time_sec`: Per-round budget
- `max_agent_time_sec`: Per-agent call budget
- `max_tokens_per_agent_per_round`: Prompt truncation cap
- `enable_token_truncation`: Allow prompt truncation
- `max_concurrent_agents`: Parallelism limit
- `save_transcript`: Persist debate transcript
- `transcript_dir`: Output directory
- `format`: `json` or `markdown`
- `verbose`: Print extra info during execution
- `enable_sandbox`: Run solutions in restricted Python sandbox
- `sandbox_timeout_ms`: Sandbox execution limit
- `enabled`: On-disk memoization of agent outputs
- `dir`: Cache directory
- `max_entries`: Eviction limit
- `task.store_path`: SQLite database path for task metadata and events
- `task.artifacts_dir`: Directory for task-scoped artifacts
- `task.max_stage_retries`: Retry count before arbitration or pause
- `task.max_total_iterations`: Overall iteration cap for a task
- `task.tool_policy.allow_web_research`: Whether autonomous tasks may rely on agent-native research tools
- `task.tool_policy.allow_workspace_write`: Whether autonomous tasks may write to the workspace
- `task.tool_policy.allowed_write_roots`: Relative roots autonomous writes may touch
- `task.tool_policy.allow_local_commands`: Whether autonomous tasks may run local commands
- `task.tool_policy.allowed_local_commands`: Allowlist for task-run commands
- `task.tool_policy.verification_commands`: Extra commands run during the verification stage
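An `allowed_write_roots` check of the kind described above can be sketched like this (an assumed enforcement strategy for illustration, not the repo's actual policy code):

```python
from pathlib import Path

def write_allowed(workspace_root, target, allowed_write_roots):
    """Sketch: permit a write only if the resolved target path stays
    inside one of the allowed roots beneath the workspace root."""
    root = Path(workspace_root).resolve()
    resolved = (root / target).resolve()  # collapses any ".." components
    for rel in allowed_write_roots:
        allowed = (root / rel).resolve()
        if resolved == allowed or allowed in resolved.parents:
            return True
    return False

print(write_allowed("/tmp/ws", "src/app.py", ["src", "docs"]))    # True
print(write_allowed("/tmp/ws", "../escape.py", ["src", "docs"]))  # False
```

Resolving before comparing is the important design choice: it blocks `..`-based path traversal out of the workspace.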
Free-MAD communicates with agents via stdin/stdout. Your agent CLI must:
- Accept mode as argument: `<cli_command> generate` or `<cli_command> critique`
- Read prompt from stdin: The debate requirement or critique instructions
- Output structured response:
SOLUTION:
<your proposed solution>
REASONING:
<your reasoning/arguments>
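A minimal parser for this response contract might look like the following (a sketch, not the repo's actual parser):

```python
def parse_agent_output(text):
    """Split a response into solution and reasoning using the
    SOLUTION:/REASONING: markers described above."""
    if "SOLUTION:" not in text or "REASONING:" not in text:
        raise ValueError("missing SOLUTION:/REASONING: markers")
    # Everything after SOLUTION: and before REASONING: is the solution.
    _, _, rest = text.partition("SOLUTION:")
    solution, _, reasoning = rest.partition("REASONING:")
    return {"solution": solution.strip(), "reasoning": reasoning.strip()}

out = parse_agent_output("SOLUTION:\ndef f(n): ...\nREASONING:\nbase case first")
print(out["solution"])  # def f(n): ...
```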
If your agent doesn't follow this contract, wrap it:
#!/usr/bin/env python3
import sys
import subprocess
mode = sys.argv[1] # 'generate' or 'critique'
prompt = sys.stdin.read()
# Call your actual agent
result = subprocess.run(
["your-agent-command", "--mode", mode],
input=prompt,
capture_output=True,
text=True
)
# Format output
print(f"SOLUTION:\n{result.stdout}")
print(f"\nREASONING:\nGenerated in {mode} mode")

# Install dev dependencies
poetry install --with dev
# Run tests
poetry run pytest -q
# With coverage
poetry run pytest --cov=freemad --cov-report=term --cov-report=xml

# Type checking
mypy .

# Pre-commit hooks
poetry run pre-commit install
poetry run pre-commit run --all-files

See AGENTS.md for detailed conventions:
- Immutable dataclasses
- StrEnums for constants
- No hard-coded strings internally
- Serialization at boundaries only
Debate transcripts capture the complete history for analysis:
{
"final_answer_id": "abc123...",
"final_solution": "def fibonacci(n): ...",
"scores": {
"abc123...": 85.5,
"def456...": 72.3
},
"winning_agents": ["claude-sonnet"],
"transcript": [
{
"round": 0,
"type": "generation",
"agents": {
"claude-sonnet": {
"response": { "solution": "...", "reasoning": "..." },
"peers_assigned": [],
"peers_seen": []
}
}
},
{
"round": 1,
"type": "critique",
"agents": { ... }
}
]
}

Find transcripts in `transcripts/` by default when `output.save_transcript: true`.
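Given that layout, a small script can summarize a saved transcript (field names taken from the example above; `summarize_transcript` is a sketch, not a shipped helper):

```python
import json
from pathlib import Path

def summarize_transcript(path):
    """Print the winning agents and per-answer scores from a saved
    JSON transcript, highest score first; return the final answer id."""
    data = json.loads(Path(path).read_text())
    print("winning agents:", ", ".join(data["winning_agents"]))
    for answer_id, score in sorted(data["scores"].items(),
                                   key=lambda kv: kv[1], reverse=True):
        print(f"  {answer_id[:8]}  {score}")
    return data["final_answer_id"]
```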
Free-MAD includes a web-based dashboard to visualize debate results. The dashboard reads JSON transcripts and displays the final answer, winning agents, and scores.
poetry run freemad-dashboard --dir transcripts --host 127.0.0.1 --port 8001

Then open your browser to http://127.0.0.1:8001 to view the results.
Command Options:
- `--dir`: Directory containing JSON transcripts (default: `transcripts`)
- `--host`: Server host address (default: `127.0.0.1`)
- `--port`: Server port (default: `8001`)
- ✅ View final debate results
- ✅ See winning agents and scores
- ✅ Browse all transcript files
The dashboard is actively being developed. Planned features include:
Autonomous Task Views:
- Persistent task pages distinct from debate transcripts
- Stage timeline for research, planning, execution, review, and verification
- Open-questions pane for human clarification and approvals
- Artifact browser for plans, patches, review notes, and verification logs
Real-Time Debate Visualization:
- Live conversation view showing agent-to-agent interactions
- Visual timeline of debate rounds
- See who said what in each round
Metrics & Analytics:
- Token usage tracking per agent and per round
- Time/duration metrics for each debate phase
- Cost estimation based on model pricing
Agent Information:
- Display model configurations (temperature, max_tokens)
- Show agent types and CLI commands used
- Topology visualization (peer assignment graphs)
Configuration UI:
- Configure agents through the web interface
- Edit debate parameters (rounds, weights, timeouts)
- Save and load configuration presets
Interactive Final Agent:
- Chat with a final orchestrator agent
- Execute the winning solution interactively
- Provide feedback and iterate on results
Enhanced UX:
- Make the system more user-friendly vs. command-line only
- Drag-and-drop configuration builder
- Real-time progress indicators
Contributions Welcome! If you'd like to help build these features, please see CONTRIBUTING.md or open an issue to discuss implementation ideas.
- Verify `cli_command` is in your PATH
- Check `cli_command` is in `security.cli_allowed_commands`
- Increase `agents[].timeout` if needed
- Enable debug logging: `logging.level: DEBUG`
- Agents must output exactly the `SOLUTION:` and `REASONING:` markers
- Test your agent CLI manually with echo prompts
- Increase `deadlines.hard_timeout_ms`
- Increase `budget.max_round_time_sec`
- Ensure `deadlines.min_agents` ≤ number of enabled agents
- Check `early_stop_reason` in transcript
- Set `topology.seed` for consistent peer assignments
- Set `scoring.random_seed` for consistent tie-breaking
- Use `scoring.tie_break: deterministic`
- Issues: GitHub Issues
- Contributing: See CONTRIBUTING.md
- Code of Conduct: See CODE_OF_CONDUCT.md
- Security: See SECURITY.md for private vulnerability reporting
- Governance: See GOVERNANCE.md
If you use this implementation in your research, please cite:
@software{freemad2025,
author = {Santilli, Jonathan},
title = {FREE-MAD: Consensus-Free Multi-Agent Debate Implementation},
year = {2025},
url = {https://github.com/jonathansantilli/mad}
}

And the original paper:
@article{freemad2024,
title={Free-MAD: Consensus-Free Multi-Agent Debate},
author={...},
journal={arXiv preprint arXiv:2509.11035},
year={2024}
}

MIT License © 2025 Jonathan Santilli. See LICENSE for full text.
This project is independent and not affiliated with Anthropic, OpenAI, or any other vendor. "Claude", "Codex", and any other product names are trademarks of their respective owners and are used here only for identification.
This implementation is based on the paper:
"Free-MAD: Consensus-Free Multi-Agent Debate" arXiv:2509.11035v1 https://arxiv.org/html/2509.11035v1
- Eliminates consensus requirement: Agents can disagree throughout the debate
- Score-based decision mechanism: Evaluates entire debate trajectory, not just final votes
- Improved accuracy: Outperforms traditional MAD on reasoning benchmarks
- Better efficiency: Requires fewer debate rounds than consensus-based approaches
- Robustness: Resistant to conformity bias and communication attacks
The repository now ships the first autonomous milestone beside the original debate runtime.
Current autonomous properties:
- no single agent can declare a task complete by itself
- every stage requires at least a proposer and an independent checker
- disagreement triggers revision, arbitration, or human escalation instead of being hidden
- long-lived tasks can resume after restart with persisted state and artifacts
- research, planning, code changes, review, and verification are all first-class stages
Current supported workflows include:
- "Critique this architecture until the agents agree it is implementation-ready."
- "Implement this approved plan, have another agent review it, and only finish once verification passes."
- "Ask me a concrete question only when the goal is ambiguous or the next action is risky."
Current first-milestone limits:
- autonomous tasks are still limited to `plan` and `code` workflows
- publish-side effects such as push, merge, and release actions remain manual
- live task streaming currently tails persisted task events rather than using a dedicated in-memory pub/sub layer
See the design docs above for the full specification and current implementation notes.