Production-ready Python implementation of the Free-MAD algorithm from the paper "Free-MAD: Consensus-Free Multi-Agent Debate".
FREE-MAD ships two runtimes:
- `debate`: the original consensus-free answer-selection runtime
- `autonomous`: a persistent task runtime for `plan` and `code` tasks with quorum review, resumable state, and task inspection surfaces
The autonomous runtime is intentionally a first milestone. It supports persisted tasks, staged execution, structured research provenance, policy-bound writes and local commands, human clarification feedback on resume, parallel execution for disjoint work items, CLI task commands, and background dashboard task execution with live event streaming.
- `debate` mode: implemented and unchanged
- `autonomous` mode: implemented as a first milestone for `plan` and `code` tasks
- `docs/autonomous-mode.md`: contributor-facing overview of the shipped autonomous quorum runtime and its current limits
- `docs/plans/2026-03-31-autonomous-quorum-runtime-spec.md`: detailed runtime specification, updated to the implemented first milestone
- `docs/plans/2026-03-31-autonomous-quorum-runtime-implementation-plan.md`: the test-first rollout plan used to land the initial implementation
Free-MAD is an approach to multi-agent AI systems that removes the need for consensus among agents while achieving better accuracy and efficiency than traditional debate methods.
When you have multiple AI agents working on the same problem, traditional approaches (MAD - Multi-Agent Debate) work like this:
- Agents debate until they agree (reach consensus)
- The final answer is chosen by majority vote
This has serious problems:
- Conformity bias: Agents with the right answer get pressured by the majority into changing their minds (like peer pressure)
- High cost: Multiple debate rounds are needed to reach agreement
- Majority tyranny: The right answer can lose if fewer agents picked it—truth doesn't always win by popularity
Free-MAD takes a fundamentally different approach:
- No consensus required - Agents can disagree throughout the entire debate
- Score the journey, not just the destination - Instead of only looking at final votes, Free-MAD evaluates the quality of reasoning across ALL debate rounds
- Quality beats quantity - A single agent with strong reasoning can win, even if all others disagree
Think of it like judges scoring a debate competition: they don't wait to see who "wins" by convincing everyone else. Instead, they score the quality of each debater's arguments throughout the entire debate. The best-argued position wins, regardless of whether it convinced the majority.
The Algorithm:
- Round 0 (Generation): All agents independently propose solutions
- Round 1+ (Critique): Agents debate in two modes:
- Conformity mode: Present arguments supporting their answer
- Anti-conformity mode: Find flaws in other agents' answers
- Scoring: Track the entire debate trajectory and score based on:
- Quality of arguments
- Valid criticisms found
- How positions evolved over time
- Decision: Select the answer with the highest score (not the most votes)
Example:
Round 1:
Agent 1: Answer A (with strong reasoning)
Agent 2: Answer B
Agent 3: Answer B
Round 2:
Agent 1: Stays with A, points out flaws in B
Agent 2: Stays with B (unconvinced by Agent 1's arguments)
Agent 3: Stays with B
Traditional MAD: B wins (2 votes)
Free-MAD: A wins (higher score due to quality of reasoning)
This means a single agent with the right answer and strong reasoning can win, even if the majority disagrees—something impossible with traditional consensus-based approaches.
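The contrast between the two decision rules can be sketched as a toy comparison (the vote and score values are illustrative, not produced by the real scorer):

```python
from collections import Counter

# One final answer per agent, matching the example above.
final_votes = ["A", "B", "B"]
# Hypothetical trajectory-quality scores per answer.
quality_scores = {"A": 85.5, "B": 72.3}

# Traditional MAD: majority vote on final answers.
mad_winner = Counter(final_votes).most_common(1)[0][0]
# Free-MAD: highest trajectory score wins, regardless of vote count.
freemad_winner = max(quality_scores, key=quality_scores.get)

print(mad_winner)      # B — the majority wins
print(freemad_winner)  # A — the best-scored reasoning wins
```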
This section covers both shipped runtimes.
# With Poetry (recommended)
poetry install
poetry run freemad --version
# With pip
pip install -e .
freemad --version

# Using YAML configuration
poetry run freemad "Write a function that returns Fibonacci(n)." \
--rounds 2 \
--config config_examples/multi_agent.yaml
# Using JSON configuration
poetry run freemad "Write a function that returns Fibonacci(n)." \
--rounds 2 \
  --config config_examples/multi_agent.json

Both YAML and JSON formats are supported. See config_examples/multi_agent.yaml or config_examples/multi_agent.json for complete configuration examples.
Autonomous mode uses a persistent task store and role-aware agents. The minimal entry point is:
poetry run python -m freemad.cli task start \
--config path/to/autonomous-config.yaml \
--task-type plan \
--workspace-root "$PWD" \
  "Critique this architecture until the agents approve an implementation-ready plan."

Useful follow-up commands:
poetry run python -m freemad.cli task status <task_id> --config path/to/autonomous-config.yaml
poetry run python -m freemad.cli task inspect <task_id> --config path/to/autonomous-config.yaml
poetry run python -m freemad.cli task resume <task_id> --config path/to/autonomous-config.yaml
poetry run python -m freemad.cli task answer <task_id> "Use SQLite." --config path/to/autonomous-config.yaml
poetry run python -m freemad.cli task approve <task_id> plan_review --config path/to/autonomous-config.yaml
poetry run python -m freemad.cli task pause <task_id> --config path/to/autonomous-config.yaml

The first milestone currently supports:
- `plan` tasks that research, draft, review, arbitrate, and finalize plans
- `code` tasks that execute work items, run code review, run verification, and finalize
See docs/autonomous-mode.md for role requirements, persistence layout, dashboard routes, and current limitations.
Free-MAD is configured via YAML or JSON files. Here's a minimal example:
agents:
  - id: claude-sonnet
    type: claude_code
    cli_command: "claude"
    cli_args: {model: "sonnet"}
    timeout: 600
  - id: gpt-5
    type: openai_codex
    cli_command: "codex exec"
    cli_args: {--model: "gpt-5.1"}
    cli_flags: ["--skip-git-repo-check"]
    cli_positional: ["-"]
    timeout: 600

topology:
  type: all_to_all        # all agents review all others
  seed: 427               # deterministic peer assignment

deadlines:
  soft_timeout_ms: 15000  # quorum wait
  hard_timeout_ms: 30000  # hard stop
  min_agents: 2           # quorum size

scoring:
  weights: [20.0, 25.0, 30.0, 20.0]  # [initial, change-penalty, change-bonus, keep]
  normalize: true                    # contributor-based normalization
  tie_break: deterministic           # or 'random'

security:
  cli_allowed_commands: ["claude", "codex"]
  cli_use_shell: false
  max_requirement_size: 20000
  max_solution_size: 400000

output:
  save_transcript: true
  transcript_dir: transcripts
  format: json

Complete configuration examples:
- YAML: config_examples/multi_agent.yaml
- JSON: config_examples/multi_agent.json
- All available options: config_examples/ALL_KEYS.yaml
Define the AI agents participating in the debate:
- `id`: Unique identifier
- `type`: Adapter type (`claude_code`, `openai_codex`)
- `cli_command`: Command to invoke the agent
- `cli_args`: Key-value arguments passed to the CLI
- `cli_flags`: Boolean flags (e.g., `["--verbose"]`)
- `cli_positional`: Positional arguments (e.g., `["-"]` for stdin)
- `timeout`: Per-call timeout in seconds
- `config.temperature`: Model temperature (0.0-1.0)
- `config.max_tokens`: Max output tokens (null = unlimited)
- `roles`: Optional autonomous-task roles such as `researcher`, `planner`, `reviewer`, `implementer`, `verifier`, `arbiter`
- `capabilities`: Optional autonomous action kinds such as `research`, `plan`, `review`, `implement`, `verify`
Control how agents review each other's work:
- `all_to_all`: Every agent reviews all others (full debate)
- `k_reviewers`: Each agent reviews k random peers
- `ring`: Agents review in a circular pattern
- `star`: All agents review a central hub agent
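A minimal sketch of how these peer-assignment patterns could be generated (illustrative only; `assign_peers` is not this repo's internal API):

```python
import random

def assign_peers(agent_ids, topology, k=2, seed=427):
    """Return {agent_id: [peers it reviews]} for each topology type (sketch)."""
    rng = random.Random(seed)  # seeded for deterministic assignment
    n = len(agent_ids)
    if topology == "all_to_all":
        return {a: [b for b in agent_ids if b != a] for a in agent_ids}
    if topology == "k_reviewers":
        return {a: rng.sample([b for b in agent_ids if b != a], k)
                for a in agent_ids}
    if topology == "ring":
        return {agent_ids[i]: [agent_ids[(i + 1) % n]] for i in range(n)}
    if topology == "star":
        hub = agent_ids[0]  # assume the first agent is the hub
        return {a: [hub] for a in agent_ids if a != hub}
    raise ValueError(f"unknown topology: {topology}")

agents = ["a1", "a2", "a3"]
print(assign_peers(agents, "ring"))  # {'a1': ['a2'], 'a2': ['a3'], 'a3': ['a1']}
```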
Configure the Free-MAD scoring algorithm:
- `weights`: `[initial, change_penalty, change_bonus, keep]` - Weights for different scoring components
- `normalize`: Divide by contributor count to prevent score inflation
- `tie_break`: `deterministic` (first in list) or `random`
- `random_seed`: Seed for random tie-breaking
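One way to read the weight vector is as per-transition credit across each agent's answer trajectory. The sketch below is a simplified interpretation under that assumption; the paper's and repo's exact formulas may differ:

```python
def trajectory_score(answers_per_round, candidate, weights, normalize=True):
    """Score one candidate answer from per-agent answer trajectories.

    weights = [initial, change_penalty, change_bonus, keep] — illustrative
    reading of the config: credit for proposing the candidate in round 0,
    a penalty for abandoning it, a bonus for switching to it, and credit
    for keeping it between rounds.
    """
    w_init, w_pen, w_bonus, w_keep = weights
    score, contributors = 0.0, 0
    for traj in answers_per_round:          # one answer list per agent
        if traj[0] == candidate:
            score += w_init
            contributors += 1
        for prev, cur in zip(traj, traj[1:]):
            if prev == candidate and cur != candidate:
                score -= w_pen              # abandoned the candidate
            elif prev != candidate and cur == candidate:
                score += w_bonus            # switched to the candidate
                contributors += 1
            elif prev == cur == candidate:
                score += w_keep             # stayed with the candidate
    if normalize and contributors:
        score /= contributors               # contributor-based normalization
    return score

trajs = [["A", "A"], ["B", "B"], ["B", "B"]]
print(trajectory_score(trajs, "A", [20.0, 25.0, 30.0, 20.0]))  # 40.0
```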
Control debate round timing:
- `soft_timeout_ms`: Wait for quorum before proceeding
- `hard_timeout_ms`: Absolute deadline (accept late arrivals until this)
- `min_agents`: Quorum size at soft deadline
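The two-phase deadline semantics can be sketched as follows (a simplified illustration of the soft/hard timeout behavior described above, not the repo's scheduler):

```python
import time

def wait_for_quorum(poll_replies, total_agents, soft_ms, hard_ms, min_agents):
    """poll_replies() returns the replies received so far (sketch).

    Proceed as soon as every agent answers; otherwise, once the soft
    deadline passes, proceed if the quorum (min_agents) is met; keep
    accepting late arrivals until the hard deadline, then take what we have.
    """
    start = time.monotonic()
    while True:
        replies = poll_replies()
        elapsed_ms = (time.monotonic() - start) * 1000
        if len(replies) == total_agents:
            return replies        # everyone answered early
        if elapsed_ms >= soft_ms and len(replies) >= min_agents:
            return replies        # quorum met at the soft deadline
        if elapsed_ms >= hard_ms:
            return replies        # hard stop: accept whatever arrived
        time.sleep(0.001)

# Simulated round: two of three agents reply, one never does.
print(len(wait_for_quorum(lambda: ["r1", "r2"], 3, 50, 100, 2)))  # 2
```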
- `cli_allowed_commands`: Whitelist of allowed executables
- `cli_use_shell`: Must be `false` for security
- `max_requirement_size`: Input size cap (chars)
- `max_solution_size`: Output size cap (chars)
- `redact_patterns`: Regex patterns to redact from logs
- `max_total_time_sec`: Overall wall time budget
- `max_round_time_sec`: Per-round budget
- `max_agent_time_sec`: Per-agent call budget
- `max_tokens_per_agent_per_round`: Prompt truncation cap
- `enable_token_truncation`: Allow prompt truncation
- `max_concurrent_agents`: Parallelism limit
- `save_transcript`: Persist debate transcript
- `transcript_dir`: Output directory
- `format`: `json` or `markdown`
- `verbose`: Print extra info during execution
- `enable_sandbox`: Run solutions in restricted Python sandbox
- `sandbox_timeout_ms`: Sandbox execution limit
- `enabled`: On-disk memoization of agent outputs
- `dir`: Cache directory
- `max_entries`: Eviction limit
- `task.store_path`: SQLite database path for task metadata and events
- `task.artifacts_dir`: Directory for task-scoped artifacts
- `task.max_stage_retries`: Retry count before arbitration or pause
- `task.max_total_iterations`: Overall iteration cap for a task
- `task.tool_policy.allow_web_research`: Whether autonomous tasks may rely on agent-native research tools
- `task.tool_policy.allow_workspace_write`: Whether autonomous tasks may write to the workspace
- `task.tool_policy.allowed_write_roots`: Relative roots autonomous writes may touch
- `task.tool_policy.allow_local_commands`: Whether autonomous tasks may run local commands
- `task.tool_policy.allowed_local_commands`: Allowlist for task-run commands
- `task.tool_policy.verification_commands`: Extra commands run during the verification stage
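An `allowed_write_roots` check of the kind described above can be sketched like this (an assumed enforcement strategy for illustration, not the repo's actual policy code):

```python
from pathlib import Path

def write_allowed(workspace_root, target, allowed_write_roots):
    """Sketch: permit a write only if the resolved target path stays
    inside one of the allowed roots beneath the workspace root."""
    root = Path(workspace_root).resolve()
    resolved = (root / target).resolve()  # collapses any ".." components
    for rel in allowed_write_roots:
        allowed = (root / rel).resolve()
        if resolved == allowed or allowed in resolved.parents:
            return True
    return False

print(write_allowed("/tmp/ws", "src/app.py", ["src", "docs"]))    # True
print(write_allowed("/tmp/ws", "../escape.py", ["src", "docs"]))  # False
```

Resolving before comparing is the important design choice: it blocks `..`-based path traversal out of the workspace.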
Free-MAD communicates with agents via stdin/stdout. Your agent CLI must:
- Accept mode as argument: `<cli_command> generate` or `<cli_command> critique`
- Read prompt from stdin: The debate requirement or critique instructions
- Output structured response:
SOLUTION:
<your proposed solution>
REASONING:
<your reasoning/arguments>
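A minimal parser for this response contract might look like the following (a sketch, not the repo's actual parser):

```python
def parse_agent_output(text):
    """Split a response into solution and reasoning using the
    SOLUTION:/REASONING: markers described above."""
    if "SOLUTION:" not in text or "REASONING:" not in text:
        raise ValueError("missing SOLUTION:/REASONING: markers")
    # Everything after SOLUTION: and before REASONING: is the solution.
    _, _, rest = text.partition("SOLUTION:")
    solution, _, reasoning = rest.partition("REASONING:")
    return {"solution": solution.strip(), "reasoning": reasoning.strip()}

out = parse_agent_output("SOLUTION:\ndef f(n): ...\nREASONING:\nbase case first")
print(out["solution"])  # def f(n): ...
```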
If your agent doesn't follow this contract, wrap it:
#!/usr/bin/env python3
import sys
import subprocess
mode = sys.argv[1] # 'generate' or 'critique'
prompt = sys.stdin.read()
# Call your actual agent
result = subprocess.run(
["your-agent-command", "--mode", mode],
input=prompt,
capture_output=True,
text=True
)
# Format output
print(f"SOLUTION:\n{result.stdout}")
print(f"\nREASONING:\nGenerated in {mode} mode")

# Install dev dependencies
poetry install --with dev
# Run tests
poetry run pytest -q
# With coverage
poetry run pytest --cov=freemad --cov-report=term --cov-report=xml

# Type checking
mypy .

# Pre-commit hooks
poetry run pre-commit install
poetry run pre-commit run --all-files

See AGENTS.md for detailed conventions:
- Immutable dataclasses
- StrEnums for constants
- No hard-coded strings internally
- Serialization at boundaries only
Debate transcripts capture the complete history for analysis:
{
"final_answer_id": "abc123...",
"final_solution": "def fibonacci(n): ...",
"scores": {
"abc123...": 85.5,
"def456...": 72.3
},
"winning_agents": ["claude-sonnet"],
"transcript": [
{
"round": 0,
"type": "generation",
"agents": {
"claude-sonnet": {
"response": { "solution": "...", "reasoning": "..." },
"peers_assigned": [],
"peers_seen": []
}
}
},
{
"round": 1,
"type": "critique",
"agents": { ... }
}
]
}

Find transcripts in `transcripts/` by default when `output.save_transcript: true`.
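Given that layout, a small script can summarize a saved transcript (field names taken from the example above; `summarize_transcript` is a sketch, not a shipped helper):

```python
import json
from pathlib import Path

def summarize_transcript(path):
    """Print the winning agents and per-answer scores from a saved
    JSON transcript, highest score first; return the final answer id."""
    data = json.loads(Path(path).read_text())
    print("winning agents:", ", ".join(data["winning_agents"]))
    for answer_id, score in sorted(data["scores"].items(),
                                   key=lambda kv: kv[1], reverse=True):
        print(f"  {answer_id[:8]}  {score}")
    return data["final_answer_id"]
```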
Free-MAD includes a web-based dashboard to visualize debate results. The dashboard reads JSON transcripts and displays the final answer, winning agents, and scores.
poetry run freemad-dashboard --dir transcripts --host 127.0.0.1 --port 8001

Then open your browser to http://127.0.0.1:8001 to view the results.
Command Options:
- `--dir`: Directory containing JSON transcripts (default: `transcripts`)
- `--host`: Server host address (default: `127.0.0.1`)
- `--port`: Server port (default: `8001`)
- ✅ View final debate results
- ✅ See winning agents and scores
- ✅ Browse all transcript files
The dashboard is actively being developed. Planned features include:
Autonomous Task Views:
- Persistent task pages distinct from debate transcripts
- Stage timeline for research, planning, execution, review, and verification
- Open-questions pane for human clarification and approvals
- Artifact browser for plans, patches, review notes, and verification logs
Real-Time Debate Visualization:
- Live conversation view showing agent-to-agent interactions
- Visual timeline of debate rounds
- See who said what in each round
Metrics & Analytics:
- Token usage tracking per agent and per round
- Time/duration metrics for each debate phase
- Cost estimation based on model pricing
Agent Information:
- Display model configurations (temperature, max_tokens)
- Show agent types and CLI commands used
- Topology visualization (peer assignment graphs)
Configuration UI:
- Configure agents through the web interface
- Edit debate parameters (rounds, weights, timeouts)
- Save and load configuration presets
Interactive Final Agent:
- Chat with a final orchestrator agent
- Execute the winning solution interactively
- Provide feedback and iterate on results
Enhanced UX:
- Make the system more user-friendly vs. command-line only
- Drag-and-drop configuration builder
- Real-time progress indicators
Contributions Welcome! If you'd like to help build these features, please see CONTRIBUTING.md or open an issue to discuss implementation ideas.
- Verify `cli_command` is in your PATH
- Check `cli_command` is in `security.cli_allowed_commands`
- Increase `agents[].timeout` if needed
- Enable debug logging: `logging.level: DEBUG`
- Agents must output exactly the `SOLUTION:` and `REASONING:` markers
- Test your agent CLI manually with echo prompts
- Increase `deadlines.hard_timeout_ms`
- Increase `budget.max_round_time_sec`
- Ensure `deadlines.min_agents` ≤ number of enabled agents
- Check `early_stop_reason` in transcript
- Set `topology.seed` for consistent peer assignments
- Set `scoring.random_seed` for consistent tie-breaking
- Use `scoring.tie_break: deterministic`
- Issues: GitHub Issues
- Contributing: See CONTRIBUTING.md
- Code of Conduct: See CODE_OF_CONDUCT.md
- Security: See SECURITY.md for private vulnerability reporting
- Governance: See GOVERNANCE.md
If you use this implementation in your research, please cite:
@software{freemad2025,
author = {Santilli, Jonathan},
title = {FREE-MAD: Consensus-Free Multi-Agent Debate Implementation},
year = {2025},
url = {https://github.com/jonathansantilli/mad}
}

And the original paper:
@article{freemad2024,
title={Free-MAD: Consensus-Free Multi-Agent Debate},
author={...},
journal={arXiv preprint arXiv:2509.11035},
year={2024}
}

MIT License © 2025 Jonathan Santilli. See LICENSE for full text.
This project is independent and not affiliated with Anthropic, OpenAI, or any other vendor. "Claude", "Codex", and any other product names are trademarks of their respective owners and are used here only for identification.
This implementation is based on the paper:
"Free-MAD: Consensus-Free Multi-Agent Debate" arXiv:2509.11035v1 https://arxiv.org/html/2509.11035v1
- Eliminates consensus requirement: Agents can disagree throughout the debate
- Score-based decision mechanism: Evaluates entire debate trajectory, not just final votes
- Improved accuracy: Outperforms traditional MAD on reasoning benchmarks
- Better efficiency: Requires fewer debate rounds than consensus-based approaches
- Robustness: Resistant to conformity bias and communication attacks
The repository now ships the first autonomous milestone beside the original debate runtime.
Current autonomous properties:
- no single agent can declare a task complete by itself
- every stage requires at least a proposer and an independent checker
- disagreement triggers revision, arbitration, or human escalation instead of being hidden
- long-lived tasks can resume after restart with persisted state and artifacts
- research, planning, code changes, review, and verification are all first-class stages
Current supported workflows include:
- "Critique this architecture until the agents agree it is implementation-ready."
- "Implement this approved plan, have another agent review it, and only finish once verification passes."
- "Ask me a concrete question only when the goal is ambiguous or the next action is risky."
Current first-milestone limits:
- autonomous tasks are still limited to `plan` and `code` workflows
- publish-side effects such as push, merge, and release actions remain manual
- live task streaming currently tails persisted task events rather than using a dedicated in-memory pub/sub layer
See the design docs above for the full specification and current implementation notes.