For Subnet 100 Miners and General AI Agent Developers
This guide provides comprehensive best practices for building generalist AI agents that solve arbitrary tasks through reasoning rather than pattern matching. These principles apply to Subnet 100 (Terminal Bench) and any system requiring adaptive problem-solving.
| File | Topic | Description |
|---|---|---|
| 01-what-is-generalist-agent.md | Definitions | What makes an agent "generalist" vs "hardcoded" |
| 02-architecture-patterns.md | Architecture | Explore-Plan-Execute, ReAct, LLM loops |
| 03-allowed-vs-forbidden.md | Rules | What's allowed vs prohibited |
| 04-anti-patterns.md | Anti-Patterns | 30+ examples of what NOT to do |
| 05-best-practices.md | Best Practices | Good patterns with code examples |
| 06-llm-usage-guide.md | LLM Usage | Prompting, context management, function calling |
| 07-tool-design.md | Tool Design | How to design effective tools |
| 08-error-handling.md | Error Handling | Recovery strategies and retry logic |
| 09-testing-your-agent.md | Testing | Local testing before submission |
| 10-checklist.md | Checklist | Pre-submission validation checklist |
These are the fundamental principles that separate successful generalist agents from failing hardcoded ones:
```python
# WRONG: branch on keywords in the task text
if "create file" in task:
    run_file_creation()

# RIGHT: let the LLM reason about the task
response = llm.chat([{"role": "user", "content": f"Task: {task}\nWhat commands should I run?"}])
```
The LLM must analyze each task independently. No shortcuts, no pattern matching on instructions.
Your agent should have zero knowledge of specific tasks. It should not contain:
- Keywords from task descriptions
- Task-specific handlers
- Conditional logic based on instruction content
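One way to keep task knowledge out of the agent is to build every LLM request through a single code path that never inspects the instruction. The sketch below is illustrative only: `SYSTEM_PROMPT` and `build_messages` are hypothetical names, not part of any SDK, and the message format assumes a chat-style API with `role`/`content` dictionaries.

```python
# Hypothetical system prompt; the wording is an assumption, not a required text.
SYSTEM_PROMPT = (
    "You are a terminal agent. Read the task, inspect the environment, "
    "and reply with the shell commands to run."
)

def build_messages(instruction: str, context: str) -> list[dict]:
    """Build the LLM request without inspecting what the instruction says."""
    # The instruction is passed through verbatim: no keyword checks,
    # no per-task templates, no branching on its content.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Environment:\n{context}\n\nTask: {instruction}"},
    ]
```

Because the function treats the instruction as opaque text, two different tasks produce requests that differ only in the task string itself.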
Always gather context before executing:
```python
import subprocess

def shell(cmd, cwd=None, timeout=60):
    """Execute shell command using subprocess."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True, cwd=cwd, timeout=timeout)
    return result.stdout + result.stderr

# `llm`, `task`, and `parse_commands` are assumed to be defined elsewhere in the agent.

# 1. EXPLORE
context = shell("pwd && ls -la")
readme = shell("cat README.md 2>/dev/null")

# 2. PLAN (via LLM)
messages = [{"role": "user", "content": f"Environment:\n{context}\n\nREADME:\n{readme}\n\nTask: {task}"}]
response = llm.chat(messages)

# 3. EXECUTE
for command in parse_commands(response.text):
    shell(command)
```

Before marking a task complete:
```python
# Verify output file exists
result = shell(f"ls -la {output_path}")
if "No such file" in result:
    # File doesn't exist - don't mark complete!
    continue_working()

# Verify content is correct
content = shell(f"cat {output_path}")
# Let LLM verify if content meets requirements
```

Every execution path must signal completion. The agent entry point receives --instruction and must complete:
```python
import argparse
import subprocess

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--instruction", required=True)
    args = parser.parse_args()

    try:
        # ... task execution (sets task_successful) ...
        if task_successful:
            print("[DONE] Task completed successfully")
        else:
            print("[DONE] Task failed")
    except Exception as e:
        print(f"[ERROR] {e}")
    # Agent exits naturally - no special done() call needed

if __name__ == "__main__":
    main()
```

Warning: Agents that run indefinitely will time out and receive a zero score.
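A simple guard against running indefinitely is to wrap the agent's work in a loop that has both an iteration cap and a wall-clock deadline. The following is a minimal sketch, not a prescribed implementation: `run_bounded` is a hypothetical helper, and the defaults of 50 iterations and 600 seconds are placeholder assumptions, not limits documented for Subnet 100.

```python
import time

def run_bounded(step, max_iterations=50, time_budget_s=600.0):
    """Call step() until it returns True, or until a limit is hit.

    Returns "done", "max_iterations", or "timeout".
    The default limits are illustrative assumptions.
    """
    deadline = time.monotonic() + time_budget_s
    for _ in range(max_iterations):
        # Check the clock before each step so a slow task cannot overrun.
        if time.monotonic() >= deadline:
            return "timeout"
        if step():
            return "done"
    return "max_iterations"

# Usage: a stand-in step that succeeds on its third call.
attempts = iter([False, False, True])
status = run_bounded(lambda: next(attempts), max_iterations=5, time_budget_s=5.0)
print(f"[DONE] status={status}")
```

Whatever the status, the function always returns, so the caller can print its completion message on every path.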
| Failure | Cause | Prevention |
|---|---|---|
| Infinite loop | No exit condition | Set a maximum iteration count; always have an exit path |
| Missing output file | Wrong path or not created | Use absolute paths, verify with ls -la |
| Constraint violation | Didn't read constraints | Parse constraints FIRST |
| Timeout | Task takes too long | Monitor time, prioritize critical steps |
| Context overflow | Too much history | Implement compaction (MANDATORY) |
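The compaction requirement in the last row can be met in several ways. The sketch below shows one of the simplest, under stated assumptions: it keeps the first message (assumed to hold the task) and the most recent messages, and replaces the middle with a short placeholder. `compact_history`, its character-based threshold, and its defaults are all hypothetical; a real agent might count tokens or ask the LLM for a summary instead, and some chat APIs may not accept two consecutive messages with the same role.

```python
def compact_history(messages, max_chars=8000, keep_recent=4):
    """Return a shorter message list once total content exceeds max_chars.

    Keeps the first message (assumed to be the task) and the last
    keep_recent messages; everything in between becomes one placeholder.
    """
    total = sum(len(m["content"]) for m in messages)
    if total <= max_chars or len(messages) <= keep_recent + 1:
        return messages  # Nothing to compact.

    dropped = len(messages) - 1 - keep_recent
    note = {
        "role": "user",
        "content": f"[{dropped} earlier messages omitted to save context]",
    }
    # Slice from an explicit index so keep_recent=0 yields an empty tail.
    recent = messages[len(messages) - keep_recent:]
    return [messages[0], note] + recent
```

Calling this before every LLM request keeps the history bounded without changing how the rest of the loop works.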
- Read 01-what-is-generalist-agent.md to understand the core concepts
- Study 04-anti-patterns.md to learn what NOT to do
- Implement patterns from 05-best-practices.md
- Validate using 10-checklist.md before submission
```mermaid
flowchart LR
    A[RECEIVE<br/>--instruction] --> B[EXPLORE<br/>CONTEXT]
    B --> C[LLM<br/>REASON]
    C --> D[EXECUTE<br/>subprocess]
    D --> E{VERIFY}
    E -->|Success| F[EXIT]
    E -->|Iterate| B
```
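The cycle in the flowchart can be sketched as a small driver function. This is an illustration under assumptions, not the SDK's interface: `run_agent` and its four callables (`explore`, `reason`, `execute`, `verify`) are hypothetical names, and `max_rounds=10` is an arbitrary cap. The driver contains no task-specific logic; all decisions come from the `reason` and `verify` callables, which would wrap LLM calls in a real agent.

```python
def run_agent(instruction, explore, reason, execute, verify, max_rounds=10):
    """Drive the EXPLORE -> REASON -> EXECUTE -> VERIFY cycle.

    Returns True if verify() accepts the result within max_rounds.
    """
    feedback = ""  # Verifier notes from the previous round, empty at first.
    for _ in range(max_rounds):
        context = explore()                                # EXPLORE CONTEXT
        commands = reason(instruction, context, feedback)  # LLM REASON
        outputs = [execute(cmd) for cmd in commands]       # EXECUTE
        ok, feedback = verify(instruction, outputs)        # VERIFY
        if ok:
            return True                                    # Success -> EXIT
        # Otherwise iterate: go back to EXPLORE with the feedback.
    return False
```

Passing the verifier's feedback into the next `reason` call is a design choice in this sketch; it gives the LLM a chance to correct its plan instead of repeating it.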
A generalist agent treats every task as if it has never seen it before.
It uses the LLM to understand, plan, and execute - never hardcoded logic. The same agent code should work on ANY task, from "create a file" to "implement a path tracer" to "compile a Coq proof".
- Version: 2.0 (SDK 3.0 Compatible)
- Last Updated: January 2025
- Target: Subnet 100 / Terminal Bench v2.0