Intelligent routing and orchestration for distributed AI resources
Oxide is a comprehensive platform for managing and orchestrating multiple Large Language Model (LLM) services. It intelligently routes tasks to the most appropriate LLM based on task characteristics, provides a web dashboard for monitoring and management, and integrates seamlessly with Claude Code via Model Context Protocol (MCP).
- Automatic Service Selection: Analyzes task type, complexity, and file count to choose the optimal LLM
- Custom Routing Rules: Configure permanent task-to-service assignments via Web UI
- Fallback Support: Automatic failover to alternative services if primary is unavailable
- Parallel Execution: Distribute large codebase analysis across multiple LLMs
- Manual Override: Select specific services for individual tasks
- Auto-Start Ollama: Automatically starts Ollama if not running (macOS, Linux, Windows)
- Auto-Detect Models: Discovers available models without manual configuration
- Smart Model Selection: Chooses best model based on preferences and availability
- Auto-Recovery: Retries with service restart on temporary failures
- Zero-Config LM Studio: Works with LM Studio without model name configuration
- Real-time Monitoring: Live metrics for CPU, memory, task execution, and service health
- Task Executor: Execute tasks directly from the browser with service selection
- Task Assignment Manager: Configure which LLM handles specific task types
- Task History: Complete history of all executed tasks with results and metrics
- WebSocket Support: Real-time updates for task progress and system events
- Service Management: Monitor and test all configured LLM services
- Claude Code Integration: Use Oxide directly within Claude Code
- Three MCP Tools:
- route_task - Execute tasks with intelligent routing
- analyze_parallel - Parallel codebase analysis
- list_services - Check service health and availability
- Persistent Task Storage: All tasks saved to ~/.oxide/tasks.json
- Auto-start Web UI: Optional automatic Web UI launch with MCP server
- Automatic Cleanup: All spawned processes (Web UI, Gemini, Qwen, etc.) cleaned up on exit
- Signal Handlers: Graceful shutdown on SIGTERM/SIGINT
- Process Registry: Tracks all child processes for guaranteed cleanup
- No Orphaned Processes: Ensures clean system state even on forced termination
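The registry-plus-handlers pattern described above can be sketched in a few lines. This is a hypothetical illustration, not the actual process_manager.py code:

```python
import atexit
import signal
import subprocess
import sys

# Hypothetical registry; the real process manager tracks more state per child.
CHILDREN = []

def spawn(cmd):
    """Start a child process and record it in the registry."""
    proc = subprocess.Popen(cmd)
    CHILDREN.append(proc)
    return proc

def cleanup(*_):
    """Terminate every tracked child: graceful first, then force kill."""
    for proc in CHILDREN:
        if proc.poll() is None:  # still running
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()

# Final cleanup guarantee on normal exit and on SIGTERM/SIGINT
atexit.register(cleanup)
signal.signal(signal.SIGTERM, lambda signum, frame: (cleanup(), sys.exit(0)))
signal.signal(signal.SIGINT, lambda signum, frame: (cleanup(), sys.exit(0)))
```

The atexit hook plus signal handlers cover both normal exit and termination signals, which is what prevents orphaned children.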
- Google Gemini (CLI) - 2M+ token context window, ideal for large codebases
- Qwen (CLI) - Optimized for code generation and review
- Ollama (HTTP) - Local and remote instances
- Extensible: Easy to add new LLM adapters
- Python 3.11+
- uv package manager
- Node.js 18+ (for Web UI)
- Gemini CLI (optional)
- Qwen CLI (optional)
- Ollama (optional)
# Clone the repository
cd /Users/yayoboy/Documents/GitHub/oxide
# Install dependencies
uv sync
# Build the Web UI
cd src/oxide/web/frontend
npm install
npm run build
cd ../../..
# Verify installation
uv run oxide-mcp --help
Edit config/default.yaml:
services:
gemini:
enabled: true
type: cli
executable: gemini
qwen:
enabled: true
type: cli
executable: qwen
ollama_local:
enabled: true
type: http
base_url: http://localhost:11434
model: qwen2.5-coder:7b
default_model: qwen2.5-coder:7b
ollama_remote:
enabled: false
type: http
base_url: http://192.168.1.46:11434
model: qwen2.5-coder:7b
routing_rules:
prefer_local: true
fallback_enabled: true
execution:
timeout_seconds: 120
max_retries: 2
retry_on_failure: true
max_parallel_workers: 3
logging:
level: INFO
console: true
file: oxide.log
- Configure Claude Code
Add to ~/.claude/settings.json:
{
"mcpServers": {
"oxide": {
"command": "uv",
"args": ["--directory", "/Users/yayoboy/Documents/GitHub/oxide", "run", "oxide-mcp"],
"env": {
"OXIDE_AUTO_START_WEB": "true"
}
}
}
}
Setting OXIDE_AUTO_START_WEB=true automatically starts the Web UI at http://localhost:8000
- Use in Claude Code
Claude will automatically use Oxide MCP tools:
You: "Analyze this codebase for architecture patterns"
Claude: Uses Oxide to route to Gemini (large context)
You: "Review this function for bugs"
Claude: Uses Oxide to route to Qwen (code specialist)
You: "What is 2+2?"
Claude: Uses Oxide to route to Ollama Local (quick query)
- Start the Web UI
# Option A: Use the startup script
./scripts/start_web_ui.sh
# Option B: Manual start
python -m uvicorn oxide.web.backend.main:app --host 0.0.0.0 --port 8000
# Option C: Auto-start with MCP (set OXIDE_AUTO_START_WEB=true)
uv run oxide-mcp
- Access the Dashboard
Open http://localhost:8000 in your browser
from oxide.core.orchestrator import Orchestrator
from oxide.config.loader import load_config
# Initialize
config = load_config()
orchestrator = Orchestrator(config)
# Execute a task with intelligent routing
async for chunk in orchestrator.execute_task(
prompt="Explain quantum computing",
files=None,
preferences=None # Let Oxide choose
):
print(chunk, end="")
# Execute with manual service selection
async for chunk in orchestrator.execute_task(
prompt="Review this code",
files=["src/main.py"],
preferences={"preferred_service": "qwen"}
):
print(chunk, end="")
┌────────────────────────────────────────────────────────────────┐
│                      Oxide Orchestrator                        │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐      │
│  │  Classifier  │───▶│    Router    │───▶│   Adapters   │      │
│  └──────────────┘    └──────────────┘    └──────────────┘      │
│   Task Analysis      Route Decision       LLM Execution        │
│                                                                │
│  ┌────────────────────────────────────────────────────────┐    │
│  │        Process Manager - Lifecycle Management          │    │
│  └────────────────────────────────────────────────────────┘    │
│                                                                │
│  ┌────────────────────────────────────────────────────────┐    │
│  │          Task Storage - Persistent History             │    │
│  └────────────────────────────────────────────────────────┘    │
│                                                                │
│  ┌────────────────────────────────────────────────────────┐    │
│  │          Routing Rules - Custom Assignments            │    │
│  └────────────────────────────────────────────────────────┘    │
│                                                                │
└────────────────────────────────────────────────────────────────┘
        │                    │                    │
        ▼                    ▼                    ▼
  ┌───────────┐        ┌───────────┐        ┌──────────┐
  │    MCP    │        │  Web UI   │        │  Python  │
  │  Server   │        │  Backend  │        │   API    │
  └───────────┘        └───────────┘        └──────────┘
Analyzes tasks to determine:
- Task type (coding, review, codebase_analysis, etc.)
- Complexity score based on keywords and patterns
- File count and total size
- Whether parallel execution is beneficial
Task Types:
- coding - Code generation
- code_review - Code review
- bug_search - Bug analysis
- refactoring - Code refactoring
- documentation - Writing docs
- codebase_analysis - Large codebase analysis
- quick_query - Simple questions
- general - General purpose
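As an illustration of keyword-driven classification (a hypothetical sketch; the real classifier.py also weighs complexity patterns, file count, and size):

```python
# Hypothetical keyword table - illustrative only, not the shipped rules.
KEYWORDS = {
    "code_review": ["review", "critique"],
    "bug_search": ["bug", "fix", "error"],
    "refactoring": ["refactor", "restructure"],
    "coding": ["write", "implement", "function"],
    "quick_query": ["what is", "define"],
}

def classify(prompt: str) -> str:
    """Return the first task type whose keyword appears in the prompt."""
    text = prompt.lower()
    for task_type, words in KEYWORDS.items():
        if any(w in text for w in words):
            return task_type
    return "general"  # nothing matched: general purpose
```

A prompt like "Review this function for bugs" would classify as code_review because that entry is checked first; ordering the table is part of the design.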
Routes tasks based on:
- Task classification results
- Custom routing rules (user-defined permanent assignments)
- Service health status and availability
- Fallback preferences and retry logic
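A minimal routing sketch showing the priority order above: a custom rule wins first, then healthy candidates are tried in order. The default tables here are illustrative assumptions, not Oxide's actual ones:

```python
def route(task_type, custom_rules, health):
    """Pick a service: custom rule first, then the first healthy default."""
    # Custom rules override intelligent routing (if the target is healthy)
    if task_type in custom_rules and health.get(custom_rules[task_type]):
        return custom_rules[task_type]
    # Illustrative defaults per task type, in preference order
    defaults = {
        "codebase_analysis": ["gemini", "qwen"],
        "coding": ["qwen", "gemini"],
        "quick_query": ["ollama_local", "qwen"],
    }
    for service in defaults.get(task_type, ["ollama_local"]):
        if health.get(service):
            return service  # fallback: first healthy candidate
    return None  # no service available
```

With fallback enabled, an unhealthy primary simply drops through to the next candidate in the list.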
Unified interface for different LLM types:
- CLI Adapters (cli_adapter.py):
  - Gemini (gemini.py) - Subprocess execution, 2M+ context
  - Qwen (qwen.py) - Code specialist
  - Automatic process tracking and cleanup
- HTTP Adapters (ollama_http.py):
  - Ollama Local/Remote - REST API communication
  - Streaming support
  - Health checks
All adapters implement:
- execute() - Task execution with streaming
- health_check() - Service availability check
- get_service_info() - Service metadata
Persistent task history management:
- Storage: ~/.oxide/tasks.json
- Thread-safe: Concurrent read/write support
- Tracked data:
- Task ID, status, timestamps
- Prompt, files, preferences
- Service used, task type
- Result, error, duration
- Features:
- List/filter tasks by status
- Get statistics (by service, by type, by status)
- Clear tasks (all or by status)
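A toy thread-safe JSON store illustrating the design (a hypothetical sketch; the real task_storage.py differs in detail):

```python
import json
import threading
import time
import uuid
from pathlib import Path

class TaskStore:
    """Minimal thread-safe JSON task store, in the spirit of ~/.oxide/tasks.json."""

    def __init__(self, path):
        self.path = Path(path)
        self.lock = threading.Lock()  # serialize concurrent readers/writers

    def _load(self):
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

    def save(self, status, prompt, **fields):
        """Append a task record keyed by a fresh UUID and return its id."""
        with self.lock:
            tasks = self._load()
            task_id = str(uuid.uuid4())
            tasks[task_id] = {"id": task_id, "status": status,
                              "prompt": prompt, "created_at": time.time(),
                              **fields}
            self.path.write_text(json.dumps(tasks, indent=2))
            return task_id

    def by_status(self, status):
        """List tasks filtered by status."""
        with self.lock:
            return [t for t in self._load().values() if t["status"] == status]
```

Reading the whole file under a lock keeps the sketch simple; a production store would also handle partial writes and file locking across processes.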
Lifecycle management for all spawned processes:
- Tracks: Web UI server, CLI processes (Gemini, Qwen)
- Signal handlers: SIGTERM, SIGINT, SIGHUP
- Cleanup: Automatic on exit (graceful → force kill)
- Safety: Prevents orphaned processes
- atexit hook: Final cleanup guarantee
User-defined task-to-service assignments:
- Storage: ~/.oxide/routing_rules.json
- Format: {"task_type": "service_name"}
- Example: {"coding": "qwen", "code_review": "gemini", "bug_search": "qwen", "quick_query": "ollama_local"}
- Priority: Custom rules override intelligent routing
- Services: Total, enabled, healthy, unhealthy
- Tasks: Running, completed, failed, queued
- System: CPU %, Memory % and usage
- WebSocket: Active connections
- Auto-refresh every 2 seconds
Execute tasks directly from the browser:
- Prompt input: Multi-line text area
- Service selection:
- 🤖 Auto (Intelligent Routing) - Let Oxide choose
- Manual - Select specific service (gemini, qwen, ollama, etc.)
- Real-time streaming: See results as they appear
- Error handling: Clear error messages
- Integration: Tasks appear immediately in history
Service cards showing:
- Status: ✅ Healthy / ⚠️ Unavailable / ❌ Disabled
- Type: CLI or HTTP
- Description: Service capabilities
- Details: Base URL (HTTP), executable (CLI)
- Context: Max tokens (Gemini: 2M+)
Configure permanent task-to-service assignments:
Interface:
- Add Rule Form:
- Dropdown: Select task type (coding, review, etc.)
- Dropdown: Select service (qwen, gemini, ollama)
- Button: Add Rule
- Active Rules Table:
- Task Type | Assigned Service | Description | Actions
- Delete individual rules
- Clear all rules
Available Task Types:
- coding → Code Generation → Recommended: qwen, gemini
- code_review → Code Review → Recommended: qwen, gemini
- bug_search → Bug Search → Recommended: qwen, gemini
- refactoring → Code Refactoring → Recommended: qwen, gemini
- documentation → Documentation → Recommended: gemini, qwen
- codebase_analysis → Large Codebase → Recommended: gemini
- quick_query → Simple Questions → Recommended: ollama_local
- general → General Purpose → Recommended: ollama_local, qwen
Example Configuration:
coding → qwen (All code generation to qwen)
code_review → gemini (All reviews to gemini)
bug_search → qwen (Bug analysis to qwen)
quick_query → ollama (Fast queries to local ollama)
When a task matches a rule, it's always routed to the assigned service, bypassing intelligent routing.
Complete history of all executed tasks:
- From all sources: MCP, Web UI, Python API
- Auto-refresh: Every 3 seconds
- Display:
- Status badge (completed, running, failed, queued)
- Timestamp, duration
- Prompt preview (first 150 chars)
- Service used, task type
- File count
- Error messages (if failed)
- Result preview (first 200 chars)
- Limit: Latest 10 tasks by default
WebSocket event stream:
- Real-time task progress
- Service status changes
- System events
Base URL: http://localhost:8000/api
Execute Task
POST /api/tasks/execute
Content-Type: application/json
{
"prompt": "Your query here",
"files": ["path/to/file.py"],
"preferences": {
"preferred_service": "qwen"
}
}
Response: {"task_id": "...", "status": "queued", "message": "..."}
List Tasks
GET /api/tasks/?limit=10&status=completed
Response: {
"tasks": [...],
"total": 42,
"filtered": 10
}
Get Task
GET /api/tasks/{task_id}
Response: {
"id": "...",
"status": "completed",
"prompt": "...",
"result": "...",
"duration": 5.23,
...
}
Delete Task
DELETE /api/tasks/{task_id}
Clear Tasks
POST /api/tasks/clear?status=completed
List Services
GET /api/services/
Response: {
"services": {
"gemini": {"enabled": true, "healthy": true, ...},
...
},
"total": 4,
"enabled": 3
}
Get Service
GET /api/services/{service_name}
Health Check
POST /api/services/{service_name}/health
Test Service
POST /api/services/{service_name}/test?test_prompt=Hello
List All Rules
GET /api/routing/rules
Response: {
"rules": [
{"task_type": "coding", "service": "qwen"},
...
],
"stats": {
"total_rules": 3,
"rules_by_service": {"qwen": 2, "gemini": 1},
"task_types": ["coding", "code_review", "bug_search"]
}
}
Get Rule
GET /api/routing/rules/{task_type}
Create/Update Rule
POST /api/routing/rules
Content-Type: application/json
{
"task_type": "coding",
"service": "qwen"
}
Response: {
"message": "Routing rule updated",
"rule": {"task_type": "coding", "service": "qwen"}
}
Update Rule
PUT /api/routing/rules/{task_type}
Content-Type: application/json
{
"task_type": "coding",
"service": "gemini"
}
Delete Rule
DELETE /api/routing/rules/{task_type}
Clear All Rules
POST /api/routing/rules/clear
Get Available Task Types
GET /api/routing/task-types
Response: {
"task_types": [
{
"name": "coding",
"label": "Code Generation",
"description": "Writing new code, implementing features",
"recommended_services": ["qwen", "gemini"]
},
...
]
}
Get Metrics
GET /api/monitoring/metrics
Response: {
"services": {"total": 4, "enabled": 3, "healthy": 2, ...},
"tasks": {"total": 10, "running": 0, "completed": 8, ...},
"system": {"cpu_percent": 25.3, "memory_percent": 45.7, ...},
"websocket": {"connections": 1},
"timestamp": 1234567890.123
}
Get Stats
GET /api/monitoring/stats
Response: {
"total_tasks": 42,
"avg_duration": 5.67,
"success_rate": 95.24,
"tasks_by_status": {"completed": 40, "failed": 2}
}
Health Check
GET /api/monitoring/health
Response: {
"status": "healthy",
"healthy": true,
"issues": [],
"cpu_percent": 25.3,
"memory_percent": 45.7
}
Connect to ws://localhost:8000/ws for real-time updates.
Message Types:
- task_start
{
"type": "task_start",
"task_id": "...",
"task_type": "coding",
"service": "qwen"
}
- task_progress (streaming)
{
"type": "task_progress",
"task_id": "...",
"chunk": "Here is the code..."
}
- task_complete
{
"type": "task_complete",
"task_id": "...",
"success": true,
"duration": 5.23
}
Client Usage:
const ws = new WebSocket('ws://localhost:8000/ws');
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
if (data.type === 'task_progress') {
console.log(data.chunk);
}
};
// Keep-alive ping
setInterval(() => ws.send('ping'), 30000);
oxide/
├── config/
│   └── default.yaml            # Main configuration
├── src/oxide/
│   ├── core/
│   │   ├── classifier.py       # Task classification
│   │   ├── router.py           # Routing logic
│   │   └── orchestrator.py     # Main orchestrator
│   ├── adapters/
│   │   ├── base.py             # Base adapter interface
│   │   ├── cli_adapter.py      # CLI adapter base
│   │   ├── gemini.py           # Gemini adapter
│   │   ├── qwen.py             # Qwen adapter
│   │   └── ollama_http.py      # Ollama HTTP adapter
│   ├── execution/
│   │   └── parallel.py         # Parallel execution engine
│   ├── utils/
│   │   ├── task_storage.py     # Task persistence
│   │   ├── routing_rules.py    # Routing rules storage
│   │   ├── process_manager.py  # Process lifecycle
│   │   ├── logging.py          # Logging utilities
│   │   └── exceptions.py       # Custom exceptions
│   ├── mcp/
│   │   ├── server.py           # MCP server (FastMCP)
│   │   └── tools.py            # MCP tool definitions
│   └── web/
│       ├── backend/
│       │   ├── main.py         # FastAPI application
│       │   ├── websocket.py    # WebSocket manager
│       │   └── routes/
│       │       ├── tasks.py        # Task endpoints
│       │       ├── services.py     # Service endpoints
│       │       ├── routing.py      # Routing rules endpoints
│       │       └── monitoring.py   # Monitoring endpoints
│       └── frontend/           # React SPA
│           ├── src/
│           │   ├── components/
│           │   │   ├── TaskExecutor.jsx
│           │   │   ├── TaskAssignmentManager.jsx
│           │   │   ├── TaskHistory.jsx
│           │   │   ├── ServiceCard.jsx
│           │   │   └── MetricsDashboard.jsx
│           │   ├── hooks/
│           │   │   ├── useServices.js
│           │   │   ├── useMetrics.js
│           │   │   └── useWebSocket.js
│           │   ├── api/
│           │   │   └── client.js
│           │   └── App.jsx
│           ├── package.json
│           └── vite.config.js
├── tests/
│   ├── test_process_cleanup.py
│   └── test_task_history_integration.py
└── scripts/
    └── start_web_ui.sh
# Process cleanup tests
python3 tests/test_process_cleanup.py
# Task history integration tests
python3 tests/test_task_history_integration.py
# All tests pass
# ✅ Sync process cleanup
# ✅ Async process cleanup
# ✅ Multiple process cleanup
# ✅ Signal handler cleanup
# ✅ Task storage integration
- Create adapter class
# src/oxide/adapters/my_llm.py
from .base import BaseAdapter
from typing import AsyncIterator, List, Optional
class MyLLMAdapter(BaseAdapter):
def __init__(self, config: dict):
super().__init__("my_llm", config)
self.api_key = config.get("api_key")
# Initialize your client...
async def execute(
self,
prompt: str,
files: Optional[List[str]] = None,
timeout: Optional[int] = None,
**kwargs
) -> AsyncIterator[str]:
"""Execute task and stream results."""
# Your implementation
yield "Response chunk"
async def health_check(self) -> bool:
"""Check if service is available."""
# Your health check logic
return True
def get_service_info(self) -> dict:
"""Return service metadata."""
info = super().get_service_info()
info.update({
"description": "My LLM Service",
"max_tokens": 100000
})
return info
- Register in configuration
# config/default.yaml
services:
my_llm:
enabled: true
type: http # or 'cli'
base_url: http://localhost:8080
model: my-model
api_key: ${MY_LLM_API_KEY} # From environment
- Update orchestrator
# src/oxide/core/orchestrator.py
def _create_adapter(self, service_name, config):
service_type = config.get("type")
if service_type == "cli":
if "my_llm" in service_name:
from ..adapters.my_llm import MyLLMAdapter
return MyLLMAdapter(config)
# ... other CLI adapters
elif service_type == "http":
if "my_llm" in service_name:
from ..adapters.my_llm import MyLLMAdapter
return MyLLMAdapter(config)
# ... other HTTP adapters
- Test your adapter
import asyncio
from oxide.core.orchestrator import Orchestrator
from oxide.config.loader import load_config
async def test():
config = load_config()
orchestrator = Orchestrator(config)
async for chunk in orchestrator.execute_task(
prompt="Test query",
preferences={"preferred_service": "my_llm"}
):
print(chunk, end="")
asyncio.run(test())
Oxide creates the following files in ~/.oxide/:
- tasks.json - Task execution history (all tasks from all sources)
- routing_rules.json - Custom routing rules (task type → service)
- oxide.log - Application logs (if file logging enabled)
Example tasks.json:
{
"task-uuid-1": {
"id": "task-uuid-1",
"status": "completed",
"prompt": "What is quantum computing?",
"files": [],
"service": "ollama_local",
"task_type": "quick_query",
"result": "Quantum computing is...",
"error": null,
"created_at": 1234567890.123,
"started_at": 1234567890.456,
"completed_at": 1234567895.789,
"duration": 5.333
}
}
Example routing_rules.json:
{
"coding": "qwen",
"code_review": "gemini",
"bug_search": "qwen",
"quick_query": "ollama_local"
}
Oxide can automatically start Ollama if it's not running:
# config/default.yaml
services:
ollama_local:
type: http
base_url: "http://localhost:11434"
api_type: ollama
enabled: true
auto_start: true # 🔥 Auto-start if not running
auto_detect_model: true # 🔥 Auto-detect best model
max_retries: 2 # Retry on failures
retry_delay: 2 # Seconds between retries
What happens:
- First task execution checks if Ollama is running
- If not, automatically starts Ollama via:
  - macOS: Opens Ollama.app or runs ollama serve
  - Linux: Uses systemd or runs ollama serve
  - Windows: Runs ollama serve as a detached process
- Waits up to 30s for Ollama to be ready
- Proceeds with task execution
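The "waits up to 30s" step can be approximated with a simple polling loop. This is a sketch, assuming only that the Ollama HTTP endpoint answers plain GET requests once the server is up:

```python
import time
import urllib.error
import urllib.request

def wait_for_ollama(base_url="http://localhost:11434", timeout=30.0):
    """Poll the Ollama HTTP endpoint until it responds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url, timeout=2):
                return True   # server answered: ready
        except (urllib.error.URLError, OSError):
            time.sleep(1)     # not up yet; retry after a short pause
    return False              # gave up within the timeout
```

Polling with a deadline rather than a fixed sleep lets the orchestrator proceed as soon as the service is ready instead of always waiting the full 30s.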
No need to configure model names manually:
lmstudio:
type: http
base_url: "http://192.168.1.33:1234/v1"
api_type: openai_compatible
enabled: true
default_model: null # 🔥 Will auto-detect
auto_detect_model: true
preferred_models: # Priority order
- "qwen" # Matches: qwen/qwen2.5-coder-14b
- "coder" # Matches: mistralai/codestral-22b
- "deepseek" # Matches: deepseek/deepseek-r1
Smart Selection Algorithm:
- Fetches available models from service
- Tries exact match with preferred models
- Tries partial match (e.g., "qwen" matches "qwen2.5-coder:7b")
- Falls back to first available model
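The selection steps above, sketched as a standalone function (illustrative, not the shipped implementation; step 1, fetching the model list, is assumed done by the caller):

```python
def pick_model(available, preferred):
    """Pick a model: exact match first, then substring match, then first available."""
    for want in preferred:
        if want in available:          # 2. exact match with a preferred name
            return want
    for want in preferred:
        for model in available:        # 3. partial match, e.g. "qwen" in "qwen2.5-coder:7b"
            if want in model:
                return model
    # 4. fall back to the first available model, or None if the service has none
    return available[0] if available else None
```

Preference order matters: with preferred_models ["qwen", "coder"], a list containing both a qwen and a codestral model resolves to the qwen one.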
from oxide.utils.service_manager import get_service_manager
service_manager = get_service_manager()
# Comprehensive health check with auto-recovery
health = await service_manager.ensure_service_healthy(
service_name="ollama_local",
base_url="http://localhost:11434",
api_type="ollama",
auto_start=True, # Try to start if down
auto_detect_model=True # Detect available models
)
print(f"Healthy: {health['healthy']}")
print(f"Models: {health['models']}")
print(f"Recommended: {health['recommended_model']}")
# Start monitoring (checks every 60s, auto-recovers on failure)
await service_manager.start_health_monitoring(
service_name="ollama_local",
base_url="http://localhost:11434",
interval=60,
auto_recovery=True
)
# Ollama will auto-start if not running!
async for chunk in orchestrator.execute_task("What is 2+2?"):
print(chunk, end="")
# What happens:
# 1. Checks if Ollama is running β not running
# 2. Auto-starts Ollama (takes ~5s)
# 3. Auto-detects model: qwen2.5-coder:7b
# 4. Executes task
# 5. Returns: "4"
async for chunk in orchestrator.execute_task(
prompt="Review this code for bugs",
files=["src/auth.py"],
preferences={"preferred_service": "gemini"}
):
print(chunk, end="")
# Forces routing to: gemini
# Gets large context window for thorough review
# Parallel analysis
from oxide.execution.parallel import ParallelExecutor
executor = ParallelExecutor(max_workers=3)
result = await executor.execute_parallel(
prompt="Analyze architecture patterns",
files=["src/**/*.py"], # 50+ files
services=["gemini", "qwen", "ollama_local"],
strategy="split"
)
print(f"Completed in {result.total_duration_seconds}s")
print(result.aggregated_text)
# Set up rules via API
import requests
requests.post("http://localhost:8000/api/routing/rules", json={
"task_type": "coding",
"service": "qwen"
})
# Now all coding tasks go to qwen automatically
async for chunk in orchestrator.execute_task("Write a Python function to sort a list"):
print(chunk, end="")
# Routes to: qwen (custom rule)
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Add tests if applicable
- Update documentation
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
# Clone your fork
git clone https://github.com/yourusername/oxide.git
cd oxide
# Install dev dependencies
uv sync
# Install frontend dependencies
cd src/oxide/web/frontend
npm install
cd ../../..
# Run tests
python3 tests/test_process_cleanup.py
python3 tests/test_task_history_integration.py
# Start development servers
python -m uvicorn oxide.web.backend.main:app --reload &
cd src/oxide/web/frontend && npm run dev
MIT License - Copyright (c) 2025 yayoboy
See LICENSE file for details.
- yayoboy - Initial work - esoglobine@gmail.com
- Built with FastAPI - Modern Python web framework
- React dashboard using Vite - Lightning-fast frontend tooling
- MCP integration via Model Context Protocol
- Process management inspired by supervisor and systemd patterns
- WebSocket support via FastAPI WebSockets
- Task classification inspired by semantic analysis techniques
For issues, questions, or suggestions:
- GitHub Issues: https://github.com/yourusername/oxide/issues
- Email: esoglobine@gmail.com
- SQLite database for task storage
- Advanced metrics and analytics
- Cost tracking per service
- Rate limiting and quotas
- Multi-user support
- Docker deployment
- Plugin system for custom adapters
- Workflow automation (task chains)
- A/B testing framework
- Performance benchmarking suite
- Auto-scaling for parallel execution
Version: 0.1.0 Status: ✅ Production Ready - MVP Complete!
- Project structure and dependencies
- Configuration system
- Task classifier
- Task router with fallbacks
- Adapter implementations (Gemini, Qwen, Ollama)
- MCP server integration
- Web UI dashboard (React + FastAPI)
- Real-time monitoring and WebSocket
- Task executor in Web UI
- Task assignment manager (routing rules UI)
- Persistent task storage
- Process lifecycle management
- Test suite (process cleanup, task storage)
- Comprehensive documentation
- Production deployment guides
- Docker containerization
- Extended test coverage
Built with ❤️ for intelligent LLM orchestration
Last Updated: December 2025