AgentOS is a spec-first, multi-tenant, federated agent platform built by NexixAI.
This repository is intentionally documentation-led. Specs are the source of truth; code exists to prove contracts and execution paths, not to outrun the design.
- Agent Orchestrator (Orchestration) — run orchestration, event streaming, lifecycle
- Model Policy (Model Services) — model invocation, embeddings, policy
- Federation Layer — connect AgentOS nodes across environments
- Operator UX — one-command deploy, validate, redeploy, nuke
- Spec Authority — PRS, schemas, OpenAPI, conformance tests
- Shared Internal Packages — auth, quota, storage, metrics, audit
- Spec-first, code-second
- Additive-only /v1 APIs
- Multi-tenant by default
- Federation as a first-class primitive
- CI-enforced contracts
- Docker-network invariants for multi-service tests (no localhost from CI host)
- RFC-0001 AI JCL discipline for AI-assisted development
- Model A baseline: one shared Agent Orchestrator + one shared Model Policy, with logical tenancy enforced via tenant_id, policy/entitlements, quotas, scoped storage, and audit.
- Model B (per-tenant pairs) is an optional deployment mode only.
- Control plane remains compute-light; workers run heavy models and emit events back to Agent Orchestrator. Workers never make orchestration or policy decisions.
Box 1 — Control Plane
[Agent Orchestrator] [Model Policy] [Queue/Event Bus]
[UI/API Gateway] [Shared artifact store (/shared)]
[Optional small local LLM for planning/orchestration]
Box 2/3 — Worker Nodes (GPU)
[Worker executors + tools + storage] -> emits events to Agent Orchestrator
[GPU-pinned model endpoints: vLLM/TGI/Triton]
/README.md
/SPEC_AUTHORITY.md
/AI_CONTRACT.md
/AGENTS.md
/docs/
/docs/api/ (OpenAPI specs)
/docs/design/v1.02/ (canonical design)
/docs/product/agentos-prs/v1.02/ (canonical PRS & schemas)
/docs/plan/v1.02/ (canonical plan docs: index, phases, tracks)
/docs/rfc/ (RFCs including AI JCL)
/cmd/agentos/ (CLI entry point)
/agentorchestrator/ (Agent Orchestrator service)
/modelpolicy/ (Model Policy service)
/federation/ (Federation service)
/internal/ (shared packages: auth, quota, storage, metrics, audit, etc.)
/tests/ (conformance + E2E tests)
/deploy/ (Docker Compose configs)
/configs/ (phase tracking)
/.github/ (CI workflows)
- Product Requirements: docs/product/agentos-prs/v1.02/prs.md
- JSON Schemas: docs/product/agentos-prs/v1.02/schemas-appendix.md
- Design: docs/design/v1.02/agentos-design.md
- Spec Authority: SPEC_AUTHORITY.md
- OpenAPI Specs: docs/api/*/openapi.yaml
- Docker + Docker Compose v2
- Go toolchain (only required to build the CLI)
go build -o agentos ./cmd/agentos
# Windows:
go build -o agentos.exe ./cmd/agentos
./agentos up
./agentos validate
./agentos status
- Agent Orchestrator: http://127.0.0.1:50081/v1/health
- Model Policy: http://127.0.0.1:50082/v1/health
- Federation: http://127.0.0.1:50083/v1/federation/health
Windows note: use curl.exe -4 http://127.0.0.1:PORT/... (PowerShell curl is Invoke-WebRequest). Ports in the 808x range can be reserved/excluded on Windows; local files use 5008x to avoid that.
Automated (recommended):
# Unix/macOS
./scripts/smoke-local.sh
# Windows (PowerShell)
.\scripts\smoke-local.ps1
The smoke scripts include retries with bounded timeouts and force IPv4 for reliability.
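The retry behavior described above can be sketched as a small POSIX shell function. This is an illustration only, not the contents of scripts/smoke-local.sh: the function name `wait_healthy`, the default of 10 attempts, the 2-second delay, and the 3-second per-request timeout are all assumptions chosen for the example.

```shell
# Hypothetical helper (not taken from the repository's smoke script).
# Usage: wait_healthy URL [ATTEMPTS] [DELAY_SECONDS]
wait_healthy() {
  _url=$1
  _attempts=${2:-10}   # assumed default: 10 tries
  _delay=${3:-2}       # assumed default: 2 seconds between tries
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    # -4 forces IPv4, -f treats HTTP errors as failures,
    # --max-time bounds each individual request.
    if curl -4 -fsS --max-time 3 "$_url" >/dev/null 2>&1; then
      echo "healthy: $_url"
      return 0
    fi
    _i=$((_i + 1))
    sleep "$_delay"
  done
  echo "unhealthy after $_attempts attempts: $_url" >&2
  return 1
}
```

For example, `wait_healthy http://127.0.0.1:50081/v1/health` would poll the Agent Orchestrator health endpoint and return non-zero if it never responds.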
Manual:
# Create a run (uses default tenant tnt_demo)
curl -X POST http://127.0.0.1:50081/v1/agents/my-agent/runs \
-H "X-Tenant-Id: tnt_demo" \
-H "Content-Type: application/json" \
-d '{"input": "hello"}'
# Check health
curl http://127.0.0.1:50081/v1/health
# List models
curl -H "X-Tenant-Id: tnt_demo" http://127.0.0.1:50082/v1/models
GPU-pinned placeholders; control plane stays CPU-bound:
COMPOSE_PROFILES=workers docker compose -f deploy/local/compose.yaml up -d --build
./agentos nuke
./agentos nuke --hard # also removes volumes (destructive)
Keep running:
docker compose \
-f deploy/local/compose.federation-2node.yaml \
-f deploy/local/compose.federation-2node.ADDENDUM.yaml \
up -d --build
docker compose \
-f deploy/local/compose.federation-2node.yaml \
-f deploy/local/compose.federation-2node.ADDENDUM.yaml \
up --build --abort-on-container-exit federation-e2e
Invariant: All multi-service tests run inside Docker and use Docker DNS names (e.g. nodea-federation, nodeb-federation). CI must never curl localhost for live services.
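One way to guard the invariant mechanically is a lint step that rejects loopback references in CI test scripts. The sketch below is hypothetical: the function name `check_no_localhost` and the idea of running it over test scripts are assumptions, not something the repository is known to ship.

```shell
# Hypothetical lint (an assumption, not an existing AgentOS check):
# fail if any given file references the host loopback instead of Docker DNS names.
check_no_localhost() {
  # grep exits 0 when it finds a match, which here is the failure case.
  if grep -nE 'localhost|127\.0\.0\.1' "$@"; then
    echo "error: use Docker DNS names instead of localhost" >&2
    return 1
  fi
  return 0
}
```

Matching lines are printed with their line numbers, so a failing CI job shows exactly which call needs to switch to a name such as nodea-federation.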
docker compose \
-f deploy/local/compose.federation-2node.yaml \
-f deploy/local/compose.federation-2node.ADDENDUM.yaml \
down -v
All requests execute within exactly one tenant_id. Pass tenant context via:
- Header: X-Tenant-Id: tnt_your_tenant
- JWT claim: tenant_id or tid
Default tenant tnt_demo is seeded automatically for local development.
Per-tenant enforcement includes:
- Rate limiting (QPS caps)
- Concurrent run limits
- Scoped storage isolation
- Audit logging with tenant context
Key configuration options:
| Variable | Description | Default |
|---|---|---|
| AGENTOS_DEFAULT_TENANT | Seeded default tenant | tnt_demo |
| AGENTOS_QUOTA_RUN_CREATE_QPS | Run creation rate limit | 10 |
| AGENTOS_QUOTA_CONCURRENT_RUNS | Max concurrent runs per tenant | 25 |
| AGENTOS_QUOTA_INVOKE_QPS | Model invocation rate limit | 20 |
| AGENTOS_AUDIT_SINK | Audit destination (stdout, stderr, file:PATH) | file:data/audit/... |
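As a rough illustration of how these settings resolve, the sketch below prints the effective value of each variable (environment first, documented default otherwise) and classifies an AGENTOS_AUDIT_SINK value. The function names `effective_config` and `audit_sink_kind` are hypothetical, and how the services actually treat an unrecognized sink value is not documented here, so the "invalid" branch is an assumption.

```shell
# Hypothetical helpers for inspecting configuration before starting the stack.

# Print each setting, falling back to the default from the table above.
effective_config() {
  echo "AGENTOS_DEFAULT_TENANT=${AGENTOS_DEFAULT_TENANT:-tnt_demo}"
  echo "AGENTOS_QUOTA_RUN_CREATE_QPS=${AGENTOS_QUOTA_RUN_CREATE_QPS:-10}"
  echo "AGENTOS_QUOTA_CONCURRENT_RUNS=${AGENTOS_QUOTA_CONCURRENT_RUNS:-25}"
  echo "AGENTOS_QUOTA_INVOKE_QPS=${AGENTOS_QUOTA_INVOKE_QPS:-20}"
}

# Classify an audit sink value as stdout, stderr, or file.
audit_sink_kind() {
  case "$1" in
    stdout|stderr) echo "$1" ;;
    file:?*)       echo "file" ;;   # file: followed by a non-empty path
    *)             echo "invalid audit sink: $1" >&2; return 1 ;;  # assumed handling
  esac
}
```

Running `AGENTOS_QUOTA_INVOKE_QPS=5 effective_config` in a fresh shell would show the override on the last line and the defaults on the others.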
If Docker Desktop's credential helper breaks (common on macOS/Windows), you may see errors like:
error getting credentials - err: exec: "docker-credential-desktop": executable file not found
Quick fix (bypass credential helper for local builds):
- Edit ~/.docker/config.json
- Remove the credsStore line (JSON does not allow comments, so delete it rather than commenting it out): { "credsStore": "desktop" }
- Retry docker compose up --build
Alternative (reset Docker Desktop):
- Docker Desktop → Troubleshoot → Reset to factory defaults
- Re-authenticate with docker login if needed
If you see "port already in use" errors, another process may be binding to 5008x ports:
# Find what's using the port (Unix/macOS)
lsof -i :50081
# Windows (PowerShell)
netstat -ano | findstr :50081
Stop the conflicting process or change the compose port mappings.
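To check all three local ports in one pass on Unix/macOS, the lsof command above can be wrapped in a loop. This is a minimal sketch; the function name `check_ports` is hypothetical.

```shell
# Hypothetical helper: report which of the given ports are already bound.
# Returns 1 if any port is in use, 0 if all are free.
check_ports() {
  _busy=0
  for _port in "$@"; do
    # lsof exits 0 only when it finds something using the port.
    if lsof -i ":$_port" >/dev/null 2>&1; then
      echo "port $_port is in use"
      _busy=1
    fi
  done
  return "$_busy"
}
```

For the default local setup this would be called as `check_ports 50081 50082 50083` before running ./agentos up.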
If health checks fail after ./agentos up:
- Check container status: docker compose -f deploy/local/compose.yaml ps
- Check logs: docker compose -f deploy/local/compose.yaml logs
- Ensure you're using 127.0.0.1 (not localhost) to avoid IPv6 issues
- On Windows, use curl.exe -4 to force IPv4
This is a scaffold/proof-of-concept implementation. The following are intentionally stubbed:
Stubbed execution layers (v1.02 scope):
- Agent execution: Runs auto-progress through queued → running → completed without real execution
- Model providers: Return stub responses; no real LLM integration
- Tool execution: Not implemented
- Memory/KV store: Not implemented
- Persistence: File-based JSON storage (not production-grade)
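The stubbed run lifecycle amounts to a fixed three-state progression. The sketch below models only that ordering and is not the orchestrator's implementation; the function names `next_state` and `walk_run` are hypothetical, and treating completed as a terminal state that maps to itself is an assumption.

```shell
# Illustrative model of the stub lifecycle: queued -> running -> completed.
next_state() {
  case "$1" in
    queued)    echo running ;;
    running)   echo completed ;;
    completed) echo completed ;;   # assumed terminal: no further transitions
    *)         echo "unknown state: $1" >&2; return 1 ;;
  esac
}

# Print every state a stubbed run passes through, one per line.
walk_run() {
  _state=queued
  echo "$_state"
  while [ "$_state" != completed ]; do
    _state=$(next_state "$_state") || return 1
    echo "$_state"
  done
}
```

Calling `walk_run` prints the three states in order, which is all a stubbed run does: no agent code, tools, or models execute between transitions.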
Missing enforcement (tracked for completion via v1.02 gap-closure tracks):
- Idempotency key enforcement: API accepts idempotency_key but doesn't enforce deduplication → Track 05
- Run cancellation: POST /v1/runs/{run_id}:cancel not implemented → Track 06
- Agent metadata endpoints: GET /v1/agents not implemented → Track 07
- Policy engine logic: Policy check framework present but returns allow-all → Track 08
- Federation mTLS: Uses HTTP with bearer tokens; mTLS not configured → Track 09
These stubs demonstrate the architecture and API contracts. Production implementations would replace the stub adapters. Gap-closure Tracks 05-09 implement the missing enforcement of v1.02 PRS requirements.
When CI is green:
- OpenAPI + schema conformance has passed
- Federation E2E has passed in Docker
- The platform is deployable in its current form
CI green does not imply production hardening beyond Phase 15.
- Version: v1.02
- Execution: Phase 16 complete (release readiness)
- CI: Conformance + Federation E2E passing