Operational monitor for multi-agent AI. Per-action observations, per-agent trends, baseline-relative alarms, named recommendations.
Features • Installation • Quick Start • Components • Rules • Adapters • Scope
agent-logging-system watches a fleet of AI agents the way an industrial control room watches a plant. One Observation per action goes in. A rolling per-agent window tracks the trend. An AnomalyDetector trips named alarms when behavior drifts from each agent's own baseline. A RecommendationEngine maps each alarm to a concrete action.
The design is borrowed from OT. AVEVA PI holds every sensor reading. WinCC raises an alarm when a value drifts. IBM Maximo maps that alarm to a maintenance procedure. An operator reads the trend, not the snapshot, and catches a failing pump bearing weeks before it seizes. This tool maps that discipline onto AI agents. Standard library only. No runtime dependencies.
- Per-action
Observationschema: timestamp, agent_id, action, input, output, latency_ms, status, confidence, latency_kind - Rolling 20-observation window per agent, tracked by
StateModel - Baseline-relative alarms: each agent is compared against its own normal, not a fixed line
- Two latency kinds (
machine,generation). A long generation never trips the machine alarm at any magnitude - Three default rules:
latency_high,error_rate_high,queue_buildup RecommendationEnginemaps every alarm to a named fix (throttle_input,investigate_failures)BaseAdapterto wire any host system in. Orchestrator and Warrant adapters ship- Custom rules via
AnomalyRule(name, check, alert_level, recommendation) - O(1) ingest, incremental scan (~15-27 us for 5 agents)
- Stdlib only. No database, no file on disk, no external collector
pip install agent-logging-systemOr from source:
git clone https://github.com/nuclide-research/agent-logging-system
cd agent-logging-system
pip install -e . # editable install
pip install -e ".[dev]" # plus test depsPython 3.9 or later.
from agent_logging_system import LoggingAgent, Observation
logger = LoggingAgent()
logger.ingest(Observation(
timestamp="2026-06-02T14:32:00Z",
agent_id="worker-001",
action="api_call",
input={"query": "example"},
output={"result": "ok"},
latency_ms=1200,
status="success",
confidence=0.95,
))
state = logger.get_system_state()
print(state["anomalies"])
print(state["recommendations"])| Component | OT analogue | Responsibility |
|---|---|---|
Observation |
one sensor reading | structured unit a worker emits: timestamp, agent_id, action, input, output, latency_ms, status, confidence, latency_kind |
StateModel |
historian + operator model | rolling window (20 obs), trends, error rate, machine vs generation latency counts |
AnomalyDetector |
WinCC alarm engine | evaluates threshold rules over state snapshot |
RecommendationEngine |
operator response procedure | maps alarm name to concrete action |
LoggingAgent |
control-room console | ingest + get_system_state; incremental scan |
adapters/ |
wiring to the plant | bind into an orchestrator, coding agent, or custom loop |
Observation(
timestamp: str, # ISO8601, e.g. "2026-06-02T14:32:00Z"
agent_id: str, # unique agent identifier
action: str, # api_call, computation, decision, error, ...
input: Any, # input to the action
output: Any = None, # output of the action
latency_ms: float = 0.0, # duration in ms (meaning set by latency_kind)
status: str = "success", # success | retry | failed | timeout
confidence: float = 1.0, # 0.0 to 1.0
error_details: Optional[Dict[str, str]] = None,
latency_kind: str = "machine", # "machine" | "generation"
)latency_ms carries two different quantities, distinguished by latency_kind:
| kind | what it is | high means | feeds latency_high alarm? |
|---|---|---|---|
"machine" (default) |
execution time of a call | pathological, contended | yes |
"generation" |
wall-clock of producing a large output | usually expected | no, ever |
A 9-second API call that should take 1 second and a 9-second explanation meant to be long are not the same event. A duration tagged generation can never trip the machine alarm. "machine" is the default: an unclassified duration is treated as alarmable.
from agent_logging_system import Observation, LATENCY_GENERATION
Observation(
timestamp="...", agent_id="explainer-001", action="explain",
input={...}, output={...}, latency_ms=12000,
latency_kind=LATENCY_GENERATION, # 12s is fine; never alarms
)| Rule | Level | Trips when | Recommends |
|---|---|---|---|
latency_high |
HIGH | machine-kind recent avg exceeds agent's own baseline by 3x, above a 100ms floor, after 4-sample warmup | throttle_input |
error_rate_high |
MEDIUM | error rate over 10% | investigate_failures |
queue_buildup |
LOW | over 10 observations and error rate under 5% | signal only |
latency_high is baseline-relative. A steady-but-slow agent does not cry wolf. A 1000ms-to-9000ms jump still trips.
{
"agents": {
"worker-001": {
"agent_id": "worker-001",
"status": "in_progress",
"avg_latency": 1200.0,
"recent_avg_latency": 1200.0,
"baseline_latency": 0.0,
"machine_recent_avg_latency": 1200.0,
"machine_baseline_latency": 0.0,
"machine_observations": 1,
"generation_observations": 0,
"error_count": 0,
"error_rate": 0.0,
"total_observations": 1,
...
}
},
"anomalies": [
{"name": "latency_high", "alert_level": "HIGH",
"recommendation": "...", "agent_id": "worker-001"}
],
"recommendations": [
{"action": "throttle_input", "priority": "HIGH",
"reason": "...", "agent_id": "worker-001"}
]
}Subclass BaseAdapter to wire a host system into the monitor. Two ship.
OrchestratorAdapter for an O->S->H subagent fan-out. Dispatches log per lane (retrieval.sonnet, execution.haiku) as machine latency. A lane builds a baseline and a slow batch trips. The orchestrator's synthesis turn logs as generation latency. A long integration never alarms.
from agent_logging_system import LoggingAgent
from agent_logging_system.adapters import OrchestratorAdapter
orch = OrchestratorAdapter(LoggingAgent())
orch.log_fanout(orch.EXECUTION, [("shard 1", 480), ("shard 2", 9000)]) # machine, per lane
orch.log_synthesis(18000) # generation, never alarms
print(orch.get_state()["anomalies"])WarrantAdapter for a book-grounded coding agent. Reasoning and code generation log as generation latency. Citation checks log as machine latency: a slow citation lookup is a real signal.
from agent_logging_system.anomaly_detector import AnomalyRule
logger.add_anomaly_rule(AnomalyRule(
name="confidence_collapse",
check=lambda s: s.get("error_rate", 0) > 0.25,
alert_level="HIGH",
recommendation="Pause agent and review recent inputs",
))ingest is O(1). get_system_state re-evaluates only agents that changed since the last scan.
| Operation | Cost |
|---|---|
ingest |
~2 us |
get_system_state |
~15 to 27 us for 5 agents |
pytestSeven test files cover the logging agent, observation schema, anomaly detector, state model, and orchestrator adapter.
python examples/basic_multi_agent.py # three agents degrading at different rates
python examples/warrant_integration.py # Warrant adapter: reasoning, code, citation logs
python examples/orchestrator_integration.py # O->S->H fan-out: slow lane alarms, long synthesis does not
python examples/session_replay.py # replay a session transcript through the monitor
python examples/self_monitor.py # the monitor watching itself, with real perf timingsagent-logging-system is not a tracer, a profiler, or a logging framework. It does not capture stack frames, record every line of output, or send data to an external collector. It holds no persistent state: no database, no file on disk. It receives Observation structs your code constructs, evaluates rules over rolling per-agent windows, and returns a plain dict. It stops there.
- aimap — fingerprint scanner for exposed AI and ML infrastructure
- nuclide-atlas — see any LLM stack as a graph
- safety-stream — live SSE view of a model's layered safety reasoning
- VisorLog — finding ledger and ingest pipeline
- BARE — semantic exploit-module ranking over scanner findings
MIT. Part of the NuClide toolchain. Contact: nuclide-research.com