A small, readable self-improving agent loop.
Observe, Orient, Decide, Act - and then the second A, Adjust. The phase that makes an agent get smarter across runs instead of only faster within one.
Part of Automatiqa Lab - open-source experiments where operations meet the algorithm.
Boyd's OODA loop was built for fighter pilots. Observe, Orient, Decide, Act, repeat - whoever cycles faster controls the engagement. It is the right model for a human who learns between cycles without thinking about it.
Agents do not. An agent that runs OODA makes faster decisions, not better ones - it repeats the same mistake at machine speed because nothing carries the lesson forward. The fix is one more phase. After acting, Adjust: score what happened and update how you decide next time. OODA gets faster. OODAA gets smarter. That difference is the whole repo.
The argument in long form: Boyd's Loop Was Built for Pilots, Agentic AI Needs OODAA. This is that essay made runnable - small enough to read in one sitting, real enough to build on.
No API key needed. A 4x4 grid, an agent that starts knowing nothing:

    git clone https://github.com/automatiqa-lab/oodaa.git
    cd oodaa
    python examples/offline_grid.py

    episode  steps (lower is smarter)
    1        69
    2        31
    3        14
    30       6
    60       6

    first 5 episodes averaged 27.2 steps
    last 5 episodes averaged 6.0 steps
    shortest possible path is 6 steps

Episode one wanders for 69 steps. By the end it walks the shortest path. The only thing that changed is what Adjust wrote to memory. Nothing in the loop is clever; the learning is a single line in oodaa/memory.py.
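For intuition, a running-value update of the kind Adjust performs might look like the sketch below. It is not the code in oodaa/memory.py: the TinyMemory class, its update method, and the situation and score values are assumptions made for illustration. Only the lr parameter name appears in this README.

```python
from collections import defaultdict


class TinyMemory:
    """Hypothetical stand-in: one running value per (situation, action)."""

    def __init__(self, lr=0.5):
        self.lr = lr
        self.values = defaultdict(float)  # unseen pairs start at 0.0

    def update(self, situation, action, score):
        # Move the stored value a fraction `lr` of the way toward the new score.
        key = (situation, action)
        self.values[key] += self.lr * (score - self.values[key])


memory = TinyMemory(lr=0.5)
for _ in range(3):
    # Assumed outcome: going right from this cell scored 1.0 each time.
    memory.update("cell(0,0)", "right", 1.0)

print(memory.values[("cell(0,0)", "right")])  # 0.875
```

With lr=0.5 the value climbs 0.5, 0.75, 0.875: each update closes half the remaining gap to the observed score, so recent outcomes count for more than old ones.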
The loop is domain-neutral. It knows nothing about grids or shipments - a Task supplies the world, the loop supplies the cycle.
| Phase | What it does | Where it lives |
|---|---|---|
| Observe | Ask the task what the world looks like now | Task.observe() -> state.py Observation |
| Orient | Read what memory already knows about this situation | loop.py _known, memory.py |
| Decide | Pick an action, conditioned on memory | policy.py Policy.decide |
| Act | Run the action, get an outcome | Task.act(), optional executor.py |
| Adjust | Fold the outcome back into memory | memory.py Memory.update |
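
The five phases in the table can be sketched end to end in a few lines. This is a simplified illustration under stated assumptions, not the contents of loop.py: the Corridor task, its done method, the run_episode function, the plain dict used as memory, and the +1/-1 scoring are all hypothetical. Only the observe and act method names and the epsilon and lr parameter names come from this README.

```python
import random


class Corridor:
    """Hypothetical task: walk from cell 0 to cell 4 along a line."""

    goal = 4
    actions = ("left", "right")

    def __init__(self):
        self.pos = 0

    def observe(self):
        return self.pos

    def act(self, action):
        before = self.pos
        step = 1 if action == "right" else -1
        self.pos = min(self.goal, max(0, self.pos + step))
        # Assumed scoring: +1 for moving closer to the goal, -1 otherwise.
        return 1.0 if self.pos > before else -1.0

    def done(self):
        return self.pos == self.goal


def run_episode(task, memory, rng, epsilon=0.2, lr=0.5):
    steps = 0
    while not task.done():
        # Observe: ask the task what the world looks like now.
        situation = task.observe()
        # Orient: read what memory already knows about this situation.
        known = {a: memory.get((situation, a), 0.0) for a in task.actions}
        # Decide: explore with probability epsilon, else take the best known action.
        if rng.random() < epsilon:
            action = rng.choice(task.actions)
        else:
            action = max(known, key=known.get)
        # Act: run the action and get a score back.
        score = task.act(action)
        # Adjust: fold the score back into memory as a running value.
        memory[(situation, action)] = known[action] + lr * (score - known[action])
        steps += 1
    return steps


rng = random.Random(0)
memory = {}  # shared across episodes, so lessons carry forward
history = [run_episode(Corridor(), memory, rng) for _ in range(20)]
print(history[0], history[-1])
```

Deleting the Adjust line turns this back into plain OODA: every episode starts from the same empty memory and no run is better than the last.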
loop.py is the file to open first. Five blocks, five phases, one screen.

    from oodaa import Loop, Policy, Memory

    loop = Loop(policy=Policy(epsilon=0.2), memory=Memory(lr=0.5))
    result = loop.run(my_task)  # run it again and it decides better

oodaa was built for the automati.qa lab, where the loops drive operational decisions. examples/ops_triage.py runs the same machinery on a stream of incidents - a TMS shipment past its SLA, an ERP invoice blocked on a price variance, a WMS count gone negative. Each has a right first response the agent has to learn.

    python examples/ops_triage.py

    first 20 incidents: 50% right
    last 20 incidents: 95% right

When the agent meets an incident type it has not seen, it can ask a language model what to try (the explorer in oodaa/llm.py). Every outcome feeds Adjust, so the next time that incident shows up, memory answers and the model is not needed. The example runs offline by default; wire any model in through from_completion and only the cold-start guess changes.
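The memory-first behaviour described above can be illustrated with a small sketch. Everything named here is hypothetical: the decide and suggest functions, the action names, and the incident label are invented for the example, and suggest merely stands in for a model call. The real seam is from_completion in oodaa/llm.py, whose signature is not shown in this README and is not reproduced here.

```python
calls = []  # records every time the explorer is consulted


def suggest(situation, actions):
    """Hypothetical explorer: a canned guess standing in for a language model."""
    calls.append(situation)
    return "reprice"


def decide(situation, actions, memory, explorer):
    # Memory answers when it holds any value for this situation.
    known = {a: memory[(situation, a)] for a in actions if (situation, a) in memory}
    if known:
        return max(known, key=known.get)
    # Cold start: nothing known yet, so ask the explorer for a first guess.
    return explorer(situation, actions)


actions = ["escalate", "reprice", "recount"]  # invented action names
memory = {}

first = decide("invoice_blocked", actions, memory, suggest)   # explorer is asked
memory[("invoice_blocked", first)] = 1.0                      # Adjust records the outcome
second = decide("invoice_blocked", actions, memory, suggest)  # memory answers

print(first, second, len(calls))  # reprice reprice 1
```

The explorer is consulted once. After Adjust has written an outcome for that incident type, the second decision never reaches it.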
- The core is the standard library. No Redis, no database, no service. Clone and run.
- The model is a seam, not a dependency. The loop never imports a model. llm.py gives you an offline explorer and a from_completion seam that wraps any LLM in one line; either plugs into the policy, and a failed model call degrades to random exploration rather than taking the loop down.
- Memory is plain and inspectable. A running value per (situation, action), plus hypotheses you can attach. Print it and read what the loop believes. No black box.
- Adjust is the point. Everything else is scaffolding around the one phase that learns.
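
The "degrades to random exploration" behaviour from the list above could be sketched roughly as follows. This is an assumption about the shape of such a fallback, not the code in llm.py: the explore and flaky_model functions, the action names, and the extra check for an illegal guess are all hypothetical.

```python
import random


def flaky_model(situation, actions):
    """Hypothetical model call that always fails, to exercise the fallback."""
    raise TimeoutError("model unavailable")


def explore(situation, actions, model, rng):
    try:
        guess = model(situation, actions)
    except Exception:
        # Degrade to random exploration instead of taking the loop down.
        return rng.choice(actions)
    # Assumed extra guard: ignore a guess that is not a legal action.
    return guess if guess in actions else rng.choice(actions)


rng = random.Random(0)
actions = ["escalate", "reprice", "recount"]  # invented action names
choice = explore("shipment_past_sla", actions, flaky_model, rng)
print(choice in actions)  # True
```

Because the fallback still returns a legal action, the loop keeps cycling and Adjust keeps learning from whatever was tried, with or without a working model.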

Install and test:

    pip install -e .          # core, zero dependencies
    pip install -e ".[dev]"   # add pytest
    pytest

This is the teaching-sized version. The same idea scales up - utility scoring, replay, a background review that proposes what to keep, gated self-modification - once you make Adjust do real work. Start here for the shape; grow each phase as your problem demands.
MIT. See LICENSE.