Releases: ai-2070/l0-python

🏁 L0 (Python) v0.21.0 - Streaming Performance Overhaul, Guardrail Optimizations, and Drift Efficiency

10 Apr 22:33

This release is a major internal performance upgrade for the Python runtime.

No API changes — but substantial improvements to:

  • streaming efficiency (O(n²) → O(n))
  • guardrail execution cost
  • drift detection memory + speed
  • event + callback overhead

Net result: faster, more scalable streaming with lower overhead across the entire pipeline.


✨ Highlights

1. O(n) Token Accumulation (Major Performance Fix)

String concatenation during streaming has been replaced with a buffered approach.

Before:

state.content += token  # O(n²) over time

Now:

  • Tokens appended to _content_buffer
  • Joined lazily via a descriptor (_ContentDescriptor)
  • Flushed only when state.content is read

Result:

  • O(n) total complexity
  • Dramatically better performance for long streams
  • Reduced memory churn
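
The buffered approach described above can be sketched as follows; the class and attribute names (`_ContentDescriptor`, `_content_buffer`) follow the release notes, but the real implementation may differ in detail:

```python
# Illustrative sketch of lazy buffered token accumulation.
class _ContentDescriptor:
    """Flushes the token buffer into a single string only when read."""

    def __get__(self, obj, objtype=None):
        if obj._content_buffer:
            # One O(k) join for k pending tokens instead of k concatenations.
            obj._content += "".join(obj._content_buffer)
            obj._content_buffer.clear()
        return obj._content


class StreamState:
    content = _ContentDescriptor()

    def __init__(self) -> None:
        self._content = ""
        self._content_buffer: list[str] = []

    def append_token(self, token: str) -> None:
        self._content_buffer.append(token)  # O(1) amortized per token
```

Because the join only happens on reads of `content`, a stream that appends n tokens and reads the content once does O(n) total string work instead of O(n²).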

2. Drift Detection: Sliding Window + Bounded Memory

Drift detection has been rewritten to avoid unbounded growth.

Changes:

  • deque(maxlen=N) for:
    • entropy tracking
    • token history
  • Only stores a window, not full content
  • Uses last_window instead of full last_content

Impact:

  • Stable memory usage
  • Faster drift checks
  • Better scalability on long-running streams
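
A minimal sketch of the bounded bookkeeping described above, using the standard library's `collections.deque`; field names and the window size are illustrative, not the runtime's actual values:

```python
from collections import deque

WINDOW = 500  # assumed window size; the actual default is not stated here


class DriftTracker:
    """Bounded-memory drift bookkeeping (sketch; real fields may differ)."""

    def __init__(self, maxlen: int = WINDOW) -> None:
        self.entropy_history = deque(maxlen=maxlen)  # old entries drop off
        self.token_history = deque(maxlen=maxlen)
        self.last_window = ""  # a window of recent text, not full content

    def observe(self, token: str, content: str) -> None:
        self.token_history.append(token)
        self.entropy_history.append(len(set(token)))  # toy entropy proxy
        self.last_window = content[-WINDOW:]
```

A `deque` with `maxlen` silently discards the oldest entries on append, so memory stays constant no matter how long the stream runs.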

3. Guardrails: Significant Runtime Optimizations

JSON Guardrail

  • Adds is_json_content caching
  • Avoids repeated looks_like_json() calls
  • Resets cache correctly on stream resets

Markdown Guardrail

  • Skips all analysis during streaming
  • Only runs on completion

Pattern Guardrail (Major Change)

  • Precompiles all patterns into a single regex
  • Uses incremental scanning:
    • scans only new content (+ small overlap)
    • full scan only on completion

Result:

  • Repeated full scans replaced with near-O(delta) incremental scanning
  • Much lower overhead on large streams
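
The precompiled-regex and incremental-scan ideas can be sketched as below; the pattern list, overlap size, and function name are hypothetical:

```python
import re

# Hypothetical pattern list; the real guardrail's patterns are configurable.
PATTERNS = ["TODO", "FIXME"]
COMBINED = re.compile("|".join(f"(?:{p})" for p in PATTERNS))  # one regex
OVERLAP = 4  # rescan a small tail so boundary-spanning matches aren't missed


def scan_incremental(prev_len: int, content: str) -> tuple[list[str], int]:
    """Scan only the newly arrived tail of `content` (plus a small overlap)."""
    start = max(0, prev_len - OVERLAP)
    hits = [m.group(0) for m in COMBINED.finditer(content, start)]
    return hits, len(content)
```

Each call resumes near where the previous one stopped (`finditer` accepts a start position), so per-token cost tracks the size of the delta rather than the full accumulated content.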

4. Runtime Hot Path Optimizations

Callback Execution

  • Skips function calls when callbacks are None
  • Reduces overhead per token

Observability Events

  • Guardrail observability only runs if handlers exist
  • Avoids unnecessary timing + event construction

Buffer Reset Fixes

  • _content_buffer now cleared correctly on:
    • retries
    • checkpoint resets
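
The callback and observability gating above amounts to cheap guards on the per-token hot path; a sketch with illustrative names:

```python
import time


def emit_token(token, on_token=None, observability_handlers=()):
    # Gate 1: skip the function call entirely when no callback is registered.
    if on_token is not None:
        on_token(token)

    # Gate 2: construct timing/event objects only if someone is listening.
    if observability_handlers:
        event = {"type": "TOKEN", "at": time.monotonic(), "token": token}
        for handler in observability_handlers:
            handler(event)
```

At hundreds of thousands of tokens per second, even an empty call or an unused dict allocation per token is measurable, which is why both gates sit before any work happens.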

5. Improved Checkpoint + State Handling

  • Ensures buffer and content stay in sync during:
    • retries
    • invalid checkpoint recovery
  • Prevents subtle duplication or stale state issues

6. Updated Benchmarks (Python 3.13)

Performance improvements reflected in benchmarks:

  • L0 Core: ~596K tokens/sec
  • Full Stack: ~114K tokens/sec
  • Lower overhead percentages across most scenarios

Still comfortably above real-world model throughput.


7. Documentation Updates

  • README now includes Python performance section
  • BENCHMARKS.md updated with latest numbers
  • WHITEPAPER.md significantly expanded

🧭 Upgrade Notes

  • No breaking changes
  • Fully backward compatible
  • Strongly recommended if you:
    • stream large outputs
    • use guardrails heavily
    • rely on drift detection
    • run long-lived pipelines

🔧 L0-Python v0.20.0 - Structured Retry Fixes + Canonical Runtime Alignment

03 Apr 19:11

L0 Python v0.20.0 fixes structured retry behavior for stream factories and improves canonical lifecycle and observability alignment with the TypeScript runtime.

⚙️ 1. Structured Stream Factory Retry Fix

  • Fixed structured retry behavior so stream factory functions are called fresh on each retry attempt.
  • This resolves cases where structured retries could accidentally reuse an already-consumed stream, leading to failures like:
    • locked/consumed stream errors
    • invalid retry behavior in structured()
    • invalid retry behavior in structured_array()
    • fallback retry reuse issues
  • Factory-based structured flows now retry correctly across:
    • normal retries
    • fallback retries
    • sync and async stream factories
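
The fix boils down to calling the factory again on every attempt instead of reusing the stream object; a hedged sketch (`run_with_retries` is a hypothetical helper, not the runtime's actual internals):

```python
import asyncio
import inspect


async def run_with_retries(stream_factory, attempts: int = 3):
    """Call the factory fresh per attempt so a consumed stream is never
    reused; supports both sync and async factories."""
    last_err = None
    for _ in range(attempts):
        stream = stream_factory()          # fresh stream every attempt
        if inspect.isawaitable(stream):    # async factory returned a coroutine
            stream = await stream
        try:
            return [tok async for tok in stream]
        except Exception as err:           # retry with a brand-new stream
            last_err = err
    raise last_err
```

Reusing an already-consumed stream is what produced the locked/consumed-stream errors: most provider streams are single-pass iterators, so the only safe retry is one that re-invokes the factory.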

📈 2. Canonical Lifecycle + Observability Alignment

  • Improved runtime parity with the canonical lifecycle and TypeScript event model.
  • Updates include:
    • FALLBACK_START now uses fromIndex / toIndex
    • retry attempts include isRetry
    • fallback attempt numbering resets correctly per fallback stream
    • error events now include richer recovery metadata
    • failure classification and recovery strategy mapping are now emitted more explicitly
  • Added new canonical lifecycle and network classification tests to lock this behavior in.

🛟 3. Network Error Classification Coverage

  • Added broader canonical tests for network error detection and classification.
  • Improves confidence around handling of:
    • connection drops
    • DNS failures
    • fetch/network request failures
    • timeout conditions
    • SSL-related failures
  • This strengthens parity between documented behavior and tested runtime behavior.

🗃️ 4. Documentation Updates

  • Updated docs across the project for correctness and consistency, including:
    • API documentation
    • lifecycle docs
    • custom adapter docs
    • multimodal docs
    • consensus docs
    • document window docs
    • guardrails docs
    • README usage fixes
  • Added a new WHITEPAPER.md describing L0 as a deterministic streaming execution substrate for AI.

🏎️ L0-Python 0.19.0 - Performance Improvements

13 Dec 23:58

This release introduces optimizations to our core drift detection logic and updates our event tracing system for better performance.


🚀 Performance Improvements

Drift detection has been significantly optimized by pre-compiling all regex patterns and removing repeated per-check compilation. This reduces overhead across tone, format, repetition, markdown, and hedging detection while preserving identical behavior. The changes are entirely internal but materially improve throughput under high-token streaming workloads.


🧭 Deterministic Callback IDs (UUIDv7)

Guardrail and observability callbacks now use UUIDv7-based IDs instead of UUIDv4. UUIDv7 is time-ordered and faster to generate, improving traceability and event ordering in high-concurrency and distributed systems while maintaining global uniqueness.
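
For illustration, a minimal UUIDv7 construction per RFC 9562 (48-bit millisecond timestamp, version and variant bits, random tail); the runtime may well use a library implementation instead:

```python
import os
import time


def uuid7_hex() -> str:
    """Minimal UUIDv7 sketch: time-ordered prefix, random suffix."""
    ts = int(time.time() * 1000) & ((1 << 48) - 1)          # 48-bit ms timestamp
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF  # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
    value = (ts << 80) | (0x7 << 76) | (rand_a << 64) | (0b10 << 62) | rand_b
    return f"{value:032x}"
```

Because the high bits are a timestamp, lexicographic order of the IDs approximates creation order, which is what makes UUIDv7 useful for tracing event sequences.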


🔥 Benchmark Results

Test Environment

  • CPU: Apple M1 Max (10 cores)
  • Runtime: Python 3.13, pytest 9 with pytest-asyncio 1.3.0
  • Methodology: Mock token streams with zero inter-token delay to measure pure L0 overhead
| Scenario | Tokens/s | Avg Duration | TTFT | Overhead |
|---|---|---|---|---|
| Baseline (raw streaming) | 1,518,271 | 1.32 ms | 0.02 ms | - |
| L0 Core (no features) | 551,696 | 3.63 ms | 0.08 ms | 175% |
| L0 + JSON Guardrail | 469,922 | 4.26 ms | 0.07 ms | 223% |
| L0 + All Guardrails | 367,328 | 5.44 ms | 0.08 ms | 313% |
| L0 + Drift Detection | 119,758 | 16.70 ms | 0.08 ms | 1166% |
| L0 Full Stack | 108,257 | 18.48 ms | 0.07 ms | 1301% |

📦 Installation

pip install ai2070-l0
# or
pip install ai2070-l0[openai]
pip install ai2070-l0[litellm]

🙀 L0-Python 0.18.0 - Full Pydantic Model Suite

10 Dec 14:07
b5b77f1

This release delivers a complete Pydantic model export layer for every major L0 type.


✨ New: Full Pydantic Model Suite (l0.pydantic)

L0 now provides a complete Pydantic BaseModel mirror of every major internal dataclass.

You can now import Pydantic equivalents for:

  • Core types (StateModel, RetryModel, TimeoutModel, TelemetryModel, etc.)
  • Consensus models
  • Drift detection
  • Guardrails
  • Metrics snapshots
  • Parallel/race operations
  • Pipeline execution
  • Pool operations
  • Event sourcing + replay
  • Observability events
  • Windowing/document chunking

Example:

from l0.pydantic import StateModel, RetryModel, DriftResultModel

state = StateModel(content="hello", token_count=5)
json_data = state.model_dump_json()
schema = StateModel.model_json_schema()

This enables:

  • Typed JSON schemas for OpenAPI/SDKs
  • Runtime-safe structured logging
  • Interop with FastAPI / Litestar
  • Persisting structured observability events
  • Easier debugging & replay

📦 The new module contains over 1,500 lines of typed models, covering all L0 dataclasses.


📈 Benchmark Improvements

BENCHMARKS.md received several updates:

  • Updated environment to Python 3.13, pytest 9, and pytest-asyncio 1.3.0
  • Clarified methodology
  • Updated Nvidia Blackwell section
  • Added Python 3.14 performance note:
    Pydantic import overhead currently impacts async iteration speed by ~30% in Python 3.14; this appears to be a Pydantic compatibility issue, not a Python regression
  • Updated instructions for running benchmarks (now explicitly using Python 3.13)

🧩 Summary of Changes

| Area | Change |
|---|---|
| Pydantic Export Layer | Full Pydantic BaseModel suite for all L0 types |
| README | New Pydantic section + improvements |
| Benchmarks | Updated environment, performance notes, 3.14 caveats, commands |
| Events | Updated/expanded Pydantic event definitions |
| Testing | New comprehensive Pydantic model tests |

🎯 Why This Matters

This release lays the foundation for:

  • Strong typing across every L0 subsystem
  • First-class OpenAPI / schema-driven integrations
  • Richer tooling: dashboards, telemetry pipelines, logging processors
  • Fully typed observability + replay pipelines
  • Easier internal and external adapter development

L0 now provides one of the most complete type-model sets in the Python AI ecosystem.

🐍 Python v0.17.0 - High-Throughput Upgrade

08 Dec 22:40

The Python runtime for L0 receives the same performance-focused overhaul as the TypeScript version, targeting Nvidia Blackwell-class inference loads. This release introduces incremental JSON guardrails, sliding-window drift detection, new high-throughput defaults, and a brand-new benchmark suite demonstrating Python's ability to sustain 120K+ tokens/sec.

This update includes major internal upgrades across guardrails and drift detection.


✨ Highlights

1. ⚡ Incremental JSON Guardrails (O(delta) cost)

json_rule() has been rewritten to match the new TS architecture:

  • New IncrementalJsonState dataclass
  • Tracks braces, brackets, string/escape state incrementally
  • Only processes delta (new characters), not full content
  • Full analyze_json_structure() executed only at stream completion
  • Automatic state reset on new/shortened streams

Result: ~5–10× faster per-token guardrail checks under streaming load.
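
The delta-only state machine described above can be sketched like this; the field set is illustrative and the real `IncrementalJsonState` may differ:

```python
from dataclasses import dataclass


@dataclass
class IncrementalJsonState:
    """Structural state carried across deltas (sketch)."""
    depth: int = 0        # net open braces/brackets
    in_string: bool = False
    escaped: bool = False
    seen: int = 0         # characters already processed


def feed(state: IncrementalJsonState, content: str) -> IncrementalJsonState:
    if len(content) < state.seen:          # new/shortened stream: reset
        state.depth, state.in_string, state.escaped, state.seen = 0, False, False, 0
    for ch in content[state.seen:]:        # only the new delta
        if state.escaped:
            state.escaped = False
        elif state.in_string:
            if ch == "\\":
                state.escaped = True
            elif ch == '"':
                state.in_string = False
        elif ch == '"':
            state.in_string = True
        elif ch in "{[":
            state.depth += 1
        elif ch in "}]":
            state.depth -= 1
    state.seen = len(content)
    return state
```

Each call touches only the characters that arrived since the last call, so the per-token cost is O(delta) regardless of how large the accumulated content grows.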


2. 🎯 Sliding Window Drift Detection

DriftConfig now includes:

sliding_window_size: int = 500

Drift detection now:

  • Analyzes only the last N characters
  • Meta commentary, repetition, markdown collapse, tone shift all run on the window
  • Reduces drift-detection cost from O(content_length) to O(window_size)
  • Matches the TS implementation for cross-platform parity

3. 🚀 New High-Throughput Default Intervals

Python now uses the same optimized defaults as TS:

| Interval | Old | New |
|---|---|---|
| Guardrails | 5 tokens | 15 tokens |
| Drift | 10 tokens | 25 tokens |
| Checkpoint | 10 tokens | 20 tokens |

Updated in ADVANCED.md and CheckIntervals (src/l0/types.py).
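
These defaults amount to running each periodic check every N tokens; a sketch with illustrative field names (the actual `CheckIntervals` in src/l0/types.py may differ):

```python
from dataclasses import dataclass


@dataclass
class CheckIntervals:
    """High-throughput defaults from this release (names illustrative)."""
    guardrails: int = 15
    drift: int = 25
    checkpoint: int = 20


def due(token_count: int, interval: int) -> bool:
    """Run a periodic check every `interval` tokens."""
    return token_count % interval == 0
```

Larger intervals trade a little detection latency for much less per-token work, which is what makes the higher throughput numbers possible.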


4. 🧪 New Benchmark Suite (BENCHMARKS.md)

Full benchmarking added (99 additions):

  • Baseline vs core vs guardrails vs drift vs full-stack
  • Measured on Apple M1 Max with Python 3.13
  • Python achieves 1.5M tokens/sec raw iteration and 120K TPS full-stack with all guardrails enabled
  • Ready for 1000+ TPS Nvidia Blackwell inference loads

Benchmarks include reproducible pytest commands.


🗑️ Targeted Deletions / Optimization Removals

  1. Removed old full-content drift detection paths
  2. Removed malformed-pattern reporting in streaming phase (now done incrementally)
  3. Removed obsolete default interval values (5/10/10)
  4. Removed non-window-based drift comparisons to last full content

L0 for Python - Initial Release (Full Lifecycle + Event Compatibility)

08 Dec 04:19

This is the first release of L0 for Python, the deterministic execution substrate for reliable AI streaming - now with full lifecycle parity and event-type compatibility with the TypeScript implementation.

L0 provides the missing reliability layer for all AI streams: deterministic token delivery, retries, fallbacks, guardrails, drift detection, checkpoint resumption, network protection, and full observability - all transparently wrapped around any LLM provider stream.

This release is built for production workloads and ships with 1,800+ tests, real adapter integrations for OpenAI and LiteLLM (100+ providers), and a fully instrumented streaming runtime covering 25+ structured lifecycle events.


🔥 Key Highlights

Full Lifecycle Compatibility

The Python version now includes the complete deterministic lifecycle flow - retries, fallbacks, checkpoints, resume logic, guardrail phases, drift detection, tool-call phases, and completion flow identical in semantics to the TypeScript implementation.
All lifecycle callbacks (on_start, on_event, on_violation, on_retry, on_fallback, on_resume, on_timeout, etc.) are implemented and follow the same event order and guarantees.

🎛️ Central Event Bus with 25+ Structured Event Types

This release introduces the full observability and event-sourcing infrastructure:

  • SESSION_START, STREAM_INIT, ADAPTER_DETECTED
  • TIMEOUT_*, RETRY_*, FALLBACK_*
  • GUARDRAIL_*, DRIFT_*, CHECKPOINT_SAVED
  • TOOL_REQUESTED, TOOL_RESULT, TOOL_ERROR
  • SESSION_SUMMARY & SESSION_END

These events enable complete introspection, replay, debugging, supervision, and telemetry in production systems.

Deterministic Streaming Runtime

  • Token-by-token normalization
  • Timeout enforcement (initial + inter-token)
  • Checkpointing and last-known-good-token resumption
  • Drift detection & pattern-based guardrails
  • Network protection across 12+ failure patterns

🔁 Smart Retries & Fallbacks

  • Distinguishes model errors from network/transient errors
  • Sequential fallback chain with on_fallback telemetry
  • AWS-style fixed-jitter backoff by default
  • Full retry/fallback reasoning surfaced through lifecycle events
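
The AWS-style jitter mentioned above is commonly implemented as "equal jitter": half the exponential delay is fixed, half is random. A sketch with illustrative defaults (not the runtime's actual values):

```python
import random


def equal_jitter_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff where half the delay is fixed and half random."""
    exp = min(cap, base * (2 ** attempt))
    return exp / 2 + random.uniform(0, exp / 2)
```

The random half spreads out retries from many concurrent clients so they don't all hit the provider at the same instant, while the fixed half guarantees a minimum spacing.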

🧱 Structured Output with Automatic Repair

  • Native Pydantic integration
  • Corrects malformed JSON (missing braces, broken fences, trailing commas)
  • Guaranteed schema validity
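
To illustrate the failure modes listed above, here is a toy repair pass for broken fences, trailing commas, and missing braces; the runtime's actual repair logic is more thorough than this sketch:

```python
import json
import re


def repair_json(text: str) -> str:
    """Toy JSON repair: strip code fences, drop trailing commas,
    balance missing closing braces."""
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())  # broken fences
    text = re.sub(r",\s*([}\]])", r"\1", text)                    # trailing commas
    text += "}" * max(0, text.count("{") - text.count("}"))       # missing braces
    return text
```

In L0 the repaired text is then validated against the Pydantic schema, so malformed model output either becomes a valid instance or triggers a structured retry.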

🔌 Adapters

  • OpenAI adapter (auto-detected)
  • LiteLLM adapter (100+ providers)
  • Full API-compatible adapter protocol for custom providers

🧪 Battle-Tested

  • 1,800+ unit tests
  • 100+ integration tests simulating real streaming conditions

📦 Installation

pip install ai2070-l0
# or
pip install ai2070-l0[openai]
pip install ai2070-l0[litellm]

🏁 Quick Example

import asyncio
from openai import AsyncOpenAI
import l0

async def main():
    client = l0.wrap(AsyncOpenAI())

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
        stream=True,
    )

    async for event in response:
        if event.is_token:
            print(event.text, end="", flush=True)

asyncio.run(main())