Packet-Switched Attention for Stable 2-Bit Quantization
[!] STATUS: ACTIVE DEVELOPMENT [!] ARCH: HARDWARE-NATIVE HYBRID (CONV1D + SPARSE ATTN) [!] TARGET: COMMODITY GPU (T4/RTX) & CLOUD TPU (v5e)
Precision through architecture, not parameter count.
UnSwag addresses stability challenges in 2-bit quantized mixture-of-experts models through Packet-Switched Attention (PSA). By discretizing token processing into semantic routing packets, UnSwag focuses compute only where it matters—ignoring structural noise and maintaining numerical stability on commodity hardware.
The ARMen Guard monitors input correlation patterns and applies orthogonal phase corrections to reduce routing instability in quantized space. The goal is to keep variance bounded during aggressive 2-bit routing on memory-constrained devices.
Solves: The "correlation blow-up" problem where similar input tokens create unstable routing distributions in quantized space.
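The failure mode and the shape of the fix can be illustrated with a toy sketch. Everything here is illustrative, not UnSwag's actual implementation: `quantize_2bit` is a naive 4-level quantizer, and the rotation stands in for an orthogonal phase correction. With highly correlated expert keys, two distinct logits snap to the same 2-bit level; an orthogonal (norm-preserving) rotation of one key separates them into different bins.

```python
import math

LEVELS = [-1.5, -0.5, 0.5, 1.5]  # the four symmetric 2-bit levels

def quantize_2bit(x):
    """Snap a float logit to the nearest 2-bit level (toy quantizer)."""
    return min(LEVELS, key=lambda lv: abs(x - lv))

def logits(token, keys):
    """Dot-product router logits, one per expert key."""
    return [sum(t * k for t, k in zip(token, key)) for key in keys]

def rotate(v, theta):
    """Orthogonal (rotation) correction in 2-D: preserves norms exactly."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

token = [1.0, 0.05]
keys = [[1.0, 0.0], [0.9, 0.1]]  # highly correlated expert keys

before = [quantize_2bit(l) for l in logits(token, keys)]
# correlated keys -> nearly tied logits -> both snap to the same level

keys_fixed = [keys[0], rotate(keys[1], math.pi / 2)]
after = [quantize_2bit(l) for l in logits(token, keys_fixed)]
# decorrelated keys -> logits land in different quantization bins
print(before, after)
```

The rotation is chosen here for clarity; the point is only that an orthogonal transform can separate correlated logits without changing their magnitudes, which is what keeps the correction stable in quantized space.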
A lightweight depthwise-separable CNN path that preserves local syntactic structure during aggressive quantization.
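The mechanics of a depthwise-separable pathway can be sketched in plain Python (the function names and the averaging kernel are illustrative, not the project's code): a per-channel depthwise pass captures local structure in O(N·k), then a 1×1 pointwise pass mixes channels, avoiding the O(N²) cost of attention.

```python
def depthwise_conv1d(x, kernels):
    """Per-channel 1-D convolution: each channel has its own kernel.
    x: [channels][time], kernels: [channels][k]. Zero 'same' padding."""
    out = []
    for ch, kern in zip(x, kernels):
        k = len(kern)
        pad = k // 2
        padded = [0.0] * pad + ch + [0.0] * pad
        out.append([sum(kern[j] * padded[t + j] for j in range(k))
                    for t in range(len(ch))])
    return out

def pointwise_conv1d(x, weights):
    """1x1 convolution mixing channels at each time step.
    weights: [out_channels][in_channels]."""
    T = len(x[0])
    return [[sum(w[c] * x[c][t] for c in range(len(x))) for t in range(T)]
            for w in weights]

# 2 channels, 4 time steps; cost is O(N * k) per channel, not O(N^2).
x = [[1.0, 2.0, 3.0, 4.0],
     [0.0, 1.0, 0.0, 1.0]]
dw = depthwise_conv1d(x, kernels=[[0.25, 0.5, 0.25], [0.25, 0.5, 0.25]])
y = pointwise_conv1d(dw, weights=[[1.0, 1.0]])
```

Because the depthwise and pointwise stages are small, fixed-size linear ops, they quantize far more predictably than attention scores, which is why this path can preserve local syntax under aggressive quantization.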
| Packet | Function | Performance |
|---|---|---|
| `01` | Depthwise-Separable Convolutions (bypasses O(N²) attention) | Handles syntax at hardware speed |
| `10` | Updates Adaptive Summary Register (O(1) memory) | Maintains sequence gist |
| `11` | High-density semantic markers with Causal Sparse Attention | Links critical context |
| `00` | High-confidence noise, pruned from KV-Cache | ~40% memory reduction |
Progressive error correction that refines quantization residuals across routing passes, analogous to vector quantization in audio codecs.
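The residual-refinement idea can be sketched as multi-stage scalar quantization (the analogue of multi-stage VQ in audio codecs). The functions and the 4x scale schedule below are illustrative assumptions, not UnSwag's actual pass structure: each pass quantizes the leftover residual at a finer scale, so the absolute error shrinks pass over pass.

```python
def quantize_2bit(x, scale):
    """Snap x to the nearest of four levels spaced by `scale`."""
    levels = [-1.5 * scale, -0.5 * scale, 0.5 * scale, 1.5 * scale]
    return min(levels, key=lambda lv: abs(x - lv))

def residual_refine(x, scale=1.0, passes=3):
    """Progressive error correction: each pass quantizes the residual
    left by the previous passes, at a 4x finer scale (2 bits per pass)."""
    approx = 0.0
    errors = []
    for _ in range(passes):
        approx += quantize_2bit(x - approx, scale)
        errors.append(abs(x - approx))
        scale /= 4.0
    return approx, errors

value = 0.8
approx, errors = residual_refine(value)
# errors decrease monotonically: each pass refines the prior estimate
```

The geometric scale schedule is what makes the correction "progressive": pass k resolves 2 more bits of the same value, rather than re-quantizing from scratch.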
| Metric | Protocol C (PSA) | Standard Attention |
|---|---|---|
| Pruning Rate (00) | ~13.8% | 0.0% |
| Attention Density (11) | ~25.0% | 100.0% |
| Variance Stability | 0.255 (ARMen Guard active) | N/A |
| Router Gradient Flow | ✅ Gumbel-Softmax | N/A |
Measured speedup: 6.31x over the dense baseline (0.74 ms vs 4.71 ms per pass at 10% density).
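The "Router Gradient Flow" row refers to the standard Gumbel-Softmax trick (Jang et al. / Maddison et al.), which lets a discrete routing choice stay differentiable. The sketch below is the generic technique in stdlib Python, not the project's router: Gumbel noise is added to the logits, and a temperature softmax produces a soft, trainable stand-in for the hard packet decision.

```python
import math
import random

def gumbel_softmax(logits, tau=1.0, rng=random):
    """Add Gumbel noise to each logit, then apply a temperature softmax.
    The argmax behaves like a discrete sample, while the soft output
    keeps gradients flowing to the router during training."""
    noisy = [l - math.log(-math.log(rng.random())) for l in logits]
    m = max(noisy)                       # subtract max for stability
    exps = [math.exp((g - m) / tau) for g in noisy]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
probs = gumbel_softmax([2.0, 0.5, 0.1, 0.1], tau=0.5)
# probs is a valid distribution over the four packet codes; lowering
# tau pushes it toward one-hot while remaining differentiable
```

At inference time the soft sample is typically replaced by a hard argmax, so the deployed router still emits exact 2-bit packet codes.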
UnSwag supports multiple hardware targets through a unified API:
| Protocol | Target | Math | Engine |
|---|---|---|---|
| Protocol C (CURRENT) | All Hardware | 2-Bit Semantic Routing + Variance Stabilization | Hybrid Conv1D / Sparse Attention |
| Protocol A (GPU) | NVIDIA T4, A100, H100 | 2-Bit SiLU Isomorphism | Custom Triton v3 Kernels |
| Protocol B (TPU) | Google TPU v3, v4, v5e | 1-Bit ReLU Isomorphism | JAX / Pallas / XLA |
```bash
git clone https://github.com/augstentatious/UnSwagAI.git
cd UnSwagAI
pip install -e .
python benchmark_proof.py
```

The benchmark compares Protocol C's sparse pathway against a dense baseline. Reported speedups are hardware- and density-dependent; reproduce on the target GPU/TPU before treating the numbers as deployment claims.
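Independent of the bundled script, the cost model behind the speedup can be sanity-checked with a minimal sketch (names and sizes are illustrative, not the project's API): at 10% density the sparse pathway scores a query against 10x fewer keys, and any wall-clock gain on top of that depends on hardware and kernel quality.

```python
import time

def dense_scores(q, keys):
    """Score the query against every key: N dot products."""
    return [sum(a * b for a, b in zip(q, k)) for k in keys]

def sparse_scores(q, keys, keep):
    """Score only the kept ('11'-packet) positions: density * N dots."""
    return {i: sum(a * b for a, b in zip(q, keys[i])) for i in keep}

N, d = 2000, 16
q = [0.1] * d
keys = [[0.1] * d for _ in range(N)]
keep = list(range(0, N, 10))        # 10% density, as in the benchmark

t0 = time.perf_counter()
dense = dense_scores(q, keys)
t1 = time.perf_counter()
sparse = sparse_scores(q, keys, keep)
t2 = time.perf_counter()
print(f"dense {t1 - t0:.4f}s  sparse {t2 - t1:.4f}s")
# the sparse pass touches 10x fewer keys; measured speedup on real
# hardware depends on kernels, memory layout, and chosen density
```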