Packet-Switched Attention for stable 2-bit quantized MoE inference, with variance-aware routing and Protocol C benchmarks.
flax quantization pallas fine-tuning efficient-inference tpu jax xla mixture-of-experts memory-optimization sparse-attention opensource-ai gemma-2 llm-efficiency protocol-c
Updated May 14, 2026 · Python