Summary
- Problem: When ds4-server is killed while a long-context conversation has its KV cache persisted to disk, restarting the server against the stale KV cache triggers the error Metal graph embed tokens failed: Internal Error (00000001:Internal Error). Instead of detecting the corrupted state and resetting, ds4 enters an infinite retry loop:
0525 21:04:12 ds4-server: chat ctx=81920..94031:12111 prompt start
0525 21:04:12 ds4-server: chat ctx=81920..94031:12111 prefill chunk 0/12111 (0.0%) chunk=0.00 t/s
ds4: Metal graph embed tokens failed: Internal Error (00000001:Internal Error)
0525 21:04:12 ds4-server: chat ctx=81920..94031:12111 prefill failed after stream closed: metal prefill failed
0525 21:04:12 ds4-server: chat ctx=81920..95221:13301 prompt start
(repeats indefinitely)
Steps to Reproduce
- Start ds4-server with --ctx 1000000 --kv-disk-dir <path>
- Begin a conversation with enough context to fill the KV cache (e.g. 80K+ tokens)
- Kill the ds4-server process (kill -9)
- Restart ds4-server without clearing the KV cache directory
- The next request will trigger the Metal Internal Error and the infinite retry loop
- Server becomes unresponsive and must be killed
Workaround
Clearing the KV cache directory before restarting ds4-server resolves the issue.
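One way to script the cleanup is sketched below. It assumes the directory passed to --kv-disk-dir holds nothing but ds4 cache files and that the server is stopped first; the helper name clear_kv_cache_dir and the path are placeholders, not part of ds4.

```python
import shutil
from pathlib import Path


def clear_kv_cache_dir(kv_dir: str) -> int:
    """Delete every entry inside kv_dir but keep the directory itself.

    Returns the number of top-level entries removed.
    Assumption: kv_dir is the same path given to --kv-disk-dir and
    contains only cache data that is safe to discard.
    """
    root = Path(kv_dir)
    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Real subdirectory: remove it and everything below it.
            shutil.rmtree(entry)
        else:
            # Regular file or symlink: remove just the entry.
            entry.unlink()
        removed += 1
    return removed


# Example call (placeholder path, run only while ds4-server is stopped):
# clear_kv_cache_dir("/path/to/kv-disk-dir")
```

Keeping the directory itself means the server can be restarted with the same --kv-disk-dir argument.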
Environment
- ds4 commit: b9305 (latest prebuilt binary)
- Hardware: Apple M5 Max, 128 GiB unified memory
- macOS 26.5
Suggested Fix
- Short-term: Add a retry limit — after 3 consecutive Metal prefill failures on the same context range, clear the context and report the error.
- Long-term: On startup, validate stale KV cache entries against the loaded model state and evict mismatched entries instead of crashing/re-looping.
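The short-term retry limit could look roughly like the sketch below. It is written in Python only to illustrate the logic and is not ds4 code; the class and method names are hypothetical. It keys the counter on the start of the context range (81920 in the log above) because the end of the range grows between attempts, which is an assumption about what "same context range" should mean.

```python
from typing import Optional

MAX_CONSECUTIVE_FAILURES = 3  # limit proposed in the short-term fix


class PrefillRetryGuard:
    """Counts consecutive prefill failures that share a context start."""

    def __init__(self, max_failures: int = MAX_CONSECUTIVE_FAILURES) -> None:
        self.max_failures = max_failures
        self._ctx_start: Optional[int] = None
        self._failures = 0

    def record_failure(self, ctx_start: int) -> bool:
        """Register a failed prefill.

        Returns True once the limit is reached, meaning the caller should
        clear the cached context and report the error instead of retrying.
        """
        if ctx_start != self._ctx_start:
            # A different cached prefix: start counting from zero again.
            self._ctx_start = ctx_start
            self._failures = 0
        self._failures += 1
        return self._failures >= self.max_failures

    def record_success(self) -> None:
        """Any successful prefill ends the failure streak."""
        self._ctx_start = None
        self._failures = 0


# Usage sketch: the third failure on the same start offset trips the guard.
guard = PrefillRetryGuard()
outcomes = [guard.record_failure(81920) for _ in range(3)]
# outcomes == [False, False, True]
```

When record_failure returns True, the server would drop the cached context for that conversation and return an error to the client rather than accepting another retry on the same prefix.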