Bash Execution Logs:
[ma-user ds4]$./ds4 -m ~/work/model/antirez/deepseek-v4-gguf/DeepSeek-V4-Flash-Layers37-42Q4KExperts-OtherExpertLayersIQ2XXSGateUp-Q2KDown-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix-fixed
.gguf --cuda -p "LLM推理有哪些阶段"
ds4: context buffers 751.71 MiB (ctx=32768, backend=cuda, prefill_chunk=2048, raw_kv_rows=2304, compressed_kv_rows=8194)
ds4: CUDA backend initialized on Tesla V100-PCIE-32GB (sm_70)
ds4: CUDA registered 90.89 GiB model mapping for device access
ds4: CUDA startup model cache prepared 90.88 GiB of tensor spans in 0.000s
ds4: cuda backend initialized for graph diagnostics
ds4: prompt processing failed: cuda prefill failed
Bash Execution Logs: