Prefill Failed on V100, ANY IDEA?

Bash Execution Logs:

```bash
[ma-user ds4]$./ds4 -m ~/work/model/antirez/deepseek-v4-gguf/DeepSeek-V4-Flash-Layers37-42Q4KExperts-OtherExpertLayersIQ2XXSGateUp-Q2KDown-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix-fixed
.gguf --cuda -p "LLM推理有哪些阶段"
ds4: context buffers 751.71 MiB (ctx=32768, backend=cuda, prefill_chunk=2048, raw_kv_rows=2304, compressed_kv_rows=8194)
ds4: CUDA backend initialized on Tesla V100-PCIE-32GB (sm_70)
ds4: CUDA registered 90.89 GiB model mapping for device access

ds4: CUDA startup model cache prepared 90.88 GiB of tensor spans in 0.000s
ds4: cuda backend initialized for graph diagnostics
ds4: prompt processing failed: cuda prefill failed
```



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prefill Failed on V100, ANY IDEA? #246

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Prefill Failed on V100, ANY IDEA? #246

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions