Skip to content

Prefill Failed on V100, ANY IDEA? #246

@the-butterfly

Description

@the-butterfly

Bash Execution Logs:

[ma-user ds4]$./ds4 -m ~/work/model/antirez/deepseek-v4-gguf/DeepSeek-V4-Flash-Layers37-42Q4KExperts-OtherExpertLayersIQ2XXSGateUp-Q2KDown-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix-fixed
.gguf --cuda -p "LLM推理有哪些阶段"
ds4: context buffers 751.71 MiB (ctx=32768, backend=cuda, prefill_chunk=2048, raw_kv_rows=2304, compressed_kv_rows=8194)
ds4: CUDA backend initialized on Tesla V100-PCIE-32GB (sm_70)
ds4: CUDA registered 90.89 GiB model mapping for device access

ds4: CUDA startup model cache prepared 90.88 GiB of tensor spans in 0.000s
ds4: cuda backend initialized for graph diagnostics
ds4: prompt processing failed: cuda prefill failed

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions