[Bug]: FLASHINFER - chunked_prefill crashes when multiple concurrent requests happen #1279

### Your current environment

In single-user mode with a single request, chunked prefill works with FLASHINFER and I can reach a 160k-token context with an FP8 KV cache.

When multiple concurrent requests come in, it crashes with an error saying chunked prefill is not supported with FLASHINFER.

However, with chunked prefill disabled, my 132k FP8 context drops to 15k, which makes FLASHINFER useless for my use case.
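The toy scheduler below is only a hedged illustration of why concurrency might matter. It is an assumption about the mechanism, not FLASHINFER's or the serving engine's actual code. With chunked prefill, a long prompt is split into chunks under a per-step token budget, and requests that are already generating can have their decode tokens packed into the same step. A lone request therefore yields batches that are all-prefill or all-decode. A second concurrent request can yield a batch that mixes prefill chunks with decode tokens, which is one plausible way the concurrent case reaches an unsupported code path. `Request`, `schedule_step` and `is_mixed` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

Batch = List[Tuple[str, str, int]]  # (request id, phase, number of tokens)


@dataclass
class Request:
    rid: str
    prompt_len: int     # total prompt tokens to prefill
    prefilled: int = 0  # prompt tokens already prefilled

    @property
    def in_decode(self) -> bool:
        # Once the whole prompt is prefilled, the request only generates tokens.
        return self.prefilled >= self.prompt_len


def schedule_step(requests: List[Request], token_budget: int) -> Batch:
    """Hypothetical chunked-prefill step: decode tokens first, then prefill chunks."""
    batch: Batch = []
    budget = token_budget
    # Each decoding request contributes exactly one token to the step.
    for r in requests:
        if r.in_decode and budget > 0:
            batch.append((r.rid, "decode", 1))
            budget -= 1
    # Remaining budget is spent on prefill chunks of not-yet-prefilled prompts.
    for r in requests:
        if not r.in_decode and budget > 0:
            chunk = min(budget, r.prompt_len - r.prefilled)
            batch.append((r.rid, "prefill", chunk))
            r.prefilled += chunk
            budget -= chunk
    return batch


def is_mixed(batch: Batch) -> bool:
    # True when one step contains both prefill and decode work.
    return {phase for _, phase, _ in batch} == {"prefill", "decode"}


# One long request on its own: pure prefill chunks, then pure decode.
solo = [Request("a", prompt_len=10_000)]
solo_batches = [schedule_step(solo, token_budget=4096) for _ in range(4)]
print([is_mixed(b) for b in solo_batches])  # [False, False, False, False]

# A second request arrives while the first one is already decoding.
busy = [Request("a", prompt_len=10_000, prefilled=10_000), Request("b", prompt_len=10_000)]
print(is_mixed(schedule_step(busy, token_budget=4096)))  # True
```

Real schedulers differ in ordering and budgeting; the point is only that mixed prefill/decode batches appear once more than one request is in flight.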

With FLASH ATTENTION 2 I can reach over 60k context in FP16, but I cannot use an FP8 KV cache because FLASH ATTENTION 2 does not support it.
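The gap between the FP16 and FP8 context limits follows from simple KV-cache arithmetic: an FP8 element takes 1 byte versus 2 bytes for FP16, so a fixed cache budget holds roughly twice as many tokens. The sketch below assumes the standard per-token layout of one key and one value vector per layer per KV head. The model shape and 16 GiB budget are hypothetical and are not the reporter's configuration; real limits also depend on weights, activations and allocator overhead. `kv_bytes_per_token` and `max_cached_tokens` are illustrative helper names.

```python
GIB = 2**30


def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int, bytes_per_elem: int) -> int:
    # Factor 2: each layer stores one key and one value vector per KV head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_cached_tokens(budget_bytes: int, num_layers: int, num_kv_heads: int,
                      head_dim: int, bytes_per_elem: int) -> int:
    # Whole tokens that fit in the KV-cache budget (integer division).
    return budget_bytes // kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem)


# Hypothetical model shape and cache budget, for illustration only.
shape = dict(num_layers=80, num_kv_heads=8, head_dim=128)
budget = 16 * GIB
fp16_tokens = max_cached_tokens(budget, bytes_per_elem=2, **shape)
fp8_tokens = max_cached_tokens(budget, bytes_per_elem=1, **shape)
print(fp16_tokens, fp8_tokens)  # 52428 104857
```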

Is there any way to fix chunked prefill on FLASHINFER, or to add quantized (FP8) KV-cache support to FLASH ATTENTION 2?

Thank you!

### Model Input Dumps

N/A

### 🐛 Describe the bug

N/A
