[Feature]: Consider integrating SVDQuant (W4A4 quantization) from the Nunchaku project #1092

### 🚀 The feature, motivation and pitch

I noticed that the Nunchaku project (https://github.com/mit-han-lab/nunchaku) implements SVDQuant, a W4A4 (4-bit weights, 4-bit activations) quantization method. It seems highly compatible with LLM scenarios and looks promising for model optimization.

Would Aphrodite Engine consider supporting or integrating this quantization method? It could substantially reduce memory usage while maintaining model quality in LLM serving scenarios.

Nunchaku's implementation appears well suited to LLM use cases, so an integration could be valuable to the Aphrodite Engine community.
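To put the memory-efficiency point in rough numbers, here is a minimal back-of-the-envelope sketch comparing 16-bit and 4-bit weight storage. Everything in it is an assumption for illustration: the 7B parameter count, the group size of 64, and the 16-bit per-group scales are hypothetical, and the helper `weight_memory_gib` is not part of Nunchaku or Aphrodite Engine. It also ignores the 16-bit low-rank branch that SVDQuant keeps alongside the 4-bit weights, as well as activations and the KV cache, so real savings would be somewhat smaller.

```python
from typing import Optional


def weight_memory_gib(
    num_params: int,
    bits_per_weight: int,
    group_size: Optional[int] = None,
    scale_bits: int = 16,
) -> float:
    """Approximate weight storage in GiB.

    If group_size is given, one scale of scale_bits is stored per group
    of weights (a common layout for 4-bit schemes; assumed here).
    """
    total_bits = num_params * bits_per_weight
    if group_size is not None:
        num_groups = -(-num_params // group_size)  # ceiling division
        total_bits += num_groups * scale_bits
    return total_bits / 8 / 2**30


NUM_PARAMS = 7_000_000_000  # hypothetical 7B-parameter model

fp16_gib = weight_memory_gib(NUM_PARAMS, bits_per_weight=16)
w4_gib = weight_memory_gib(NUM_PARAMS, bits_per_weight=4, group_size=64)

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f"4-bit weights:  {w4_gib:.2f} GiB")
print(f"reduction:      {fp16_gib / w4_gib:.2f}x")
```

The 4-bit activations in W4A4 mainly affect compute throughput rather than weight storage, which is why this estimate only covers the weight side.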

### Alternatives

_No response_

### Additional context

_No response_
