[Feature]: Consider integrating SVDQuant (W4A4 quantization) from Nunchaku project #1092

@dengyingxu

🚀 The feature, motivation and pitch

I noticed that the Nunchaku project (https://github.com/mit-han-lab/nunchaku) has implemented SVDQuant, a W4A4 quantization method (4-bit weights and 4-bit activations). The approach looks like it could carry over well to LLM scenarios and seems promising for model optimization.

Would Aphrodite Engine consider supporting or integrating this quantization method? It could reduce memory use considerably while preserving model quality in LLM serving scenarios.

The Nunchaku project’s implementation appears to be well-suited for LLM use cases, and integration could be valuable for the Aphrodite Engine community.

Alternatives

No response

Additional context

No response
