[quantization] Implement tie_word_embeddings #624

@stamalakhov

What

Some models, such as unsloth/Llama-3.2-3B-Instruct, use tie_word_embeddings, so the lm_head weights are identical to the input_embeddings weights. The export ignores this flag: I know of no way to share weights between input_embeddings and lm_head when exporting to circle (please correct me if I'm wrong). To fit within the current 2 GB size limit of a circle file (confirmed by the error "flatbuffers: cannot grow buffer beyond 2 gigabytes"), we therefore have to quantize lm_head to 4 bits, which decreases accuracy significantly. It would be nice to simply share the quantized lm_head weights with input_embeddings in circle.
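To give a rough sense of what sharing would save, here is a back-of-the-envelope sketch. The vocabulary size (128256) and hidden size (3072) are assumptions I believe match this model's config, and the helper names are hypothetical. Quantization scales and zero-points are ignored, so the numbers are lower bounds only.

```python
# Rough size estimate for the embedding table and lm_head in the exported file.
# VOCAB and HIDDEN are assumed values, not read from the actual model config.
VOCAB = 128_256
HIDDEN = 3_072


def table_bytes(vocab_size: int, hidden_size: int, bits: int) -> int:
    """Bytes for a vocab_size x hidden_size weight table at `bits` bits per weight."""
    return vocab_size * hidden_size * bits // 8


def embedding_footprint(vocab_size: int, hidden_size: int,
                        embed_bits: int, head_bits: int, shared: bool) -> int:
    """Total bytes for input embeddings plus lm_head.

    When `shared` is True, a single table (stored at embed_bits) serves both roles.
    """
    embed = table_bytes(vocab_size, hidden_size, embed_bits)
    if shared:
        return embed
    return embed + table_bytes(vocab_size, hidden_size, head_bits)


if __name__ == "__main__":
    mib = 1024 * 1024
    separate_8_4 = embedding_footprint(VOCAB, HIDDEN, 8, 4, shared=False)
    shared_8 = embedding_footprint(VOCAB, HIDDEN, 8, 8, shared=True)
    print(f"separate (8-bit embeddings + 4-bit lm_head): {separate_8_4 / mib:.1f} MiB")
    print(f"shared   (one 8-bit table for both):         {shared_8 / mib:.1f} MiB")
```

Under these assumptions, one shared 8-bit table is smaller than separate 8-bit embeddings plus a 4-bit lm_head, so sharing could free space while keeping lm_head at the higher bit width.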
