
Add --gguf_file support for exporting GGUF models to ExecuTorch#223

Merged
larryliu0820 merged 1 commit into huggingface:main from abhinaykukkadapu:support_gguf_conversion
Mar 19, 2026
Conversation

@abhinaykukkadapu (Contributor) commented Mar 18, 2026

This adds a --gguf_file CLI argument that lets users export GGUF models to .pte directly, without manually converting to HF format first. The GGUF file is dequantized to float by Transformers' existing loader, then re-quantized via torchao for the target backend.

Test

optimum-cli export executorch -m TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF --gguf_file tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf --task text-generation --recipe xnnpack -o /tmp/gguf_test/
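The flag handling described above can be sketched roughly as follows. This is a hypothetical illustration of how a `--gguf_file` option might be threaded into the model-loading kwargs, not the actual optimum-executorch implementation (`build_load_kwargs` and the argument names are invented for this sketch; Transformers' `from_pretrained` does accept a `gguf_file` argument that dequantizes GGUF weights on load):

```python
import argparse

def build_load_kwargs(args):
    """Map CLI flags to from_pretrained-style kwargs (illustrative only)."""
    kwargs = {}
    if args.gguf_file:
        # Transformers' GGUF loader dequantizes weights to float32 by
        # default, which raises peak memory; a lower --dtype mitigates this.
        kwargs["gguf_file"] = args.gguf_file
    if args.dtype:
        kwargs["torch_dtype"] = args.dtype
    return kwargs

parser = argparse.ArgumentParser()
parser.add_argument("--gguf_file", default=None)
parser.add_argument("--dtype", default=None)

args = parser.parse_args(
    ["--gguf_file", "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf", "--dtype", "float16"]
)
print(build_load_kwargs(args))
```

After this, the dequantized float model would be re-quantized via torchao for the selected backend (XNNPACK in the test command above).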

@abhinaykukkadapu force-pushed the support_gguf_conversion branch 2 times, most recently from 20bfe21 to 9c5c771 on March 18, 2026 at 22:04. Commit message:

Many popular quantized models on Hugging Face Hub are distributed only
as GGUF files. This adds a --gguf_file CLI argument that lets users
export GGUF models to .pte directly, without manually converting to HF
format first. The GGUF file is dequantized to float by Transformers'
existing loader, then re-quantized via torchao for the target backend.

Warns users that GGUF dequantization loads weights as float32, which
increases peak memory, and suggests --dtype float16 to mitigate.

Also bumps nightly dependency pins in install_dev.py — the previous
executorch==1.1.0.dev20260104 version doesn't exist on PyPI.
@abhinaykukkadapu force-pushed the support_gguf_conversion branch from 9c5c771 to 6260447 on March 18, 2026 at 23:13
@larryliu0820 merged commit 585799c into huggingface:main on Mar 19, 2026
65 of 85 checks passed
