Fix file2file int4/uint4 packing to match advertised pack_method="reorder" #30
Open
jin-amd wants to merge 1 commit into
Conversation
Fix file2file int4/uint4 packing to match advertised pack_method="reorder"

`direct_quantize_checkpoint` was calling `pack_method.pack(weight, False)` (linear nibble order [0,1,2,3,4,5,6,7]) while the exported config.json unconditionally advertises `pack_method="reorder"` (AWQ-interleaved [0,2,4,6,1,3,5,7]). Downstream loaders that respect pack_method (vLLM's compressed_tensors_w4a16, Quark's own loaders) therefore unpacked these weights with the AWQ-interleaved unpacker and scrambled every nibble, producing models that load cleanly and emit garbage.

This change flips that one call to `reorder=True`, aligning file2file with every other Quark export path (`export/api.py`, `qparamslinear.py`, `ModelQuantizer.export_*`, `experimental/cli/torch_llm_ptq.py`) and making the bytes match the config. Only `Pack_4_bits` honors the `reorder` flag, so this is a no-op for FP8, MXFP4/6, FP4/6, INT8/UINT8, INT2/INT3 weights produced by the file2file path.

Existing `test_pack_unpack_tensor.py::test_pack_unpack` already covers both layouts and continues to pass; a new `test_file2file_pack_layout.py` pins the end-to-end contract.

Migration: pre-existing int4/uint4 checkpoints produced by this path were already wrong against pack_method-aware loaders and must be regenerated. See release notes.
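For readers unfamiliar with the two layouts, here is a minimal, self-contained sketch of 4-bit packing in both nibble orders (an illustration of the scheme described above, not Quark's actual `Pack_4_bits` implementation), showing how packing linearly and unpacking AWQ-interleaved scrambles the values:

```python
import torch

AWQ_ORDER = [0, 2, 4, 6, 1, 3, 5, 7]  # AWQ-interleaved nibble order

def pack_int4(vals: torch.Tensor, reorder: bool) -> torch.Tensor:
    """Pack groups of 8 uint4 values into int32 words (illustrative only)."""
    vals = vals.view(-1, 8)
    if reorder:
        vals = vals[:, AWQ_ORDER]  # permute nibbles before packing
    packed = torch.zeros(vals.shape[0], dtype=torch.int32)
    for i in range(8):
        packed |= (vals[:, i].to(torch.int32) & 0xF) << (4 * i)
    return packed

def unpack_int4(packed: torch.Tensor, reorder: bool) -> torch.Tensor:
    nibbles = torch.stack([(packed >> (4 * i)) & 0xF for i in range(8)], dim=1)
    if reorder:
        inverse = torch.argsort(torch.tensor(AWQ_ORDER))
        nibbles = nibbles[:, inverse]  # undo the interleave
    return nibbles.flatten()

w = torch.arange(8, dtype=torch.int32)  # [0, 1, 2, 3, 4, 5, 6, 7]
# The bug: bytes written linearly but unpacked as if AWQ-interleaved.
scrambled = unpack_int4(pack_int4(w, reorder=False), reorder=True)
print(scrambled.tolist())  # [0, 4, 1, 5, 2, 6, 3, 7] -- values land in the wrong slots
# When pack and unpack agree on the layout, the round trip is lossless:
assert torch.equal(unpack_int4(pack_int4(w, reorder=True), reorder=True), w)
```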
Summary
`_quantize_and_save_safetensor_shard` (the worker behind `ModelQuantizer.direct_quantize_checkpoint`) packs int4/uint4 weights with the linear nibble layout, while the same function unconditionally writes `pack_method="reorder"` (AWQ-interleaved) into the exported config.json. Loaders that respect that field, such as vLLM's `compressed_tensors_w4a16` and Quark's own `qparamslinear`, call the AWQ-interleaved unpacker on linearly-packed bytes and silently scramble every nibble. Models load cleanly and emit garbage.

This PR flips the one offending positional arg to `reorder=True`, making the bytes match the metadata.
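The change itself is a one-liner. A sketch of the call site (the function and argument come from the PR description; everything else here is illustrative, see the diff for the real context):

```python
# Illustrative shape of the fix; the real change lives in file2file_quantization.py.
def _quantize_and_save_safetensor_shard_sketch(weight, pack_method):
    # Before this PR the second (positional) argument was False, producing the
    # linear nibble layout while config.json still advertised "reorder".
    return pack_method.pack(weight, True)  # AWQ-interleaved, matching the metadata
```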
Why this is safe

- Only `Pack_4_bits` actually honors `reorder`. Every other `Pack` class in `quark/torch/utils/pack.py` (`Pack_2/3/8_bits`, `Pack_mxfp4/6`, `Pack_fp4/6`) accepts the kwarg but ignores it (a toy sketch of this contract follows the list), so FP8, MXFP4/6, FP4/6, INT8/UINT8, INT2/INT3 weights produced by file2file are bit-for-bit unchanged.
- Every other Quark export path already packs with `reorder=True`: `export/api.py::export_safetensors`, `ExporterConfig.pack_method`, `native_model_info_builder.py`, `qparamslinear.py`, `ModelQuantizer.export_*`, and `experimental/cli/torch_llm_ptq.py`.
- The exported config unconditionally writes `pack_method="reorder"` (`file2file_quantization.py:863`). This PR makes the bytes match the metadata; the reverse fix would diverge from every other Quark export and break the documented `pack_method` schema in `export/api.py:138-140`.
- `test/test_for_torch/test_pack_unpack_tensor.py::test_pack_unpack` round-trips both `reorder=True` and `reorder=False` for int4/uint4/int8/uint8. CI passes unchanged.
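A toy sketch of the accept-but-ignore contract mentioned above (hypothetical class, not Quark's real implementation; the real ones live in `quark/torch/utils/pack.py`):

```python
import torch

class PackEightBitsSketch:
    """Hypothetical stand-in for a non-4-bit Pack class."""

    def pack(self, tensor: torch.Tensor, reorder: bool = False) -> torch.Tensor:
        # 8-bit values occupy whole bytes, so there is no nibble order to
        # permute; `reorder` is accepted for interface uniformity and ignored.
        return tensor.to(torch.int8)
```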
Empirical evidence

Kimi-K2.5 quantized via `quantize_quark.py --quant_scheme w_int4_per_group_sym --file2file_quantization`, evaluated on gsm8k (100 samples) under vLLM's `compressed_tensors_w4a16`:

- `release/0.11` as shipped (linear bytes, `pack_method="reorder"` metadata): garbage output
- `reorder=True` (i.e. what this PR produces): normal accuracy

The weights themselves were correct; only the on-disk nibble order was wrong.
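For anyone comparing an old checkpoint against a regenerated one, here is a quick byte-level check (hypothetical paths and tensor name, and it assumes the simple packing convention sketched earlier, so treat it as a debugging aid rather than a reference implementation):

```python
import torch
from safetensors.torch import load_file

def nibbles(packed: torch.Tensor) -> torch.Tensor:
    """Split packed int32 words into their 8 constituent nibbles."""
    return torch.stack([(packed >> (4 * i)) & 0xF for i in range(8)], dim=-1)

# Hypothetical paths and tensor name; pick any int4 weight present in both checkpoints.
old = load_file("ckpt_linear/model.safetensors")["model.layers.0.mlp.gate_proj.weight_packed"]
new = load_file("ckpt_reorder/model.safetensors")["model.layers.0.mlp.gate_proj.weight_packed"]

awq_order = torch.tensor([0, 2, 4, 6, 1, 3, 5, 7])
# Same payload, different on-disk order: applying the AWQ interleave to the
# old (linear) nibbles should reproduce the regenerated bytes exactly.
assert torch.equal(nibbles(old)[..., awq_order], nibbles(new))
```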
Migration impact
This is a bug fix that ships a byte-layout change. Pre-existing int4/uint4 checkpoints from `direct_quantize_checkpoint` were already being mis-loaded by pack_method-aware loaders; after this PR they continue to load without error but still produce garbage until regenerated.

Notes for reviewers
This PR targets `release/0.11` because that's where I observed the bug, but `release/0.10` looks identical and the same fix should apply.