
Fix file2file int4/uint4 packing to match advertised pack_method="reorder" #30

Open

jin-amd wants to merge 1 commit into amd:release/0.11 from jin-amd:fix/file2file-int4-pack-reorder

Conversation

jin-amd commented Apr 29, 2026

Summary

`_quantize_and_save_safetensor_shard` (the worker behind `ModelQuantizer.direct_quantize_checkpoint`) packs int4/uint4 weights in the linear nibble layout, while the same function unconditionally writes `pack_method="reorder"` (AWQ-interleaved) into the exported config.json. Loaders that respect that field (vLLM's `compressed_tensors_w4a16` and Quark's own `qparamslinear`) call the AWQ-interleaved unpacker on linearly packed bytes and silently scramble every nibble. Models load cleanly and emit garbage.

This PR flips the one offending positional argument to `reorder=True`, making the bytes match the metadata.
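
For concreteness, here is a minimal sketch of the two nibble layouts; `pack_int4`, `unpack_int4`, and the order constants are illustrative stand-ins, not Quark's actual `pack.py` API:

```python
# Illustrative sketch only -- these helpers are not Quark's pack.py API.
import torch

LINEAR_ORDER = [0, 1, 2, 3, 4, 5, 6, 7]  # what file2file was actually writing
AWQ_ORDER = [0, 2, 4, 6, 1, 3, 5, 7]     # what pack_method="reorder" advertises

def pack_int4(vals: torch.Tensor, order: list) -> int:
    """Pack eight uint4 values into one 32-bit word; nibble slot i holds vals[order[i]]."""
    word = 0
    for slot, src in enumerate(order):
        word |= (int(vals[src]) & 0xF) << (4 * slot)
    return word

def unpack_int4(word: int, order: list) -> list:
    """Exact inverse of pack_int4 for the same order."""
    out = [0] * 8
    for slot, src in enumerate(order):
        out[src] = (word >> (4 * slot)) & 0xF
    return out

vals = torch.arange(8)                 # [0, 1, 2, 3, 4, 5, 6, 7]
word = pack_int4(vals, LINEAR_ORDER)   # what the buggy path stores on disk
print(unpack_int4(word, AWQ_ORDER))    # [0, 4, 1, 5, 2, 6, 3, 7] -- scrambled
print(unpack_int4(pack_int4(vals, AWQ_ORDER), AWQ_ORDER))  # identity round-trip
```

A loader that trusts `pack_method="reorder"` runs the second unpacker on bytes produced by the first packer, which is exactly the scramble described above.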

Why this is safe

  • Only `Pack_4_bits` actually honors `reorder`. Every other Pack class in quark/torch/utils/pack.py (Pack_2/3/8_bits, Pack_mxfp4/6, Pack_fp4/6) accepts the kwarg but ignores it, so FP8, MXFP4/6, FP4/6, INT8/UINT8, and INT2/INT3 weights produced by file2file are bit-for-bit unchanged.
  • The file2file path was the lone outlier. Every other Quark export entry point already defaults to `reorder=True`: export/api.py::export_safetensors, ExporterConfig.pack_method, native_model_info_builder.py, qparamslinear.py, ModelQuantizer.export_*, and experimental/cli/torch_llm_ptq.py.
  • The exported config already promises `pack_method="reorder"` (file2file_quantization.py:863). This PR makes the bytes match the metadata; the reverse fix (changing the metadata to match the bytes) would diverge from every other Quark export and break the documented pack_method schema in export/api.py:138-140.
  • Existing unit tests already cover both layouts. test/test_for_torch/test_pack_unpack_tensor.py::test_pack_unpack round-trips both `reorder=True` and `reorder=False` for int4/uint4/int8/uint8, and CI passes unchanged (see the round-trip sketch after this list).
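
A round-trip check in that spirit, reusing the hypothetical `pack_int4`/`unpack_int4` helpers and order constants from the sketch in the Summary (not the actual test file's code):

```python
# Hypothetical pytest sketch mirroring the cited round-trip test; depends on the
# pack_int4 / unpack_int4 helpers and order constants from the earlier sketch.
import pytest
import torch

@pytest.mark.parametrize("order", [LINEAR_ORDER, AWQ_ORDER])
def test_pack_unpack_round_trip(order):
    vals = torch.randint(0, 16, (8,))  # eight random uint4 values
    word = pack_int4(vals, order)
    assert unpack_int4(word, order) == vals.tolist()
```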

Empirical evidence

Kimi-K2.5 quantized via `quantize_quark.py --quant_scheme w_int4_per_group_sym --file2file_quantization`, evaluated on gsm8k (100 samples) under vLLM's `compressed_tensors_w4a16`:

| Bytes from… | gsm8k |
| --- | --- |
| current release/0.11 | 0 / 100 (garbage) |
| same bytes manually repacked with `reorder=True` (i.e. what this PR produces) | 94 / 100 |

The weights themselves were correct; only the on-disk nibble order was wrong.
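
"Manually repacked" amounts to unpacking each word with the linear order and repacking it in the AWQ order; a sketch using the same hypothetical helpers from the Summary:

```python
# Repair sketch: read nibbles in the buggy linear layout, rewrite them in the
# advertised AWQ-interleaved layout. Reuses the helpers from the Summary sketch.
def repack_linear_to_awq(word: int) -> int:
    vals = torch.tensor(unpack_int4(word, LINEAR_ORDER))
    return pack_int4(vals, AWQ_ORDER)
```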

Migration impact

This is a bug fix that ships a byte-layout change. Pre-existing int4/uint4 checkpoints from `direct_quantize_checkpoint` were already being mis-loaded by pack_method-aware loaders; after this PR they continue to load without error but still produce garbage until regenerated.

Notes for reviewers

  • I targeted release/0.11 because that's where I observed the bug, but release/0.10 looks identical and the same fix should apply.

Commit message

Fix file2file int4/uint4 packing to match advertised pack_method="reorder"

`direct_quantize_checkpoint` was calling `pack_method.pack(weight, False)`
(linear nibble order [0,1,2,3,4,5,6,7]) while the exported config.json
unconditionally advertises `pack_method="reorder"` (AWQ-interleaved
[0,2,4,6,1,3,5,7]). Downstream loaders that respect pack_method (vLLM's
compressed_tensors_w4a16, Quark's own loaders) therefore unpacked these
weights with the AWQ-interleaved unpacker and scrambled every nibble,
producing models that load cleanly and emit garbage.

This change flips that one call to `reorder=True`, aligning file2file
with every other Quark export path (`export/api.py`,
`qparamslinear.py`, `ModelQuantizer.export_*`,
`experimental/cli/torch_llm_ptq.py`) and making the bytes match the
config.

Only `Pack_4_bits` honors the `reorder` flag, so this is a no-op for
FP8, MXFP4/6, FP4/6, INT8/UINT8, INT2/INT3 weights produced by the
file2file path. Existing `test_pack_unpack_tensor.py::test_pack_unpack`
already covers both layouts and continues to pass; a new
`test_file2file_pack_layout.py` pins the end-to-end contract.

Migration: pre-existing int4/uint4 checkpoints produced by this path
were already wrong against pack_method-aware loaders and must be
regenerated. See release notes.
jin-amd marked this pull request as draft April 29, 2026 11:01
jin-amd marked this pull request as ready for review April 29, 2026 11:03