Fix file2file int4/uint4 packing to match advertised pack_method="reorder" #30
Open
jin-amd wants to merge 1 commit into
Conversation
Fix file2file int4/uint4 packing to match advertised pack_method="reorder"

`direct_quantize_checkpoint` was calling `pack_method.pack(weight, False)` (linear nibble order [0,1,2,3,4,5,6,7]) while the exported config.json unconditionally advertises `pack_method="reorder"` (AWQ-interleaved [0,2,4,6,1,3,5,7]). Downstream loaders that respect pack_method (vLLM's compressed_tensors_w4a16, Quark's own loaders) therefore unpacked these weights with the AWQ-interleaved unpacker and scrambled every nibble, producing models that load cleanly and emit garbage.

This change flips that one call to `reorder=True`, aligning file2file with every other Quark export path (`export/api.py`, `qparamslinear.py`, `ModelQuantizer.export_*`, `experimental/cli/torch_llm_ptq.py`) and making the bytes match the config. Only `Pack_4_bits` honors the `reorder` flag, so this is a no-op for FP8, MXFP4/6, FP4/6, INT8/UINT8, INT2/INT3 weights produced by the file2file path.

Existing `test_pack_unpack_tensor.py::test_pack_unpack` already covers both layouts and continues to pass; a new `test_file2file_pack_layout.py` pins the end-to-end contract.

Migration: pre-existing int4/uint4 checkpoints produced by this path were already wrong against pack_method-aware loaders and must be regenerated. See release notes.
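For readers unfamiliar with the two layouts, here is a minimal, self-contained sketch of 4-bit packing in both nibble orders (an illustration of the scheme described above, not Quark's actual `Pack_4_bits` implementation), showing how packing linearly and unpacking AWQ-interleaved scrambles the values:

```python
import torch

AWQ_ORDER = [0, 2, 4, 6, 1, 3, 5, 7]  # AWQ-interleaved nibble order

def pack_int4(vals: torch.Tensor, reorder: bool) -> torch.Tensor:
    """Pack groups of 8 uint4 values into int32 words (illustrative only)."""
    vals = vals.view(-1, 8)
    if reorder:
        vals = vals[:, AWQ_ORDER]  # permute nibbles before packing
    packed = torch.zeros(vals.shape[0], dtype=torch.int32)
    for i in range(8):
        packed |= (vals[:, i].to(torch.int32) & 0xF) << (4 * i)
    return packed

def unpack_int4(packed: torch.Tensor, reorder: bool) -> torch.Tensor:
    nibbles = torch.stack([(packed >> (4 * i)) & 0xF for i in range(8)], dim=1)
    if reorder:
        inverse = torch.argsort(torch.tensor(AWQ_ORDER))
        nibbles = nibbles[:, inverse]  # undo the interleave
    return nibbles.flatten()

w = torch.arange(8, dtype=torch.int32)  # [0, 1, 2, 3, 4, 5, 6, 7]
# The bug: bytes written linearly but unpacked as if AWQ-interleaved.
scrambled = unpack_int4(pack_int4(w, reorder=False), reorder=True)
print(scrambled.tolist())  # [0, 4, 1, 5, 2, 6, 3, 7] -- values land in the wrong slots
# When pack and unpack agree on the layout, the round trip is lossless:
assert torch.equal(unpack_int4(pack_int4(w, reorder=True), reorder=True), w)
```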
Summary
`_quantize_and_save_safetensor_shard` (the worker behind `ModelQuantizer.direct_quantize_checkpoint`) packs int4/uint4 weights with the linear nibble layout, while the same function unconditionally writes `pack_method="reorder"` (AWQ-interleaved) into the exported config.json. Loaders that respect that field, such as vLLM's `compressed_tensors_w4a16` and Quark's own `qparamslinear`, call the AWQ-interleaved unpacker on linearly-packed bytes and silently scramble every nibble. Models load cleanly and emit garbage.

This PR flips the one offending positional arg to `reorder=True`, making the bytes match the metadata.
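The change itself is a one-liner. A sketch of the call site (the function and argument come from the PR description; everything else here is illustrative, see the diff for the real context):

```python
# Illustrative shape of the fix; the real change lives in file2file_quantization.py.
def _quantize_and_save_safetensor_shard_sketch(weight, pack_method):
    # Before this PR the second (positional) argument was False, producing the
    # linear nibble layout while config.json still advertised "reorder".
    return pack_method.pack(weight, True)  # AWQ-interleaved, matching the metadata
```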
Why this is safe

- Only `Pack_4_bits` actually honors `reorder`. Every other `Pack` class in `quark/torch/utils/pack.py` (`Pack_2/3/8_bits`, `Pack_mxfp4/6`, `Pack_fp4/6`) accepts the kwarg but ignores it (a toy sketch of this contract follows the list), so FP8, MXFP4/6, FP4/6, INT8/UINT8, INT2/INT3 weights produced by file2file are bit-for-bit unchanged.
- Every other Quark export path already packs with `reorder=True`: `export/api.py::export_safetensors`, `ExporterConfig.pack_method`, `native_model_info_builder.py`, `qparamslinear.py`, `ModelQuantizer.export_*`, and `experimental/cli/torch_llm_ptq.py`.
- The exported config unconditionally writes `pack_method="reorder"` (`file2file_quantization.py:863`). This PR makes the bytes match the metadata; the reverse fix would diverge from every other Quark export and break the documented `pack_method` schema in `export/api.py:138-140`.
- `test/test_for_torch/test_pack_unpack_tensor.py::test_pack_unpack` round-trips both `reorder=True` and `reorder=False` for int4/uint4/int8/uint8. CI passes unchanged.
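A toy sketch of the accept-but-ignore contract mentioned above (hypothetical class, not Quark's real implementation; the real ones live in `quark/torch/utils/pack.py`):

```python
import torch

class PackEightBitsSketch:
    """Hypothetical stand-in for a non-4-bit Pack class."""

    def pack(self, tensor: torch.Tensor, reorder: bool = False) -> torch.Tensor:
        # 8-bit values occupy whole bytes, so there is no nibble order to
        # permute; `reorder` is accepted for interface uniformity and ignored.
        return tensor.to(torch.int8)
```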
Empirical evidence

Kimi-K2.5 quantized via `quantize_quark.py --quant_scheme w_int4_per_group_sym --file2file_quantization`, evaluated on gsm8k (100 samples) under vLLM's `compressed_tensors_w4a16`:

- `release/0.11` as shipped (linear bytes, `pack_method="reorder"` metadata): garbage output
- `reorder=True` (i.e. what this PR produces): normal accuracy

The weights themselves were correct; only the on-disk nibble order was wrong.
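For anyone comparing an old checkpoint against a regenerated one, here is a quick byte-level check (hypothetical paths and tensor name, and it assumes the simple packing convention sketched earlier, so treat it as a debugging aid rather than a reference implementation):

```python
import torch
from safetensors.torch import load_file

def nibbles(packed: torch.Tensor) -> torch.Tensor:
    """Split packed int32 words into their 8 constituent nibbles."""
    return torch.stack([(packed >> (4 * i)) & 0xF for i in range(8)], dim=-1)

# Hypothetical paths and tensor name; pick any int4 weight present in both checkpoints.
old = load_file("ckpt_linear/model.safetensors")["model.layers.0.mlp.gate_proj.weight_packed"]
new = load_file("ckpt_reorder/model.safetensors")["model.layers.0.mlp.gate_proj.weight_packed"]

awq_order = torch.tensor([0, 2, 4, 6, 1, 3, 5, 7])
# Same payload, different on-disk order: applying the AWQ interleave to the
# old (linear) nibbles should reproduce the regenerated bytes exactly.
assert torch.equal(nibbles(old)[..., awq_order], nibbles(new))
```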
Migration impact
This is a bug fix that ships a byte-layout change. Pre-existing int4/uint4 checkpoints from `direct_quantize_checkpoint` were already being mis-loaded by pack_method-aware loaders; after this PR they continue to load without error but still produce garbage until regenerated.

Notes for reviewers
This PR targets `release/0.11` because that's where I observed the bug, but `release/0.10` looks identical and the same fix should apply.