@Samsung/tico_developers
Summary
The current WrapQ example directory has grown into many scripts split by model, module, algorithm, and debugging purpose. As the number of examples has increased, we now have substantial duplicated logic across scripts, and the examples are becoming less useful as user-facing references.
This issue proposes refactoring the WrapQ examples into a smaller, config-driven structure:
tico/quantization/examples/
├── README.md
├── quantize.py
├── evaluate.py
├── inspect.py
└── configs/
    ├── llama_gptq_ptq.yaml
    ├── llama_ptq_only.yaml
    ├── qwen3_vl_gptq_ptq.yaml
    └── qwen3_vl_ptq_only.yaml
The actual reusable logic should be moved out of examples/ and into a new recipe layer:
tico/quantization/recipes/
├── __init__.py
├── config.py
├── context.py
├── runner.py
├── utils.py
├── qparams.py
├── adapters/
├── stages/
├── data/
├── evaluation/
├── export/
└── debug/
The goal is to make examples/ contain only thin CLI entrypoints, while model-specific behavior, algorithm stages, calibration, evaluation, export, and debugging logic live in reusable modules.
Motivation
The current example structure has several issues:
1. Too many example scripts
Scripts are currently split by model type, algorithm, module, and debug purpose. This makes it hard for users to know which script they should run.
2. Duplicated logic
Model loading, calibration data preparation, GPTQ/PTQ flow, qparam injection, evaluation, and export logic are repeated across multiple files.
3. Examples are doing too much
Some example scripts are effectively acting as libraries, evaluation tools, export tools, and debugging tools at the same time.
4. Examples import from other examples
Some scripts depend on helper functions defined in other example scripts. This makes the structure fragile and difficult to refactor.
5. Module-level examples are closer to integration tests
Scripts such as individual Qwen vision MLP/attention or LLaMA decoder-layer examples are useful, but they are better suited as tests or debug recipes rather than public user-facing examples.
Proposed Design
1. Keep only three user-facing example CLIs
quantize.py
evaluate.py
inspect.py
Their responsibilities should be:
quantize.py : load model → prepare calibration data → run quantization pipeline → optionally evaluate/export
evaluate.py : evaluate FP model or fake-quant/checkpoint model
inspect.py : trace, parity check, layer/runtime debugging
Example usage:
python -m tico.quantization.examples.quantize \
    --config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
    --model Maykeye/TinyLLama-v0

python -m tico.quantization.examples.quantize \
    --config tico/quantization/examples/configs/qwen3_vl_gptq_ptq.yaml \
    --model Qwen/Qwen3-VL-2B-Instruct

python -m tico.quantization.examples.evaluate \
    --config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
    --checkpoint ./out/llama/quantized_model.pt

python -m tico.quantization.examples.inspect \
    --config tico/quantization/examples/configs/qwen3_vl_ptq_only.yaml \
    --mode trace \
    --interesting-modules model.language_model model.visual
2. Move reusable logic into tico.quantization.recipes
Proposed structure:
tico/quantization/recipes/
├── __init__.py
├── config.py
├── context.py
├── runner.py
├── utils.py
├── qparams.py
├── adapters/
│   ├── base.py
│   ├── llama.py
│   └── qwen3_vl.py
├── stages/
│   ├── base.py
│   ├── gptq.py
│   ├── ptq.py
│   ├── spinquant.py
│   ├── cle.py
│   └── smoothquant.py
├── data/
│   ├── llm.py
│   └── vlm.py
├── evaluation/
│   ├── llm.py
│   ├── vlm.py
│   └── mmlu.py
├── export/
│   ├── checkpoint.py
│   └── circle.py
└── debug/
    └── trace.py
The examples/ scripts should only parse CLI arguments, load config, and call the recipe runner or adapter APIs.
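A thin entrypoint of this kind might look like the sketch below. The --config and --model flags come from the usage examples above; load_config, run_recipe, and the cfg["model"]["name"] key are hypothetical placeholders, passed in as arguments so the sketch stays self-contained and does not assume any existing tico API.

```python
import argparse
from typing import Any, Callable, Dict, List, Optional

Config = Dict[str, Any]


def build_parser() -> argparse.ArgumentParser:
    """CLI surface for a hypothetical examples/quantize.py."""
    parser = argparse.ArgumentParser(prog="quantize")
    parser.add_argument("--config", required=True, help="Path to a recipe YAML file.")
    parser.add_argument("--model", default=None, help="Model ID or path; overrides the config.")
    return parser


def main(
    argv: Optional[List[str]],
    load_config: Callable[[str], Config],
    run_recipe: Callable[[Config], None],
) -> int:
    """Parse arguments, load the config, and delegate everything else."""
    args = build_parser().parse_args(argv)
    cfg = load_config(args.config)
    if args.model is not None:
        # Assumption: the model ID is stored under cfg["model"]["name"].
        cfg.setdefault("model", {})["name"] = args.model
    run_recipe(cfg)
    return 0
```

In a real script, load_config would be the YAML loader from recipes/config.py and run_recipe the entry point of recipes/runner.py; both names are illustrative only.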
3. Represent algorithm combinations through config, not script names
Instead of creating separate scripts such as:
quantize_full_qmodel_with_gptq.py
quantize_qwen3_vl_with_gptq.py
quantize_model.py
quantize_for_conditional_generation.py
we should use YAML configs:
pipeline:
  - name: gptq
    enabled: true
    weight_bits: 4
    perchannel: true
  - name: ptq
    enabled: true
    activation_dtype: int16
    linear_weight_bits: 4
For PTQ-only:
pipeline:
  - name: ptq
    enabled: true
    activation_dtype: int16
    linear_weight_bits: 8
This makes the example layer stable even when we add more algorithms or model families.
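As a rough illustration of how recipes/config.py could turn the pipeline list into typed stage configs, here is a minimal sketch. StageConfig, parse_pipeline, and KNOWN_STAGES are hypothetical names; the stage names are taken from the proposed recipes/stages/ directory, and the config is shown as an already-parsed dict so the sketch does not depend on a YAML library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Stage names taken from the proposed recipes/stages/ directory.
KNOWN_STAGES = {"gptq", "ptq", "spinquant", "cle", "smoothquant"}


@dataclass
class StageConfig:
    name: str
    enabled: bool = True
    # Everything except "name" and "enabled" is passed through to the stage.
    options: Dict[str, Any] = field(default_factory=dict)


def parse_pipeline(cfg: Dict[str, Any]) -> List[StageConfig]:
    """Validate cfg["pipeline"] and convert each entry to a StageConfig."""
    stages: List[StageConfig] = []
    for index, entry in enumerate(cfg.get("pipeline", [])):
        options = dict(entry)  # copy so the caller's config is not mutated
        name = options.pop("name", None)
        if name not in KNOWN_STAGES:
            raise ValueError(f"pipeline[{index}]: unknown stage name {name!r}")
        enabled = bool(options.pop("enabled", True))
        stages.append(StageConfig(name=name, enabled=enabled, options=options))
    return stages


# Already-parsed equivalent of the GPTQ + PTQ config above.
cfg = {
    "pipeline": [
        {"name": "gptq", "enabled": True, "weight_bits": 4, "perchannel": True},
        {"name": "ptq", "enabled": True, "activation_dtype": "int16", "linear_weight_bits": 4},
    ]
}
stages = parse_pipeline(cfg)
```

Rejecting unknown stage names at load time means a typo in a config fails before any model is loaded.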
4. Use model adapters for model-family-specific behavior
Model-specific code should live in adapters:
recipes/adapters/
├── base.py
├── llama.py
└── qwen3_vl.py
Each adapter should handle:
- model/tokenizer/processor loading
- calibration input construction
- PTQ config construction
- calibration forward pass
- evaluation
- export behavior
This allows the common runner to stay generic:
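# Model-family-specific setup is delegated to the adapter.
adapter = get_adapter(cfg["model"]["family"])
ctx = adapter.load_model(ctx)
ctx.calibration_inputs = adapter.build_calibration_inputs(ctx)

# Algorithm stages run in the order listed in the config.
for stage_cfg in cfg["pipeline"]:
    stage = get_stage(stage_cfg["name"])
    ctx = stage.run(ctx, stage_cfg)

# Evaluation and export are also adapter-specific.
adapter.evaluate(ctx)
adapter.export(ctx)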
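The adapter contract implied by the list and the runner loop above could be sketched as an abstract base class plus a small registry. get_adapter, load_model, build_calibration_inputs, evaluate, and export appear in the runner snippet; RecipeContext, build_ptq_config, calibrate, and register_adapter are hypothetical names introduced for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Type


@dataclass
class RecipeContext:
    """Hypothetical shared state passed between the adapter and the stages."""
    cfg: Dict[str, Any]
    model: Any = None
    calibration_inputs: List[Any] = field(default_factory=list)


class ModelAdapter(ABC):
    """One subclass per model family (for example llama or qwen3_vl)."""

    @abstractmethod
    def load_model(self, ctx: RecipeContext) -> RecipeContext: ...

    @abstractmethod
    def build_calibration_inputs(self, ctx: RecipeContext) -> List[Any]: ...

    @abstractmethod
    def build_ptq_config(self, ctx: RecipeContext) -> Any: ...

    @abstractmethod
    def calibrate(self, ctx: RecipeContext) -> None: ...

    @abstractmethod
    def evaluate(self, ctx: RecipeContext) -> Any: ...

    @abstractmethod
    def export(self, ctx: RecipeContext) -> None: ...


_ADAPTERS: Dict[str, Type[ModelAdapter]] = {}


def register_adapter(family: str) -> Callable[[Type[ModelAdapter]], Type[ModelAdapter]]:
    """Class decorator that makes an adapter discoverable by family name."""
    def decorator(cls: Type[ModelAdapter]) -> Type[ModelAdapter]:
        _ADAPTERS[family] = cls
        return cls
    return decorator


def get_adapter(family: str) -> ModelAdapter:
    """Instantiate the adapter registered for the given model family."""
    if family not in _ADAPTERS:
        raise ValueError(f"Unknown model family {family!r}; known: {sorted(_ADAPTERS)}")
    return _ADAPTERS[family]()
```

Because every method is abstract, an adapter that forgets one of the six responsibilities fails at instantiation rather than partway through a quantization run.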
Proposed Migration Mapping
| Current script | Proposed handling |
| --- | --- |
| quantize_full_qmodel_with_gptq.py | Move logic into recipes/adapters/llama.py, recipes/stages/*, recipes/evaluation/llm.py, and recipes/export/*. Replace usage with examples/quantize.py --config configs/llama_gptq_ptq.yaml. |
| evaluate_fk_llama_model.py | Merge into examples/evaluate.py and recipes/evaluation/llm.py. |
| static_llama_layer_runtime.py | Move to recipes/debug/ and expose through examples/inspect.py. |
| llama/quantize_decoder_layer_prefill.py | Move to integration tests or expose as an inspect/debug recipe. |
| quantize_qwen3_vl_with_gptq.py | Move logic into recipes/adapters/qwen3_vl.py, recipes/stages/*, recipes/evaluation/vlm.py, and recipes/evaluation/mmlu.py. |
| quantize_full_vlm_model_with_gptq.py | Merge into the common Qwen3-VL recipe path. |
| qwen/quantize_model.py | Move to synthetic smoke test or PTQ-only config. |
| qwen/quantize_for_conditional_generation.py | Merge into Qwen3-VL adapter/config path. |
| qwen/quantize_vision_mlp.py | Move to wrapper-level integration test. |
| qwen/quantize_vision_attention.py | Move to wrapper-level integration test. |
| qwen/trace_qwen.py | Move to recipes/debug/trace.py and expose through examples/inspect.py --mode trace. |
Integration Plan
I'll merge the related code into tico/quantization/examples. Until the new structure is stable, I'll keep the existing tico/quantization/wrapq/examples directory. Later, that directory will be renamed to examples_deprecated and eventually removed.
Expected Outcome
After this refactor, adding a new model family or algorithm variant should not require adding another example script. Instead, we should add or update:
- one adapter, if it is a new model family
- one stage, if it is a new algorithm
- one YAML config, if it is a new recipe
The public example surface will remain small and stable, while the actual quantization workflow becomes easier to reuse, test, and maintain.
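To make the "one stage per new algorithm" point concrete, here is a minimal sketch of a stage registry. get_stage and stage.run(ctx, stage_cfg) follow the runner snippet in the design section; Context, Stage, register_stage, and run_pipeline are hypothetical names, skipping entries with enabled: false is an assumption about how that flag would be honored, and the stage bodies are placeholders that record calls instead of quantizing anything.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Type


@dataclass
class Context:
    """Hypothetical pipeline state; a real one would carry the model and data."""
    history: List[str] = field(default_factory=list)


class Stage:
    """Base class for one algorithm step in the pipeline."""

    def run(self, ctx: Context, stage_cfg: Dict[str, Any]) -> Context:
        raise NotImplementedError


_STAGES: Dict[str, Type[Stage]] = {}


def register_stage(name: str) -> Callable[[Type[Stage]], Type[Stage]]:
    """Class decorator that makes a stage available under a config name."""
    def decorator(cls: Type[Stage]) -> Type[Stage]:
        _STAGES[name] = cls
        return cls
    return decorator


def get_stage(name: str) -> Stage:
    if name not in _STAGES:
        raise ValueError(f"Unknown stage {name!r}; known: {sorted(_STAGES)}")
    return _STAGES[name]()


def run_pipeline(ctx: Context, pipeline: List[Dict[str, Any]]) -> Context:
    """Run enabled stages in config order; disabled entries are skipped."""
    for stage_cfg in pipeline:
        if not stage_cfg.get("enabled", True):
            continue
        ctx = get_stage(stage_cfg["name"]).run(ctx, stage_cfg)
    return ctx


@register_stage("gptq")
class GPTQStage(Stage):
    def run(self, ctx: Context, stage_cfg: Dict[str, Any]) -> Context:
        # Placeholder: records the call instead of running GPTQ.
        ctx.history.append(f"gptq(weight_bits={stage_cfg.get('weight_bits')})")
        return ctx


# Adding a new algorithm is one new class; run_pipeline is untouched.
@register_stage("smoothquant")
class SmoothQuantStage(Stage):
    def run(self, ctx: Context, stage_cfg: Dict[str, Any]) -> Context:
        ctx.history.append("smoothquant")
        return ctx
```

With this shape, a new recipe is only a different ordering or selection of registered names in a YAML file.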