[examples] Refactor WrapQ examples into config-driven reusable recipe structure

@Samsung/tico_developers 

## Summary

The current WrapQ example directory has grown into many scripts split by model, module, algorithm, and debugging purpose. As the number of examples has increased, we now have substantial duplicated logic across scripts, and the examples are becoming less useful as user-facing references.

This issue proposes refactoring the WrapQ examples into a smaller, config-driven structure:

```text
tico/quantization/examples/
├── README.md
├── quantize.py
├── evaluate.py
├── inspect.py
└── configs/
    ├── llama_gptq_ptq.yaml
    ├── llama_ptq_only.yaml
    ├── qwen3_vl_gptq_ptq.yaml
    └── qwen3_vl_ptq_only.yaml
```

The actual reusable logic should be moved out of `examples/` and into a new recipe layer:

```text
tico/quantization/recipes/
├── __init__.py
├── config.py
├── context.py
├── runner.py
├── utils.py
├── qparams.py
├── adapters/
├── stages/
├── data/
├── evaluation/
├── export/
└── debug/
```

The goal is to make `examples/` contain only thin CLI entrypoints, while model-specific behavior, algorithm stages, calibration, evaluation, export, and debugging logic live in reusable modules.

## Motivation

The current example structure has several issues:

### 1. Too many example scripts

Scripts are currently split by model type, algorithm, module, and debug purpose. This makes it hard for users to know which script they should run.

### 2. Duplicated logic

Model loading, calibration data preparation, GPTQ/PTQ flow, qparam injection, evaluation, and export logic are repeated across multiple files.

### 3. Examples are doing too much

Some example scripts are effectively acting as libraries, evaluation tools, export tools, and debugging tools at the same time.

### 4. Examples import from other examples

Some scripts depend on helper functions defined in other example scripts. This makes the structure fragile and difficult to refactor.

### 5. Module-level examples are closer to integration tests

Scripts such as the individual Qwen vision MLP/attention examples or the LLaMA decoder-layer examples are useful, but they are better suited to tests or debug recipes than to public user-facing examples.

## Proposed Design

### 1. Keep only three user-facing example CLIs

```text
quantize.py
evaluate.py
inspect.py
```

Their responsibilities should be:

```text
quantize.py : load model → prepare calibration data → run quantization pipeline → optionally evaluate/export
evaluate.py : evaluate FP model or fake-quant/checkpoint model
inspect.py  : trace, parity check, layer/runtime debugging
```

Example usage:

```bash
python -m tico.quantization.examples.quantize \
  --config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
  --model Maykeye/TinyLLama-v0
```

```bash
python -m tico.quantization.examples.quantize \
  --config tico/quantization/examples/configs/qwen3_vl_gptq_ptq.yaml \
  --model Qwen/Qwen3-VL-2B-Instruct
```

```bash
python -m tico.quantization.examples.evaluate \
  --config tico/quantization/examples/configs/llama_gptq_ptq.yaml \
  --checkpoint ./out/llama/quantized_model.pt
```

```bash
python -m tico.quantization.examples.inspect \
  --config tico/quantization/examples/configs/qwen3_vl_ptq_only.yaml \
  --mode trace \
  --interesting-modules model.language_model model.visual
```

---

### 2. Move reusable logic into `tico.quantization.recipes`

Proposed structure:

```text
tico/quantization/recipes/
├── __init__.py
├── config.py
├── context.py
├── runner.py
├── utils.py
├── qparams.py
├── adapters/
│   ├── base.py
│   ├── llama.py
│   └── qwen3_vl.py
├── stages/
│   ├── base.py
│   ├── gptq.py
│   ├── ptq.py
│   ├── spinquant.py
│   ├── cle.py
│   └── smoothquant.py
├── data/
│   ├── llm.py
│   └── vlm.py
├── evaluation/
│   ├── llm.py
│   ├── vlm.py
│   └── mmlu.py
├── export/
│   ├── checkpoint.py
│   └── circle.py
└── debug/
    └── trace.py
```

The `examples/` scripts should only parse CLI arguments, load config, and call the recipe runner or adapter APIs.
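
As a rough illustration of how thin such an entrypoint could be, here is a minimal sketch. The `--config` and `--model` flags come from the usage examples above; `build_parser`, `main`, and `run_recipe` are hypothetical names, and `run_recipe` is only a stub standing in for whatever `recipes/runner.py` ends up exposing.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build the CLI parser; the two flags mirror the usage examples above."""
    parser = argparse.ArgumentParser(
        description="Quantize a model from a recipe config."
    )
    parser.add_argument("--config", required=True, help="Path to a YAML recipe config.")
    parser.add_argument(
        "--model",
        default=None,
        help="Model ID (assumed here to override the value in the config).",
    )
    return parser


def run_recipe(config_path: str, model: Optional[str]) -> dict:
    """Hypothetical stub for the recipe runner.

    A real runner would load the YAML file, pick an adapter, and run the stages.
    """
    return {"config": config_path, "model": model}


def main(argv: Optional[Sequence[str]] = None) -> dict:
    """Parse arguments and delegate; no quantization logic lives in the example."""
    args = build_parser().parse_args(argv)
    return run_recipe(args.config, args.model)


# Calling main() with an explicit argv keeps the entrypoint easy to test.
result = main(["--config", "configs/llama_gptq_ptq.yaml", "--model", "Maykeye/TinyLLama-v0"])
```

In an actual script, `main()` would be called under an `if __name__ == "__main__":` guard so that it reads `sys.argv`.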

---

### 3. Represent algorithm combinations through config, not script names

Instead of creating separate scripts such as:

```text
quantize_full_qmodel_with_gptq.py
quantize_qwen3_vl_with_gptq.py
quantize_model.py
quantize_for_conditional_generation.py
```

we should use YAML configs:

```yaml
pipeline:
  - name: gptq
    enabled: true
    weight_bits: 4
    perchannel: true

  - name: ptq
    enabled: true
    activation_dtype: int16
    linear_weight_bits: 4
```

For PTQ-only:

```yaml
pipeline:
  - name: ptq
    enabled: true
    activation_dtype: int16
    linear_weight_bits: 8
```

This makes the example layer stable even when we add more algorithms or model families.
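
One way the `pipeline` list could drive execution is a small name-to-stage registry. The sketch below is illustrative only: `STAGES`, `register_stage`, and `run_pipeline` are hypothetical names, the stage bodies are placeholders that merely record what they were asked to do, and treating a missing `enabled` key as `true` is an assumption. The dict literal is what a YAML loader would produce for the GPTQ + PTQ config above.

```python
from typing import Any, Callable, Dict

StageFn = Callable[[Dict[str, Any], Dict[str, Any]], Dict[str, Any]]

# Hypothetical registry mapping a config `name` to a stage callable.
STAGES: Dict[str, StageFn] = {}


def register_stage(name: str) -> Callable[[StageFn], StageFn]:
    """Decorator that registers a stage function under its config name."""
    def decorator(fn: StageFn) -> StageFn:
        STAGES[name] = fn
        return fn
    return decorator


@register_stage("gptq")
def gptq_stage(state: Dict[str, Any], stage_cfg: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: a real stage would run GPTQ on the model held in `state`.
    state["applied"].append(("gptq", stage_cfg["weight_bits"]))
    return state


@register_stage("ptq")
def ptq_stage(state: Dict[str, Any], stage_cfg: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder: a real stage would wrap modules and calibrate observers.
    state["applied"].append(("ptq", stage_cfg["linear_weight_bits"]))
    return state


def run_pipeline(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Run the enabled stages in the order they appear in the config."""
    state: Dict[str, Any] = {"applied": []}
    for stage_cfg in cfg["pipeline"]:
        if not stage_cfg.get("enabled", True):  # assumption: default is enabled
            continue
        name = stage_cfg["name"]
        if name not in STAGES:
            raise KeyError(f"Unknown stage {name!r}; known stages: {sorted(STAGES)}")
        state = STAGES[name](state, stage_cfg)
    return state


cfg = {
    "pipeline": [
        {"name": "gptq", "enabled": True, "weight_bits": 4, "perchannel": True},
        {"name": "ptq", "enabled": True, "activation_dtype": "int16", "linear_weight_bits": 4},
    ]
}
result = run_pipeline(cfg)
```

With this shape, supporting a new algorithm means registering one more stage function, and the entrypoint scripts stay unchanged.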

---

### 4. Use model adapters for model-family-specific behavior

Model-specific code should live in adapters:

```text
recipes/adapters/
├── base.py
├── llama.py
└── qwen3_vl.py
```

Each adapter should handle:

```text
- model/tokenizer/processor loading
- calibration input construction
- PTQ config construction
- calibration forward pass
- evaluation
- export behavior
```

This allows the common runner to stay generic:

```python
adapter = get_adapter(cfg["model"]["family"])
ctx = adapter.load_model(ctx)
ctx.calibration_inputs = adapter.build_calibration_inputs(ctx)

for stage_cfg in cfg["pipeline"]:
    stage = get_stage(stage_cfg["name"])
    ctx = stage.run(ctx, stage_cfg)

adapter.evaluate(ctx)
adapter.export(ctx)
```
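
To make the adapter contract more concrete, the following sketch shows one possible shape for `recipes/adapters/base.py` and the `get_adapter` lookup used by the runner. Everything here is an assumption for discussion: `RecipeContext`, `ModelAdapter`, `register_adapter`, and the `toy` family are hypothetical, and only the four methods called in the runner snippet above are modeled (PTQ config construction and the calibration forward pass are left out for brevity).

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Type


@dataclass
class RecipeContext:
    """Hypothetical shared state passed between the adapter and the stages."""
    cfg: Dict[str, Any]
    model: Any = None
    calibration_inputs: List[Any] = field(default_factory=list)


class ModelAdapter(ABC):
    """Interface each model family implements; method names follow the runner snippet."""

    @abstractmethod
    def load_model(self, ctx: RecipeContext) -> RecipeContext: ...

    @abstractmethod
    def build_calibration_inputs(self, ctx: RecipeContext) -> List[Any]: ...

    @abstractmethod
    def evaluate(self, ctx: RecipeContext) -> Dict[str, Any]: ...

    @abstractmethod
    def export(self, ctx: RecipeContext) -> None: ...


_ADAPTERS: Dict[str, Type[ModelAdapter]] = {}


def register_adapter(family: str) -> Callable[[Type[ModelAdapter]], Type[ModelAdapter]]:
    """Class decorator that registers an adapter under a model-family name."""
    def decorator(cls: Type[ModelAdapter]) -> Type[ModelAdapter]:
        _ADAPTERS[family] = cls
        return cls
    return decorator


def get_adapter(family: str) -> ModelAdapter:
    """Instantiate the adapter registered for `family`."""
    if family not in _ADAPTERS:
        raise ValueError(f"No adapter for {family!r}; known: {sorted(_ADAPTERS)}")
    return _ADAPTERS[family]()


@register_adapter("toy")
class ToyAdapter(ModelAdapter):
    """Stand-in adapter that needs no real model, used only to exercise the flow."""

    def load_model(self, ctx: RecipeContext) -> RecipeContext:
        ctx.model = "toy-model"
        return ctx

    def build_calibration_inputs(self, ctx: RecipeContext) -> List[Any]:
        return [[1, 2, 3]]

    def evaluate(self, ctx: RecipeContext) -> Dict[str, Any]:
        return {"num_calibration_samples": len(ctx.calibration_inputs)}

    def export(self, ctx: RecipeContext) -> None:
        return None


# Same call sequence as the runner, minus the stage loop.
ctx = RecipeContext(cfg={"model": {"family": "toy"}})
adapter = get_adapter(ctx.cfg["model"]["family"])
ctx = adapter.load_model(ctx)
ctx.calibration_inputs = adapter.build_calibration_inputs(ctx)
metrics = adapter.evaluate(ctx)
```

Because the runner only depends on the abstract interface, a stand-in adapter like this could also back a fast smoke test of the runner without downloading any model.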

---

## Proposed Migration Mapping

| Current script | Proposed handling |
|---|---|
| `quantize_full_qmodel_with_gptq.py` | Move logic into `recipes/adapters/llama.py`, `recipes/stages/*`, `recipes/evaluation/llm.py`, and `recipes/export/*`. Replace usage with `examples/quantize.py --config configs/llama_gptq_ptq.yaml`. |
| `evaluate_fk_llama_model.py` | Merge into `examples/evaluate.py` and `recipes/evaluation/llm.py`. |
| `static_llama_layer_runtime.py` | Move to `recipes/debug/` and expose through `examples/inspect.py`. |
| `llama/quantize_decoder_layer_prefill.py` | Move to integration tests or expose as an inspect/debug recipe. |
| `quantize_qwen3_vl_with_gptq.py` | Move logic into `recipes/adapters/qwen3_vl.py`, `recipes/stages/*`, `recipes/evaluation/vlm.py`, and `recipes/evaluation/mmlu.py`. |
| `quantize_full_vlm_model_with_gptq.py` | Merge into the common Qwen3-VL recipe path. |
| `qwen/quantize_model.py` | Move to synthetic smoke test or PTQ-only config. |
| `qwen/quantize_for_conditional_generation.py` | Merge into Qwen3-VL adapter/config path. |
| `qwen/quantize_vision_mlp.py` | Move to wrapper-level integration test. |
| `qwen/quantize_vision_attention.py` | Move to wrapper-level integration test. |
| `qwen/trace_qwen.py` | Move to `recipes/debug/trace.py` and expose through `examples/inspect.py --mode trace`. |

---

## Integration Plan

I'll merge the related code into `tico/quantization/examples`. Until the new structure is stable, I'll keep the existing `tico/quantization/wrapq/examples` directory. After a transition period, the existing directory will be renamed to `examples_deprecated` and removed later.

## Expected Outcome

After this refactor, adding a new model family or algorithm variant should not require adding another example script. Instead, we should add or update:

```text
- one adapter, if it is a new model family
- one stage, if it is a new algorithm
- one YAML config, if it is a new recipe
```

The public example surface will remain small and stable, while the actual quantization workflow becomes easier to reuse, test, and maintain.

[examples] Refactor WrapQ examples into config-driven reusable recipe structure #716
