Description
I initially get errors.
- **Hybrid architecture compatibility with mcore_adapter:** Qwen3.5's hybrid architecture (full attention every 4 layers + GDN linear attention + Mamba SSM) is non-standard. The `apply_megatron_lora()` function in mcore_adapter was designed for standard transformer models (Qwen2.5). The GDN and Mamba layers may not be properly recognized or adapted.
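One way to surface this without digging into the backend is to check, after LoRA has been applied, which sub-modules actually received trainable adapter parameters. The sketch below uses the common PEFT-style `lora_A`/`lora_B` naming and a toy model; mcore_adapter's internal naming may differ, and the module names here are illustrative only.

```python
import torch.nn as nn
from collections import Counter

class LoRALinear(nn.Module):
    """Toy stand-in for a linear layer that received a LoRA adapter."""
    def __init__(self, d_in, d_out, r=4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)  # frozen base weight
        self.lora_A = nn.Linear(d_in, r, bias=False)
        self.lora_B = nn.Linear(r, d_out, bias=False)

    def forward(self, x):
        return self.base(x) + self.lora_B(self.lora_A(x))

def lora_coverage(model):
    """Count trainable LoRA parameters per top-level block name."""
    hit = Counter()
    for name, p in model.named_parameters():
        if p.requires_grad and "lora" in name.lower():
            hit[name.split(".")[0]] += 1
    return hit

# Toy hybrid stack: the attention block got an adapter, the GDN-style
# block did not (simulating a backend that skipped it).
model = nn.ModuleDict({
    "attn": nn.ModuleDict({"q_proj": LoRALinear(16, 16)}),
    "gdn": nn.ModuleDict({"in_proj_qkv": nn.Linear(16, 48)}),  # unadapted
})
print(lora_coverage(model))  # "gdn" absent => those layers were skipped
```

On the real model, an empty (or missing) bucket for the GDN/Mamba layers would confirm they were silently skipped by the adaptation pass.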
- **`all-linear` expansion:** When `lora_target: all-linear` is used, `find_all_linear_modules()` auto-discovers linear layers. For Qwen3.5's hybrid layers (GDN projections such as `in_proj_qkv`, `in_proj_z`, `in_proj_b`, `in_proj_a`), it is unclear whether these are correctly identified and wrapped with LoRA adapters by the Megatron backend.
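A minimal sketch of what an "all-linear" discovery pass would see on a GDN-style layer. The projection names mirror the ones listed above; the real `find_all_linear_modules()` may filter differently (e.g. excluding output heads), so this only shows that the projections are ordinary `nn.Linear` modules and thus discoverable in principle.

```python
import torch.nn as nn

class GDNLayer(nn.Module):
    """Toy GDN-style layer with the projection names mentioned above."""
    def __init__(self, d=32):
        super().__init__()
        self.in_proj_qkv = nn.Linear(d, 3 * d)
        self.in_proj_z = nn.Linear(d, d)
        self.in_proj_b = nn.Linear(d, d)
        self.in_proj_a = nn.Linear(d, d)

def find_linear_names(model):
    # Collect the leaf names of every nn.Linear, as an all-linear pass would.
    return sorted({name.rsplit(".", 1)[-1]
                   for name, m in model.named_modules()
                   if isinstance(m, nn.Linear)})

layer = GDNLayer()
print(find_linear_names(layer))
# -> ['in_proj_a', 'in_proj_b', 'in_proj_qkv', 'in_proj_z']
```

If any of these names are missing from the backend's discovered target list on the real model, LoRA silently skips those projections.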
- **VLM wrapper:** Qwen3.5 loads as `Qwen3_5ForConditionalGeneration` (a VLM), so LoRA needs to target only the text model, not the vision encoder. We use `freeze_module_prefix: vision_model`, but need to verify that this interacts correctly with the LoRA setup path.
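The intended interaction can be sketched as prefix-based freezing applied before LoRA target selection: everything under `vision_model` loses `requires_grad`, so only text-model parameters remain trainable. The module names below are illustrative; the real VLM's attribute layout and the order in which freezing and LoRA wrapping run are exactly what needs verifying.

```python
import torch.nn as nn

def freeze_by_prefix(model, prefix):
    """Freeze every parameter whose qualified name starts with prefix."""
    for name, p in model.named_parameters():
        if name.startswith(prefix):
            p.requires_grad_(False)

# Toy VLM: one vision module, one text module (names are hypothetical).
model = nn.ModuleDict({
    "vision_model": nn.Linear(8, 8),
    "language_model": nn.Linear(8, 8),
})
freeze_by_prefix(model, "vision_model")

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only language_model params remain trainable
```

If the LoRA setup path selects targets before (or independently of) this freezing step, adapters could still land on vision-encoder projections, which is the case to check.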