Description
I initially get errors.
- **Hybrid architecture compatibility with mcore_adapter:** Qwen3.5's hybrid architecture (full attention every 4 layers + GDN linear attention + Mamba SSM) is non-standard. The `apply_megatron_lora()` function in mcore_adapter was designed for standard transformer models (Qwen2.5). The GDN and Mamba layers may not be properly recognized or adapted.
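One way to surface this without digging into the backend is to check, after LoRA has been applied, which sub-modules actually received trainable adapter parameters. The sketch below uses the common PEFT-style `lora_A`/`lora_B` naming and a toy model; mcore_adapter's internal naming may differ, and the module names here are illustrative only.

```python
import torch.nn as nn
from collections import Counter

class LoRALinear(nn.Module):
    """Toy stand-in for a linear layer that received a LoRA adapter."""
    def __init__(self, d_in, d_out, r=4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.base.weight.requires_grad_(False)  # frozen base weight
        self.lora_A = nn.Linear(d_in, r, bias=False)
        self.lora_B = nn.Linear(r, d_out, bias=False)

    def forward(self, x):
        return self.base(x) + self.lora_B(self.lora_A(x))

def lora_coverage(model):
    """Count trainable LoRA parameters per top-level block name."""
    hit = Counter()
    for name, p in model.named_parameters():
        if p.requires_grad and "lora" in name.lower():
            hit[name.split(".")[0]] += 1
    return hit

# Toy hybrid stack: the attention block got an adapter, the GDN-style
# block did not (simulating a backend that skipped it).
model = nn.ModuleDict({
    "attn": nn.ModuleDict({"q_proj": LoRALinear(16, 16)}),
    "gdn": nn.ModuleDict({"in_proj_qkv": nn.Linear(16, 48)}),  # unadapted
})
print(lora_coverage(model))  # "gdn" absent => those layers were skipped
```

On the real model, an empty (or missing) bucket for the GDN/Mamba layers would confirm they were silently skipped by the adaptation pass.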
- **`all-linear` expansion:** When `lora_target: all-linear` is used, `find_all_linear_modules()` auto-discovers linear layers. For Qwen3.5's hybrid layers (GDN projections such as `in_proj_qkv`, `in_proj_z`, `in_proj_b`, `in_proj_a`), it is unclear whether these are correctly identified and wrapped with LoRA adapters by the Megatron backend.
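A minimal sketch of what an "all-linear" discovery pass would see on a GDN-style layer. The projection names mirror the ones listed above; the real `find_all_linear_modules()` may filter differently (e.g. excluding output heads), so this only shows that the projections are ordinary `nn.Linear` modules and thus discoverable in principle.

```python
import torch.nn as nn

class GDNLayer(nn.Module):
    """Toy GDN-style layer with the projection names mentioned above."""
    def __init__(self, d=32):
        super().__init__()
        self.in_proj_qkv = nn.Linear(d, 3 * d)
        self.in_proj_z = nn.Linear(d, d)
        self.in_proj_b = nn.Linear(d, d)
        self.in_proj_a = nn.Linear(d, d)

def find_linear_names(model):
    # Collect the leaf names of every nn.Linear, as an all-linear pass would.
    return sorted({name.rsplit(".", 1)[-1]
                   for name, m in model.named_modules()
                   if isinstance(m, nn.Linear)})

layer = GDNLayer()
print(find_linear_names(layer))
# -> ['in_proj_a', 'in_proj_b', 'in_proj_qkv', 'in_proj_z']
```

If any of these names are missing from the backend's discovered target list on the real model, LoRA silently skips those projections.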
- **VLM wrapper:** Qwen3.5 loads as `Qwen3_5ForConditionalGeneration` (a VLM), so LoRA needs to target only the text model, not the vision encoder. We use `freeze_module_prefix: vision_model`, but need to verify that this interacts correctly with the LoRA setup path.
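The intended interaction can be sketched as prefix-based freezing applied before LoRA target selection: everything under `vision_model` loses `requires_grad`, so only text-model parameters remain trainable. The module names below are illustrative; the real VLM's attribute layout and the order in which freezing and LoRA wrapping run are exactly what needs verifying.

```python
import torch.nn as nn

def freeze_by_prefix(model, prefix):
    """Freeze every parameter whose qualified name starts with prefix."""
    for name, p in model.named_parameters():
        if name.startswith(prefix):
            p.requires_grad_(False)

# Toy VLM: one vision module, one text module (names are hypothetical).
model = nn.ModuleDict({
    "vision_model": nn.Linear(8, 8),
    "language_model": nn.Linear(8, 8),
})
freeze_by_prefix(model, "vision_model")

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)  # only language_model params remain trainable
```

If the LoRA setup path selects targets before (or independently of) this freezing step, adapters could still land on vision-encoder projections, which is the case to check.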