ComfyUI-LoRA-Optimizer

Stacking multiple LoRAs in ComfyUI often causes oversaturation, artifacts, or lost details. This node suite automatically figures out the best way to combine your LoRAs — analyzing where they conflict, resolving those conflicts per model layer, and adjusting strengths so the result looks clean. Just add your LoRAs to a stack, connect the optimizer, and generate.

The Problem

Should I Merge This LoRA?

Merge Strategy Guide

How LoRAs relate to each other, what the optimizer does about it, and when to change settings.

Quick Start

1. Add a LoRA Stack (Dynamic) node, then pick your LoRAs and set their strengths.
2. Add a LoRA Optimizer node, connect the stack to it, and connect MODEL (and optionally CLIP) from your checkpoint.
3. Connect the optimizer's MODEL/CLIP outputs to your sampler. Done.

LoRA Stack (Dynamic) ──► LoRA Optimizer ──► KSampler
                              ▲
Load Checkpoint ──► MODEL ────┘

Everything is automatic. Connect analysis_report to a Show Text node to see what the optimizer did.

Want more control? Add a Settings node → connect to the optimizer's settings input. Want the best config found for you? Use the LoRA AutoTuner instead of the optimizer.

Installation

ComfyUI Manager

Search for "LoRA Optimizer" in ComfyUI Manager and install.

Manual install

cd ComfyUI/custom_nodes/
git clone https://github.com/ethanfel/ComfyUI-LoRA-Optimizer.git

Restart ComfyUI. Nodes appear under the loaders category.

Nodes at a Glance

Node	What It Does
LoRA Stack / (Dynamic)	Build your list of LoRAs — pick files, set strengths
LoRA Optimizer	Analyze + merge your stack automatically. Just connect and go
Settings Nodes	Optional fine-tuning: sparsification, compression, smoothing, etc.
LoRA AutoTuner	Sweep 2000+ parameter combos, rank the best configs
Merge Selector	Try alternative ranked configs from AutoTuner results
Compatibility Analyzer	Check which LoRAs work well together before merging
Save Merged LoRA	Export the merge as a standalone `.safetensors` file
Merged LoRA to Hook	Apply merged LoRA per-conditioning instead of globally
LoRA Optimizer (Legacy)	All parameters on one node (superseded by Optimizer + Settings)

Also accepts standard tuple-format stacks (lora_name, model_strength, clip_strength) from Efficiency Nodes, Comfyroll, and similar packs.

Full parameter reference: Nodes wiki page · Configuration Guide · Workflows

Examples

Z-Image (Lumina2) — 3 LoRAs merged

How It Works

Per-Group Adaptive Merge — Deep Dive

The key insight: two LoRAs may overlap in some model blocks but not others. A face LoRA and a style LoRA might only conflict in attention layers 4-7, while the rest of the model is touched by only one of them.

Instead of picking one global strategy (which either wastes TIES trimming on non-overlapping blocks or misses real conflicts), the optimizer decides per resolved target group:

Condition	Strategy
Only 1 LoRA touches this group	`weighted_sum` — full strength, no dilution
2+ LoRAs, low excess conflict + low subspace overlap	`weighted_average` — mostly independent updates
2+ LoRAs, high similarity + low excess conflict	`consensus` — aligned, low-interference merge
2+ LoRAs, excess conflict > 25% with overlapping subspaces	`ties` — resolve real conflicts with trim/elect/merge
Magnitude ratio > 2x in the group	`total` sign method (stronger LoRA dominates)
Magnitude ratio <= 2x in the group	`frequency` sign method (equal votes)

This means non-overlapping regions keep 100% of their LoRA's effect, while genuinely conflicting regions get proper TIES resolution. When decision_smoothing > 0, those per-group metrics are softly pulled toward the block average so adjacent layers do not flip strategies due to noisy samples.
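The per-group decision in the table can be pictured as a small rule function. This is a hedged sketch, not the project's code: `GroupStats`, `pick_strategy`, and the two cut-offs marked as assumptions are hypothetical. Only the 25% excess-conflict threshold and the 2x magnitude-ratio threshold come from the table above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Only the 25% excess-conflict and 2x magnitude-ratio thresholds come from the
# README table; the other two cut-offs are hypothetical placeholders.
TIES_EXCESS_CONFLICT = 0.25
MAGNITUDE_RATIO_TOTAL = 2.0
HIGH_SIMILARITY = 0.5         # assumption
SUBSPACE_OVERLAP_MIN = 0.5    # assumption


@dataclass
class GroupStats:
    n_loras: int             # LoRAs that touch this target group
    excess_conflict: float   # sign-conflict rate above the cosine baseline (0..1)
    subspace_overlap: float  # overlap of the LoRAs' low-rank subspaces (0..1)
    cosine: float            # mean pairwise cosine similarity
    magnitude_ratio: float   # max/min L2 norm among the LoRAs in the group


def pick_strategy(s: GroupStats) -> Tuple[str, Optional[str]]:
    """Return (merge strategy, TIES sign method) for one target group."""
    if s.n_loras <= 1:
        return "weighted_sum", None          # nothing to resolve, full strength
    low_conflict = s.excess_conflict <= TIES_EXCESS_CONFLICT
    if not low_conflict and s.subspace_overlap >= SUBSPACE_OVERLAP_MIN:
        # real conflict in shared directions: TIES, with the sign method
        # chosen by how unbalanced the LoRA magnitudes are
        sign = "total" if s.magnitude_ratio > MAGNITUDE_RATIO_TOTAL else "frequency"
        return "ties", sign
    if low_conflict and s.cosine >= HIGH_SIMILARITY:
        return "consensus", None
    return "weighted_average", None
```

Keeping the rules in one pure function makes it easy to see which metric pushed a group into TIES, which is what the Block Strategy Map in the report summarizes.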

Two-Pass Streaming Architecture

The optimizer uses a two-pass streaming architecture for low memory usage:

Pass 1 (Analysis): Resolves trainer aliases to target weights, aggregates alias collisions per LoRA, samples conflict and magnitude statistics per target group, then discards the diffs. Only lightweight scalars are kept.
Pass 2 (Merge): Recomputes diffs per target group, looks up that group's conflict data, picks a strategy for it, and merges. Each group is freed after merging. Standard linear merges stay in exact low-rank form; nonlinear merges and optional compression still use dense/SVD paths.

Peak memory is still roughly "one target group at a time," but the exact peak depends on the largest layer, how many LoRAs hit it, and whether extra quality/compression steps are enabled. GPU-accelerated on both passes.

What It Analyzes

Per-LoRA metrics (rank, key count, effective L2 norms)
Pairwise raw + magnitude-weighted conflict ratios per target group (sampled for efficiency)
Excess conflict over the cosine baseline, plus low-rank subspace overlap
Pairwise cosine similarity (directional alignment between LoRAs)
Magnitude / activation-importance distribution per target group
Key overlap between LoRAs
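The pairwise metrics in the list above can be approximated as follows. This is a sketch under stated assumptions: the function name is hypothetical, the magnitude weighting (each conflicting position counts by `min(|a|, |b|)`) is one plausible choice, and the "cosine baseline" is modeled with Sheppard's formula (for jointly Gaussian entries with correlation rho, the expected sign-disagreement rate is arccos(rho)/pi). The optimizer's exact definitions may differ.

```python
import math
from typing import Optional

import numpy as np


def pairwise_metrics(a: np.ndarray, b: np.ndarray, sample: Optional[int] = None,
                     seed: int = 0, eps: float = 1e-12) -> dict:
    """Conflict and alignment metrics for two diffs of the same target group."""
    a = np.ravel(a).astype(np.float64)
    b = np.ravel(b).astype(np.float64)
    if sample is not None and sample < a.size:
        # analyse a random subset of positions instead of the whole tensor
        idx = np.random.default_rng(seed).choice(a.size, size=sample, replace=False)
        a, b = a[idx], b[idx]
    overlap = (a != 0) & (b != 0)                 # positions both LoRAs modify
    conflict = overlap & (np.sign(a) != np.sign(b))
    n = int(overlap.sum())
    raw = conflict.sum() / n if n else 0.0
    # magnitude weighting (assumption): each position counts by min(|a|, |b|)
    w = np.minimum(np.abs(a), np.abs(b))
    weighted = w[conflict].sum() / max(w[overlap].sum(), eps)
    cos = float(a @ b / max(np.linalg.norm(a) * np.linalg.norm(b), eps))
    # baseline (assumption): Sheppard's formula, arccos(rho) / pi
    baseline = math.acos(min(1.0, max(-1.0, cos))) / math.pi
    return {
        "overlap": n,
        "conflict": float(raw),
        "weighted_conflict": float(weighted),
        "cosine": cos,
        "excess_conflict": max(0.0, float(raw) - baseline),
    }
```

Subtracting a baseline matters because two unrelated LoRAs already disagree on roughly half their signs; only the conflict beyond what their alignment predicts signals real interference.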

Optimizer Features

TIES Merging

The optimizer automatically selects TIES-Merging (Trim, Elect Sign, Disjoint Merge — Yadav et al., NeurIPS 2023) on prefixes where sign conflicts are detected between LoRAs.
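A compact NumPy sketch of the three TIES steps (trim, elect sign, disjoint merge) on flattened diffs. The function name and signature are hypothetical, and the optimizer's per-group trimming and weighting may differ; `sign_method` mirrors the `total`/`frequency` choice described earlier.

```python
import numpy as np


def ties_merge(diffs, weights, density=0.5, sign_method="frequency"):
    """TIES-Merging (trim, elect sign, disjoint merge) on same-shaped diffs."""
    flat = np.stack([w * np.ravel(d) for w, d in zip(weights, diffs)]).astype(np.float64)

    # 1. Trim: per LoRA, keep the top `density` fraction of entries by magnitude.
    k = max(1, int(round(density * flat.shape[1])))
    trimmed = np.zeros_like(flat)
    for i, row in enumerate(flat):
        keep = np.argsort(np.abs(row))[-k:]
        trimmed[i, keep] = row[keep]

    # 2. Elect a sign per position.
    if sign_method == "total":
        elected = np.sign(trimmed.sum(axis=0))            # mass-weighted vote
    elif sign_method == "frequency":
        elected = np.sign(np.sign(trimmed).sum(axis=0))   # one vote per LoRA
    else:
        raise ValueError(f"unknown sign_method: {sign_method}")

    # 3. Disjoint merge: mean over the entries that agree with the elected sign.
    agree = (np.sign(trimmed) == elected) & (trimmed != 0)
    count = agree.sum(axis=0)
    total = np.where(agree, trimmed, 0.0).sum(axis=0)
    merged = np.where(count > 0, total / np.maximum(count, 1), 0.0)
    return merged.reshape(np.shape(diffs[0]))
```

With `total`, a LoRA with twice the magnitude can outvote a weaker one on a contested position; with `frequency`, a 1-vs-1 tie yields sign 0 and that position is dropped.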

DARE / DELLA Sparsification

DARE and DELLA sparsify each LoRA's diff before merging, reducing parameter interference between LoRAs. The implementations here are practical LoRA-oriented variants inspired by those papers, not paper-faithful reproductions. Available in two modes: standard (drops weights everywhere) and conflict-aware (only drops weights where LoRAs actually interfere).

Method	How It Works
DARE	Bernoulli random mask at given density. Survivors rescaled by 1/density to preserve expected value. Fast and unbiased.
DELLA	Per-row magnitude ranking. Low-magnitude elements get higher drop probability, high-magnitude elements are kept. More surgical than DARE.
DARE (conflict-aware)	Same as DARE, but only applied at positions where 2+ LoRAs push in opposite directions. Same-sign positions (where LoRAs reinforce each other) are left untouched.
DELLA (conflict-aware)	Same as DELLA, but only at conflict positions. Unique contributions from each LoRA are fully preserved.

Why conflict-aware? Standard sparsification drops weights everywhere — including positions where only one LoRA contributes, or where multiple LoRAs agree. This destroys useful signal. Conflict-aware variants compute a sign-conflict mask first: positions where LoRAs push in opposite directions (actual interference). Only those positions get sparsified. The result: interference is reduced without sacrificing unique features.
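The sparsification variants can be sketched in a few lines. The function names and the DELLA keep-probability schedule (linear in per-row magnitude rank, controlled by a hypothetical `spread` knob) are assumptions for illustration; the conflict mask follows the definition above (at least one LoRA positive and another negative at the same position).

```python
import numpy as np


def conflict_mask(diffs):
    """True where at least one LoRA pushes positive and another negative."""
    signs = np.sign(np.stack(diffs))
    return (signs > 0).any(axis=0) & (signs < 0).any(axis=0)


def dare(diff, density, rng, mask=None):
    """DARE: Bernoulli keep with probability `density`, survivors scaled by 1/density.
    With a mask, only masked positions are sparsified; the rest pass through."""
    keep = rng.random(diff.shape) < density
    out = np.where(keep, diff / density, 0.0)
    return out if mask is None else np.where(mask, out, diff)


def della(diff, density, rng, mask=None, spread=0.2):
    """DELLA-style drop: within each row, larger entries get a higher keep
    probability (linear in magnitude rank, mean about `density`); survivors are
    rescaled by 1/p_keep so the expected value is unchanged. `spread` is a
    hypothetical knob, not an optimizer setting."""
    d = np.atleast_2d(diff)
    ranks = np.argsort(np.argsort(np.abs(d), axis=1), axis=1)   # 0 = smallest
    frac = ranks / max(d.shape[1] - 1, 1)
    p_keep = np.clip(density - spread / 2 + spread * frac, 1e-3, 1.0)
    keep = rng.random(d.shape) < p_keep
    out = np.where(keep, d / p_keep, 0.0).reshape(np.shape(diff))
    return out if mask is None else np.where(mask, out, diff)
```

For a conflict-aware variant, build the mask once from all LoRAs in the group and pass it to `dare` or `della` for each LoRA's diff; positions where only one LoRA contributes, or where all agree, come back untouched.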

Interaction with merge strategies:

TIES mode: DARE/DELLA replaces the TIES trim step (both achieve sparsification, no need for both)
Other modes: Applied as preprocessing before the merge operation

Setting	Default	Options
`sparsification`	disabled	`disabled`, `dare`, `della`, `dare_conflict`, `della_conflict`
`sparsification_density`	0.7	Fraction of parameters to keep (lower = more aggressive)

Merge Refinement (Refine / Full)

Optional preprocessing steps applied to weight diffs before merging, selectable via the merge_refinement dropdown:

Level	What It Adds	Cost
none (default)	Merge as-is, no extra processing	Baseline
refine	Direction orthogonalization + TALL-mask selfish weight protection	Minimal extra compute, no extra VRAM
full	KnOTS SVD alignment + orthogonalization + TALL-masks	More VRAM for SVD decomposition

TALL-masks (refine+): Identifies "selfish" weights — positions where one LoRA dominates and others contribute little. These weights are separated from the consensus merge and added back afterward, protecting each LoRA's unique features from being averaged away.

Direction orthogonalization (refine+): Projects LoRA diffs to be mutually orthogonal, reducing interference between LoRAs that modify overlapping weight regions.

KnOTS SVD alignment (full): Projects all LoRA diffs into a shared singular value basis via truncated SVD before merging. This makes diffs more directly comparable by aligning their representation spaces. Falls back to CPU on GPU OOM, skips gracefully if both fail.
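The TALL-mask "selfish weight" idea can be illustrated with a small sketch. Assumptions: the mask criterion follows the TALL-masks form `|d_t| >= lam * |merged - d_t|` with `merged` taken as the plain sum of diffs, and "selfish" means exactly one LoRA's mask is set at that position. The function names and `lam` are hypothetical, and `merge_fn` stands in for whatever merge the group uses.

```python
import numpy as np


def tall_masks(diffs, lam=1.0):
    """TALL-style masks: entry i matters to LoRA t when
    |d_t[i]| >= lam * |merged[i] - d_t[i]|, with merged = sum of all diffs."""
    stacked = np.stack(diffs)
    merged = stacked.sum(axis=0)
    return np.abs(stacked) >= lam * np.abs(merged - stacked)


def merge_protecting_selfish(diffs, merge_fn, lam=1.0):
    """Pull out 'selfish' entries (claimed by exactly one LoRA), merge the rest
    with merge_fn, then add the selfish entries back at full value."""
    stacked = np.stack(diffs)
    masks = tall_masks(diffs, lam)
    selfish = masks & (masks.sum(axis=0) == 1)
    shared = np.where(selfish, 0.0, stacked)
    return merge_fn(shared) + np.where(selfish, stacked, 0.0).sum(axis=0)
```

With a plain average as `merge_fn`, a weight that only one LoRA really uses keeps its full value instead of being halved, while genuinely shared weights are still averaged.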

Interaction with other settings:

Works with all merge modes (TIES, weighted_average, SLERP, etc.)
Combines with DARE/DELLA sparsification — sparsification runs first, then refinement
Best combination: full + della_conflict (or dare_conflict) for the complete pipeline
Single-LoRA prefixes: all enhancements short-circuit (no work to do)

Setting	Default	Options
`merge_refinement`	none	`none`, `refine`, `full`

Key Filter

Each LoRA has a per-LoRA key_filter setting (available on both LoRA Stack and LoRA Stack (Dynamic) in advanced mode) that controls which target groups that LoRA contributes to, based on how many LoRAs in the stack share each resolved target:

Filter	Behavior	Use Case
`all` (default)	Contribute to all keys	Normal merging
`shared_only`	Only contribute to keys present in 2+ LoRAs	Strip variant-specific keys (I2V/VACE) from this LoRA
`unique_only`	Only contribute to keys present in exactly 1 LoRA	Extract only the variant-specific adapter keys from this LoRA

This is especially useful for Wan T2V/I2V/VACE LoRAs, which share ~90% of their weights, while each variant also has unique keys (I2V: cross_attn.k_img/v_img, img_emb; VACE: vace_blocks.*, vace_patch_embedding).

Because the filter is per-LoRA, you can apply different filters to different LoRAs in the same stack — e.g., "take only the unique VACE keys from LoRA #2 while merging all keys from LoRA #1".

Example — making an I2V LoRA T2V-compatible:

Stack a T2V LoRA + an I2V LoRA together
Set the I2V LoRA's key_filter to shared_only
The I2V-only keys (k_img, v_img, img_emb, etc.) are skipped for that LoRA since they appear in only 1 LoRA
The merged result contains only the shared T2V-compatible weights

Example — extracting a lightweight I2V adapter:

Same stack (T2V + I2V)
Set the I2V LoRA's key_filter to unique_only
Only the I2V-specific keys are contributed by that LoRA — a small adapter with just the variant-specific weights

The filter uses the raw n_loras count from Pass 1 (computed before any filtering) and is applied during analysis as well as in the Pass 2 merge.
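The key filter logic can be sketched as a count-then-filter step. `filter_keys` and the key strings are hypothetical illustrations; the point is that counts come from the unfiltered stack, so one LoRA's filter never changes what another LoRA sees.

```python
from collections import Counter


def filter_keys(lora_keys, filters):
    """Apply per-LoRA key_filter values ('all', 'shared_only', 'unique_only').

    lora_keys: list of sets of normalized target keys, one per LoRA.
    filters:   list of filter names in the same order.
    """
    # raw per-key LoRA count over the whole, unfiltered stack
    n_loras = Counter(k for keys in lora_keys for k in keys)
    out = []
    for keys, mode in zip(lora_keys, filters):
        if mode == "all":
            out.append(set(keys))
        elif mode == "shared_only":
            out.append({k for k in keys if n_loras[k] >= 2})
        elif mode == "unique_only":
            out.append({k for k in keys if n_loras[k] == 1})
        else:
            raise ValueError(f"unknown key_filter: {mode}")
    return out
```

With a T2V + I2V stack, `shared_only` on the I2V LoRA drops its image-conditioning keys, and `unique_only` keeps only those keys, matching the two examples above.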

Auto-Strength

When auto_strength is set to enabled, the optimizer automatically reduces per-LoRA strengths before merging to prevent overexposure from stacking. This is especially useful on distilled/turbo models where 2+ LoRAs at full strength cause blown-out results even with strong merge settings.

The algorithm uses interference-aware energy normalization: during Pass 1 it streams exact Frobenius norms and pairwise dots for each LoRA branch, then computes the exact vector-sum energy separately for model and CLIP updates. All strengths are uniformly scaled so the total combined energy matches what the strongest single LoRA would contribute alone.

Aligned LoRAs (cos~1) — stronger reduction (they reinforce each other, so combined energy is high)
Orthogonal LoRAs (cos~0) — moderate reduction, optionally clamped by an architecture-aware floor
Opposing LoRAs (cos~-1) — minimal reduction (they cancel out, so combined energy is low)

When the LoRAs are close to orthogonal (effectively independent), the optimizer can clamp the scale factor from below with auto_strength_floor:

Architecture	Default floor
Wan / LTX Video	1.0
SD / SDXL / Flux / Z-Image	0.85
LLM-style presets	0.9

auto_strength_floor = -1 uses the architecture default. Setting 0.0–1.0 overrides it manually.

Scenario	Result
2 aligned LoRAs (cos~1) at strength 1.0	Each reduced to ~0.50
2 orthogonal LoRAs (cos~0) at strength 1.0	Each reduced to ~0.71 before floor-clamping
2 opposing LoRAs (cos~-1) at strength 1.0	~1.0 each (they cancel)
1 strong + 1 weak LoRA	Proportional reduction
Single LoRA	No change
`auto_strength` disabled	No adjustment

Your original strength ratios are always preserved — the algorithm only scales them down uniformly.
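The energy normalization reduces to a Gram-matrix calculation. This sketch assumes one flattened diff per LoRA; for a real stack, summing the per-key Gram matrices gives the same totals, computed separately for model and CLIP. The function name and the `orthogonal_tol` test that decides when the floor applies are hypothetical.

```python
import numpy as np


def auto_strength_scale(diffs, strengths, floor=None, orthogonal_tol=0.1):
    """Uniform scale k so that ||sum_i k * s_i * d_i||^2 equals the energy of the
    strongest single LoRA, max_i ||s_i * d_i||^2. Never scales up."""
    s = np.asarray(strengths, dtype=np.float64)
    D = np.stack([np.ravel(d) for d in diffs]).astype(np.float64)
    gram = D @ D.T                        # exact pairwise dot products
    combined = float(s @ gram @ s)        # energy of the strength-weighted sum
    single = float(np.max(s ** 2 * np.diag(gram)))
    if combined <= single:
        return 1.0                        # one LoRA, or LoRAs that cancel out
    k = float(np.sqrt(single / combined))
    if floor is not None:
        norms = np.sqrt(np.diag(gram))
        cos = gram / np.maximum(np.outer(norms, norms), 1e-12)
        off_diag = np.abs(cos[~np.eye(len(diffs), dtype=bool)])
        if off_diag.mean() < orthogonal_tol:   # hypothetical orthogonality test
            k = max(k, floor)
    return k
```

This reproduces the scenario table: two aligned unit LoRAs give k = 0.5, two orthogonal ones give k ≈ 0.71 (or the floor, if set), and two opposing ones give 1.0. Multiplying every strength by the same k is what preserves your ratios.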

Decision Smoothing

decision_smoothing — blends each group's decision metrics toward the average of its surrounding block. This reduces jagged layer-to-layer mode flips when the stack is noisy.

smooth_slerp_gate — when enabled, uses per-prefix cosine similarity (computed during analysis) instead of the collection average for the SLERP interpolation gate. This makes the SLERP weight vary per layer based on local alignment rather than using a single global value. Available on the LoRA Merge Settings node or the Legacy optimizer.
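Decision smoothing amounts to a convex blend toward a block mean. In this sketch, "surrounding block" is assumed to mean all groups sharing a block id; the function name and that grouping rule are hypothetical.

```python
import numpy as np


def smooth_metrics(values, blocks, alpha):
    """Blend each group's metric toward the mean of its block.

    values: per-group metric (e.g. excess conflict), blocks: block id per group,
    alpha:  decision_smoothing in [0, 1]; 0 = unchanged, 1 = block mean.
    """
    values = np.asarray(values, dtype=np.float64)
    blocks = np.asarray(blocks)
    out = values.copy()
    for b in np.unique(blocks):
        sel = blocks == b
        out[sel] = (1 - alpha) * values[sel] + alpha * values[sel].mean()
    return out
```

Because the smoothed metrics are then fed to the same thresholds, an isolated noisy group sitting just above or below a cut-off is pulled toward its neighbours' decision.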

Architecture-Aware Key Normalization

Different LoRA trainers (Kohya, AI-Toolkit, LyCORIS, diffusers/PEFT) produce LoRAs with different key naming conventions for the same model weights. Without normalization, a stack that mixes trainers shows no key overlap, so the optimizer cannot detect conflicts or merge those LoRAs correctly.

Key normalization auto-detects the model architecture from LoRA key patterns and remaps all keys to a canonical format, enabling correct overlap detection and conflict analysis across trainer formats.

Architecture	Detected From	Normalization
Z-Image (Lumina2)	`diffusion_model.layers.N.attention`, `single_transformer_blocks`	Prefix standardization, QKV split for per-component analysis, re-fuse after merge
FLUX	`double_blocks`/`single_blocks`, `transformer.transformer_blocks`	AI-Toolkit / Kohya / diffusers unified to canonical format
Wan 2.1/2.2	`blocks.N` with `self_attn`/`cross_attn`/`ffn`	LyCORIS / diffusers / Musubi Tuner unified, RS-LoRA alpha fix
SDXL	`lora_te1_`/`lora_te2_`, `input_blocks`/`down_blocks`	Text encoder + UNet key unification
LTX Video	`adaln_single`, `transformer_blocks` with `attn1`/`attn2`	Trainer format unification
ACE-Step	`layers.N` with `self_attn`/`cross_attn` and `q_proj`/`k_proj`/`v_proj`	Attention key unification
Qwen-Image	`transformer_blocks` with `img_mlp`/`txt_mlp`/`img_mod`/`txt_mod`	Dual-stream key unification

Z-Image QKV handling: Z-Image LoRAs often fuse Q, K, V projections into a single attention.qkv weight. The normalizer splits these into separate to_q/to_k/to_v components for per-component conflict analysis, then re-fuses them back to the native format after merging.

Setting	Default	Effect
`normalize_keys`	enabled	`disabled` or `enabled`. Recommended for mixed-trainer stacks and required for Z-Image QKV fusion.
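Splitting a fused QKV LoRA and re-fusing it afterwards can be done exactly in low-rank form. This sketch assumes the convention `delta = up @ down` with the fused output rows ordered q, k, v; `split_qkv`, `fuse_qkv`, and the explicit `sizes` argument (which allows unequal q/k/v widths) are hypothetical.

```python
import numpy as np


def split_qkv(up, down, sizes):
    """Split a fused qkv LoRA (delta = up @ down) into q, k, v parts.

    up: (sum(sizes), r), down: (r, in_features). Every part shares `down`.
    """
    bounds = np.cumsum([0, *sizes])
    return [(up[a:b], down) for a, b in zip(bounds[:-1], bounds[1:])]


def fuse_qkv(parts):
    """Re-fuse per-component (up, down) pairs exactly: the fused up is
    block-diagonal and the fused down stacks the per-part downs."""
    ups = [u for u, _ in parts]
    downs = [d for _, d in parts]
    fused_up = np.zeros((sum(u.shape[0] for u in ups), sum(u.shape[1] for u in ups)),
                        dtype=ups[0].dtype)
    row = col = 0
    for u in ups:
        fused_up[row:row + u.shape[0], col:col + u.shape[1]] = u
        row += u.shape[0]
        col += u.shape[1]
    return fused_up, np.concatenate(downs, axis=0)
```

The trade-off of exact re-fusion is rank: three rank-r parts become one rank-3r patch, which is why compression afterwards can still be useful.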

Architecture-Aware Behavior Profiles

All numeric thresholds in the optimizer (density estimation, conflict detection, auto-strength scaling, scoring heuristics) are tuned per architecture family. The architecture_preset setting selects the appropriate thresholds — auto detects from LoRA key patterns.

Preset	Architectures	Key Differences	Orthogonal floor
`sd_unet`	SD 1.5, SDXL	Density range [0.1, 0.9], noise floor 10%, max strength cap 3.0	0.85
`dit`	Flux, WAN, Z-Image, LTX, HunyuanVideo	Density range [0.4, 0.95], noise floor 5%, max strength cap 5.0	0.85 by default, 1.0 for Wan/LTX
`llm`	Qwen-Image, LLaMA-based	Density range [0.1, 0.8], noise floor 15%, max strength cap 3.0	0.9

Why it matters: DiT architectures have denser weight distributions than UNet — with UNet thresholds, the optimizer underestimates density and clips suggested strength too aggressively. LLM-based models are sparser and benefit from lower density ceilings.

Setting	Default	Options
`architecture_preset`	auto	`auto`, `sd_unet`, `dit`, `llm`. Auto-detection uses the same key pattern matching as key normalization

Note: This is orthogonal to strategy_set (which controls which strategies are available — consensus, SLERP, etc.). Architecture preset controls the numeric thresholds those strategies use.
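Preset auto-detection can be sketched as ordered regex rules feeding a threshold table. The numbers come from the preset table above. The regexes only paraphrase the key-normalization table and are not the project's exact rules. ACE-Step is not covered, and the `sd_unet` fallback for unknown keys is a hypothetical choice.

```python
import re

PRESETS = {
    "sd_unet": {"density": (0.1, 0.9), "noise_floor": 0.10, "max_strength": 3.0, "floor": 0.85},
    "dit":     {"density": (0.4, 0.95), "noise_floor": 0.05, "max_strength": 5.0, "floor": 0.85},
    "llm":     {"density": (0.1, 0.8), "noise_floor": 0.15, "max_strength": 3.0, "floor": 0.9},
}

# (architecture, preset, pattern), checked in order; illustrative patterns only
RULES = [
    ("qwen_image", "llm", re.compile(r"(img|txt)_(mlp|mod)")),
    ("flux", "dit", re.compile(r"double_blocks|single_blocks")),
    ("ltx", "dit", re.compile(r"adaln_single")),
    ("zimage", "dit", re.compile(r"layers\.\d+\.attention")),
    ("wan", "dit", re.compile(r"blocks\.\d+\.(self_attn|cross_attn|ffn)")),
    ("sdxl", "sd_unet", re.compile(r"lora_te[12]_|input_blocks|down_blocks")),
]


def resolve(keys):
    """Return (architecture, preset, thresholds) for a list of LoRA keys."""
    arch, preset = "unknown", "sd_unet"          # hypothetical fallback
    for name, p, pattern in RULES:
        if any(pattern.search(k) for k in keys):
            arch, preset = name, p
            break
    params = dict(PRESETS[preset])
    if arch in ("wan", "ltx"):
        params["floor"] = 1.0                    # video DiTs default to 1.0
    return arch, preset, params


def clamp_density(estimate, params):
    """Keep an auto-density estimate inside the preset's range."""
    lo, hi = params["density"]
    return min(max(estimate, lo), hi)
```

Rule order matters: more specific patterns (such as Qwen-Image's `img_mlp`) are checked before broader ones that could also match the same keys.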

SVD Patch Compression

After merging, full-rank diff patches consume ~128x more RAM than standard LoRA patches (64MB vs 0.5MB per key for a 4096x4096 weight). The optimizer re-compresses merged patches to low-rank via truncated SVD, dramatically reducing post-merge RAM.

Mode	What gets compressed	Quality	RAM savings
`smart` (default)	`weighted_sum` and `weighted_average` prefixes only	Lossless — sum of input ranks preserves all merge information	~32x on compressed prefixes
`aggressive`	Everything including TIES	Lossy on TIES prefixes — nonlinear ops (trim, sign election) produce full-rank results that can't be perfectly captured	~32x on all prefixes
`disabled`	Nothing	No loss	No savings

When dense compression is needed, the compression rank is automatically computed as the sum of all input LoRA ranks. For example, 3 rank-32 LoRAs produce a rank-96 compressed patch — enough to represent the full merge on linear operations when no extra nonlinear processing is involved.
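Why "sum of input ranks" is lossless for linear merges: a weighted sum of rank-r_i products has rank at most the sum of the r_i, so a truncated SVD at that rank recovers it exactly, and stacking the factors gives the same result without any SVD. A NumPy sketch; the function names are hypothetical.

```python
import numpy as np


def concat_low_rank(pairs, weights):
    """Exact low-rank form of sum_i w_i * up_i @ down_i: stack the factors."""
    up = np.concatenate([w * u for w, (u, _) in zip(weights, pairs)], axis=1)
    down = np.concatenate([d for _, d in pairs], axis=0)
    return up, down


def svd_compress(delta, rank):
    """Truncated SVD of a dense patch: delta ~= up @ down, up (out, rank), down (rank, in)."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    root = np.sqrt(s[:rank])                 # split singular values across factors
    return u[:, :rank] * root, root[:, None] * vt[:rank]
```

For an out × in patch, storage drops from out·in values to rank·(out + in). Nonlinear steps such as TIES trimming break the rank bound, which is why `aggressive` mode is lossy on those groups.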

Tip: For video models (LTX, Wan, etc.) with high RAM usage, use additive mode + smart (or aggressive) compression. In additive mode every patch is a plain weighted sum, so every patch is compressed losslessly with a minimal RAM footprint.

Optimization Modes

Mode	Behavior
`per_prefix` (default)	Each weight group picks its own strategy based on local conflict data
`global`	Single strategy for all prefixes (original behavior)
`additive`	Simple weighted addition — no conflict resolution. Preserves all weights exactly. Use for DPO/edit/distill LoRAs, or with patch compression for minimal RAM

Block Strategy Map

The analysis report includes a visual block-by-block map showing what strategy was used and why:

--- Block Strategy Map ---
  input_blocks.0   ====  sum  1 LoRA (6x)
  input_blocks.4   ----  avg  12% conflict (6x)
  middle_block.1   ####  TIES 42% conflict (6x)
  output_blocks.3  ----  avg  8% conflict (6x)
  output_blocks.8  ====  sum  1 LoRA (6x)
  Legend: ==== sum (single LoRA)  ---- avg (compatible)  #### TIES (conflict)

Memory Options

Option	Default	Effect
`cache_patches`	enabled	Cache merged patches in RAM for faster re-execution. Disable to free RAM after merge (recommended for video models)
`patch_compression`	smart	SVD re-compression of merged patches (see above)
`svd_device`	gpu	Device for SVD compression. GPU is ~10-50x faster than CPU. Use CPU if GPU memory is tight
`free_vram_between_passes`	disabled	Release GPU cache between analysis and merge passes. Lowers peak VRAM at negligible speed cost

Example Report

==================================================
LORA OPTIMIZER - ANALYSIS REPORT
==================================================
Architecture preset: sd_unet (SD/SDXL UNet)

--- Per-LoRA Analysis ---
  style_lora.safetensors:
    Strength: 1.0
    Keys: 192
    Avg rank: 64
    L2 norm (mean): 0.0847
  detail_lora.safetensors:
    Strength: 0.8
    Keys: 192
    Avg rank: 32
    L2 norm (mean): 0.0423

--- Auto-Strength Adjustment ---
  style_lora.safetensors: 1.0 -> 0.6345
  detail_lora.safetensors: 0.8 -> 0.5076
  Scale factor: 0.6345
  Method: interference-aware energy normalization
    Avg pairwise cosine similarity: 0.312 (mostly aligned (reinforcing))
    Interference-aware energy: 0.1335 (orthogonal assumption: 0.1196)

--- Pairwise Analysis ---
  style_lora.safetensors vs detail_lora.safetensors:
    Overlapping positions: 89420
    Sign conflicts: 31297 (35.0%)
    Cosine similarity: 0.312

--- Collection Statistics ---
  Total LoRAs: 2
  Total unique keys: 196
  Avg sign conflict ratio: 35.0%
  Magnitude ratio (max/min L2): 2.00x

--- Auto-Selected Parameters ---
  Merge mode: ties
  Density: 0.42
  Sign method: frequency
  Sparsification: DARE
  Sparsification density: 0.70 (keep rate)
  For TIES prefixes: replaces trim step; others: preprocessing
  (global fallback — each prefix uses its own parameters)

--- Per-Prefix Strategy ---
  weighted_sum (single LoRA):        28 prefixes (14%)
  weighted_average (low conflict):  120 prefixes (61%)
  ties (high conflict):              48 prefixes (24%)
  Total:                            196 prefixes

--- Block Strategy Map ---
  input_blocks.0   ====  sum  1 LoRA (6x)
  input_blocks.1   ====  sum  1 LoRA (6x)
  input_blocks.4   ----  avg  12% conflict (6x)
  input_blocks.5   ####  TIES 38% conflict (6x)
  middle_block.1   ####  TIES 42% conflict (6x)
  output_blocks.3  ----  avg  15% conflict (6x)
  output_blocks.8  ====  sum  1 LoRA (6x)
  Legend: ==== sum (single LoRA)  ---- avg (compatible)  #### TIES (conflict)

--- Reasoning ---
  Sign conflict ratio 35.0% > 25% threshold -> TIES mode selected
    TIES resolves sign conflicts via trim + elect sign + disjoint merge
  Auto-density estimated at 0.42 from magnitude distribution
  Magnitude ratio 2.00x <= 2x -> 'frequency' sign method (equal voting)
    Similar-strength LoRAs get equal votes

--- Merge Summary ---
  Keys processed: 196
  Model patches: 168
  CLIP patches: 28
  Output strength: 1.0
  CLIP strength: 1.0

==================================================

Connect the STRING output to a Show Text node to see the report in ComfyUI.

Important notes & limitations

Structural & Edit LoRAs: Do not put distillation LoRAs (LCM, Lightning, Turbo, Hyper), DPO LoRAs, or edit model LoRAs (Qwen edit, Klein edit, instruction-editing LoRAs) in the optimizer stack. These LoRAs modify the model's fundamental behavior — their weights are precisely calibrated and merging them with style LoRAs can break their training. Apply them via a standard Load LoRA node upstream, then feed only your style/character LoRAs into the optimizer. If you must include an edit LoRA in the stack, use additive mode and disable sparsification to avoid weight trimming.

Limitation: The optimizer only analyzes LoRAs in its own stack. It cannot see LoRA patches applied by upstream nodes (Load LoRA, etc.) — those stack additively on top of the optimizer's output. Fully baked merges (safetensors checkpoints) are indistinguishable from base weights and cannot be detected.

AutoTuner

Automatically sweeps all merge parameters (mode, sparsification, density, dampening, quality level) and ranks configurations for your LoRA stack. Runs Pass 1 analysis once, scores all parameter combinations via heuristic proxies, then merges the top-N candidates and measures output quality. When an AUTOTUNER_EVALUATOR is connected, the built-in score can be blended with external prompt/reference evaluation logic. Outputs the highest-ranked merge directly as MODEL/CLIP, plus a ranked report and TUNER_DATA for exploring alternatives via a Merge Selector node.

Full parameter reference: Nodes wiki page

Diff Cache

During the parameter sweep, each candidate recomputes raw LoRA diffs (A@B matmul) from scratch — even though diffs depend only on LoRA content, not merge config. The diff cache stores these diffs after the first candidate and reuses them for subsequent candidates, eliminating redundant computation.

Mode	Behavior
`disabled`	Recomputes diffs each time. No extra memory
`auto`	Uses RAM up to `diff_cache_ram_pct` of free memory, then spills to disk. Recommended for most setups
`ram`	All diffs in RAM. Fastest, but uses ~1.5 GB (SDXL) to ~6 GB (Flux)
`disk`	All diffs to temp files with memory-mapping. Slowest cache mode, but minimal RAM

When auto mode runs out of disk space, it falls back to RAM automatically.

Setting	Default	Effect
`diff_cache_mode`	auto	Cache mode selection
`diff_cache_ram_pct`	0.5	Fraction of free system RAM for `auto` mode (0.1–0.9)
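The `auto` cache behavior (RAM first, then spill to memory-mapped files, and back to RAM if writing fails) can be sketched like this. `DiffCache` and its methods are hypothetical; in the node the budget would be derived from `diff_cache_ram_pct` times free RAM, whereas here it is passed in directly, and temp-file cleanup is omitted.

```python
import os
import tempfile

import numpy as np


class DiffCache:
    """RAM-first diff cache that spills to memory-mapped .npy files once a byte
    budget is used up. Disk errors fall back to RAM."""

    def __init__(self, ram_budget_bytes, spill_dir=None):
        self.budget = ram_budget_bytes
        self.used = 0
        self.ram = {}
        self.disk = {}
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="diff_cache_")

    def get(self, key, compute):
        """Return the cached diff for `key`, calling `compute()` only on a miss."""
        if key in self.ram:
            return self.ram[key]
        if key in self.disk:
            return np.load(self.disk[key], mmap_mode="r")   # read lazily from disk
        diff = compute()
        if self.used + diff.nbytes <= self.budget:
            self._keep_in_ram(key, diff)
            return diff
        path = os.path.join(self.spill_dir, f"diff_{len(self.disk)}.npy")
        try:
            np.save(path, diff)
            self.disk[key] = path
        except OSError:                    # e.g. disk full: keep it in RAM instead
            self._keep_in_ram(key, diff)
        return diff

    def _keep_in_ram(self, key, diff):
        self.ram[key] = diff
        self.used += diff.nbytes
```

Because a diff depends only on the LoRA content and target key, a key such as `(lora_hash, target_key)` stays valid across every candidate in the sweep.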

VRAM Budget

The vram_budget slider (0.0–1.0) controls what fraction of free VRAM to use for storing merged patches on GPU. Default is 0 (all patches on CPU). Setting it higher keeps patches on GPU, reducing RAM usage on systems with enough VRAM. Available on both LoRA Optimizer and LoRA AutoTuner.

Community Cache

LoRA analysis results (conflict metrics, per-LoRA stats, best merge configs) are hardware-agnostic — the same LoRA files always produce the same output regardless of GPU. The community cache lets any user download precomputed results for their LoRAs without running the AutoTuner sweep, and optionally contribute their own results back.

Results are keyed by content hash (SHA256 of file contents, not filename), so they match across different users and folder layouts. LoRA names and paths are never shared.
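Content hashing can be streamed so large LoRA files never need to fit in memory. `lora_content_hash` follows the SHA256-of-contents description. `stack_cache_key`, which combines per-file hashes with strengths, is a hypothetical illustration of how a whole-stack entry could be keyed, not the dataset's actual scheme.

```python
import hashlib


def lora_content_hash(path, chunk_size=1 << 20):
    """SHA256 of the file bytes, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def stack_cache_key(paths_and_strengths):
    """Hypothetical whole-stack key: per-file content hashes plus strengths,
    so renamed or moved files map to the same entry."""
    parts = [f"{lora_content_hash(p)}:{s:.4f}" for p, s in paths_and_strengths]
    return hashlib.sha256("|".join(parts).encode()).hexdigest()
```

Since only digests leave the machine, file names and folder layouts never appear in the key.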

`community_cache` value	Behavior
`disabled`	No community interaction (default)
`upload_and_download`	Downloads before analysis; uploads after if local score is higher

Downloads are anonymous (no setup required). Uploads require a HF_TOKEN environment variable with write access to the dataset repo.

Results are stored in the public dataset ethanfel/lora-optimizer-community-cache.

Other Nodes & Workflows

Merge Selector

Applies a specific configuration from AutoTuner results without re-running the sweep. Connect TUNER_DATA from a LoRA AutoTuner (or Load Tuner Data) node and set the selection index to choose which ranked configuration to apply (1 = top-ranked, 2 = next-ranked, etc.).

Workflow:

LoRA AutoTuner → TUNER_DATA → Merge Selector (selection=2) → try the 2nd-ranked config
                      ↓
              Save Tuner Data → (reload later) → Load Tuner Data → Merge Selector

AutoTuner → Optimizer Bridge

Chain the AutoTuner and the Legacy optimizer in a single model line for a "rank, then tweak" workflow. Only one node merges at a time — the other passes the model through. The Legacy optimizer's settings_source switch controls which node is authoritative, and the UI bridge keeps the paired widgets in sync.

[Load Model] → [AutoTuner] → model → [Optimizer (Legacy)] → MODEL → sampler
[LoRA Stack]  → [AutoTuner]
[LoRA Stack]  → [Optimizer (Legacy)]
               [AutoTuner] → tuner_data → [Optimizer (Legacy)]

Legacy Optimizer `settings_source`	What happens
`from_autotuner`	AutoTuner merges → Legacy Optimizer passes through. Optimizer widgets show the winning config.
`manual`	AutoTuner passes the base model through → Legacy Optimizer merges with its own widget settings.
`from_tuner_data`	Legacy Optimizer reads settings from connected `tuner_data` input.

Typical flow:

1. Start with from_autotuner and let the AutoTuner find the best config
2. Inspect the Optimizer's widgets to see what won
3. Switch to manual: the Optimizer takes over, starting from the AutoTuner's recommendation
4. Tweak settings (merge_refinement, sparsification, etc.) and re-run

Switching between modes is instant — the AutoTuner reuses its cached sweep results.

Note: The bridge workflow requires the LoRA Optimizer (Legacy) node. The simplified LoRA Optimizer uses Settings nodes and tuner_data input instead.

Save / Load Tuner Data

Two utility nodes for persisting AutoTuner results to disk:

Save Tuner Data — Saves TUNER_DATA into a selected tuner_data folder as .tuner or .json. Subdirectories are allowed; path traversal outside that folder is blocked. Optional overwrite control avoids clobbering previous runs. OUTPUT_NODE = True.

Load Tuner Data — Dropdown of saved tuner data files. Outputs TUNER_DATA ready for Merge Selector. Auto-reloads when the file changes on disk.
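Blocking path traversal while still allowing subdirectories comes down to resolving the final path and checking that it stays inside the base folder. `resolve_tuner_path` and its `overwrite` flag are a hypothetical sketch of the Save Tuner Data behavior described above, using only `pathlib`.

```python
from pathlib import Path


def resolve_tuner_path(base_dir, name, overwrite=False, default_suffix=".tuner"):
    """Map a user-supplied name to a file inside base_dir; subdirectories are
    allowed, anything resolving outside base_dir is rejected."""
    base = Path(base_dir).resolve()
    if not name.endswith((".tuner", ".json")):
        name += default_suffix
    target = (base / name).resolve()
    if base not in target.parents:          # blocks '..', absolute paths, symlink escapes
        raise ValueError(f"path escapes the tuner_data folder: {name!r}")
    if target.exists() and not overwrite:
        raise FileExistsError(target)
    return target
```

Checking after `resolve()` rather than scanning the string for `..` also catches absolute paths and symlinks that point outside the folder.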

Evaluator Utilities

Build AutoTuner Python Evaluator — packages a Python module path + callable name into an AUTOTUNER_EVALUATOR object. The callable can run prompts, compare references, and return a score in [0, 1].

The evaluator callable receives keyword arguments: model, clip, lora_data, config, context, and analysis_summary.
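A minimal evaluator module might look like the following. Only the keyword names and the [0, 1] score range come from the description above. The `"embed"` and `"reference"` entries in `context` are hypothetical placeholders for whatever rendering and feature-extraction code you plug in, and the objects passed as `model`, `clip`, and the rest are treated as opaque.

```python
import math


def cosine_to_unit(a, b):
    """Map the cosine similarity of two feature vectors from [-1, 1] to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return (dot / (na * nb) + 1.0) / 2.0


def evaluate(*, model, clip, lora_data, config, context, analysis_summary, **_):
    """Hypothetical AutoTuner evaluator returning a score in [0, 1].

    Assumes `context` carries a callable that renders and embeds an image
    ("embed") and a reference embedding ("reference"); both keys are
    illustrative, not part of the node's documented API.
    """
    features = context["embed"](model, clip)
    score = cosine_to_unit(features, context["reference"])
    return min(1.0, max(0.0, score))
```

Accepting `**_` keeps the callable working if future versions pass extra keyword arguments; clamping guards the [0, 1] contract even if a custom scorer misbehaves.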

Save Merged LoRA

Saves the optimizer's merged result as a standalone .safetensors file that works with any standard LoRA loader.

Connect the LORA_DATA output from LoRA Optimizer to this node.

Option	Default	Effect
`save_folder`	first configured LoRA folder	Choose which configured ComfyUI LoRA directory to save into
`filename`	`merged_lora`	File name relative to `save_folder`. Subdirectories are allowed (e.g. `merged/my_lora`)
`save_rank`	0 (auto)	0 = use each layer's existing rank from the merge. Non-zero = force this rank for layers that need compression
`bake_strength`	enabled	When on, the saved LoRA reproduces your exact merge at strength 1.0. When off, strengths are not baked in

Outputs: STRING (file path)
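One way to bake strength into a saved LoRA is to fold the scale into the factors and set alpha equal to the rank. This sketch assumes the common kohya-style convention `delta = (alpha / rank) * up @ down` used by ComfyUI's LoRA loader. The function name is hypothetical, and it is not necessarily how `bake_strength` is implemented.

```python
import numpy as np


def bake_strength(up, down, alpha, strength):
    """Fold `strength` and the alpha/rank factor into the weights.

    Returns (up, down, alpha) with alpha == rank, so a loader at strength 1.0
    reproduces strength * (alpha / rank) * up @ down.
    """
    rank = down.shape[0]
    scale = strength * alpha / rank
    root = np.sqrt(abs(scale))                # share the scale between both factors
    return up * (np.sign(scale) * root), down * root, float(rank)
```

Splitting the scale as a square root across both factors keeps their magnitudes balanced, which is gentler on fp16 storage than putting the whole factor on one matrix.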

Merged LoRA to Hook

Wraps the optimizer's merged patches as a conditioning hook (HOOKS) for per-conditioning LoRA application. Instead of applying the merged LoRA globally to the model, you can attach it to specific conditioning entries using ComfyUI's hook system.

Connect the LORA_DATA output from LoRA Optimizer to this node, then connect the HOOKS output to a Cond Set Props (or similar) node.

Inputs: LORA_DATA (required), HOOKS (optional — chain with existing hooks)

Outputs: HOOKS

Use this node when you want the merged LoRA to apply only to specific conditioning rather than the entire model:

Per-prompt LoRA: Apply different merged LoRAs to positive vs negative conditioning
Scheduled application: Combine with hook keyframes to apply the LoRA only during certain sampling steps
Regional conditioning: Use with area-based conditioning to apply the LoRA to specific image regions
Preserving the base model: Keep the MODEL output clean (unpatched) while still using the merged LoRA through conditioning hooks

Workflow example:

Load Checkpoint → MODEL ──┬──→ LoRA Optimizer → LORA_DATA → Merged LoRA to Hook → HOOKS
                           │                                                          ↓
                           └──→ KSampler ←──── Conditioning ←──── Cond Set Props

The prev_hooks input allows chaining multiple hook sources together.

WanVideo LoRA Optimizer

Variant of the LoRA Optimizer for WanVideo models (via kijai's WanVideoWrapper). Accepts WANVIDEOMODEL instead of MODEL, skips CLIP, and applies merged patches in-memory.

All merging algorithms are inherited — TIES, DARE/DELLA, SVD compression, auto-strength, per-prefix adaptive merge, merge refinement (KnOTS, orthogonalization, TALL-masks), and Wan key normalization (LyCORIS, diffusers, Fun LoRA, finetrainer, RS-LoRA) all work identically.

Basic workflow:

WanVideoModelLoader → WANVIDEOMODEL → WanVideo LoRA Optimizer → WANVIDEOMODEL → WanVideoSampler
                                               ↑
                        LoRA Stack ─────────────┘

Chaining with individual LoRAs: Individual (non-merged) LoRAs go through WanVideoLoraSelect → model loader as usual. Our optimizer applies merged LoRAs on top — both coexist in the model patcher.

WanVideoLoraSelect → WanVideoModelLoader → WANVIDEOMODEL → WanVideo LoRA Optimizer → Sampler
                                                                    ↑
                                             LoRA Stack ────────────┘

Key defaults differ from the standard optimizer:

normalize_keys = enabled: WanVideo LoRAs come from many trainers, so normalization is commonly needed
cache_patches = disabled: video models are large, and caching uses significant RAM
architecture_preset = dit: DiT-tuned thresholds (higher density floor, wider strength range)

Compatibility

Models: SD 1.5, SDXL, Flux, Z-Image (Lumina2), Wan 2.1/2.2, LTX Video, ACE-Step, Qwen-Image, and other architectures supported by ComfyUI
LoRA formats: Standard LoRA, LoCon, and LoRA/LoCon-style trainer variants whose tensors reduce to up/down(/mid) adapters (including many diffusers/PEFT and LyCORIS naming schemes)
Trainers: Kohya, AI-Toolkit, LyCORIS, Musubi Tuner, diffusers — auto-normalized when normalize_keys is enabled
Flux sliced weights: Handled correctly (linear1_qkv offsets)
Z-Image fused QKV: Split for per-component analysis, re-fused after merge
Stack formats: Native LoRA Stack dicts, plus standard tuples from Efficiency Nodes / Comfyroll

Credits

Originally based on ComfyUI-ZImage-LoRA-Merger by DanrisiUA
Per-prefix adaptive approach inspired by comfyUI-Realtime-Lora by shootthesound (per-block LoRA analysis)
Thanks to Scruffy and Ramonguthrie for suggesting the per-block analysis approach
TIES-Merging: Yadav et al., NeurIPS 2023
DARE: Yu et al., ICML 2024 — Drop And REscale for language model merging
DELLA: Deep et al., 2024 — magnitude-aware sparsification
KnOTS: Stoica et al., 2024 (SVD alignment for model merging)
TALL-masks: Wang et al., 2024 — selfish weight protection via task-aware masks
Column-wise merging inspired by ZipLoRA: Shah et al., 2023 (structural sparsity for LoRA merging)

Development Timeline

License

GPL-3.0 License - see LICENSE.
