@mzeynali

What does this PR do?

This PR adds a training script for Latent Consistency Model (LCM) distillation applied to InstructPix2Pix with Stable Diffusion XL. This enables fast, few-step image editing (1-4 steps) while preserving the output quality of instruction-based editing.

Key Features

  • LCM Distillation Pipeline: Implements teacher-student distillation where a pre-trained InstructPix2Pix SDXL model (teacher) guides training of a lightweight student model capable of single-step inference
  • 8-Channel U-Net Support: Properly handles InstructPix2Pix's concatenated input (noisy latent + original image latent)
  • Time Conditioning: Adds guidance scale embedding to student U-Net for flexible inference
  • EMA Target Network: Uses exponential moving average for stable training targets
  • DDIM Solver Integration: Implements multi-step teacher predictions with classifier-free guidance
  • Flexible Loss Functions: Supports both L2 and Huber loss for robust training
  • Production-Ready: Includes validation, checkpointing, mixed precision, gradient checkpointing, and xFormers support
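The 8-channel input and the guidance-scale conditioning from the feature list can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the function name `guidance_scale_embedding`, the embedding dimension, and the `w` range are assumptions modeled on common LCM implementations.

```python
import math
import torch

def guidance_scale_embedding(w: torch.Tensor, dim: int = 256) -> torch.Tensor:
    """Sinusoidal embedding of the guidance scale w, in the style of the
    LCM paper's time-conditioning; fed to the student U-Net alongside t."""
    w = w * 1000.0  # scale w before embedding, as in common implementations
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = w[:, None].float() * freqs[None]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)  # (batch, dim)

batch = 2
noisy_latents = torch.randn(batch, 4, 64, 64)  # noised latent of the edited image
image_latents = torch.randn(batch, 4, 64, 64)  # latent of the original input image
# InstructPix2Pix concatenates both along the channel axis -> 8-channel U-Net input
unet_input = torch.cat([noisy_latents, image_latents], dim=1)

# sample a guidance scale w ~ U[w_min, w_max] and embed it (bounds are illustrative)
w = torch.rand(batch) * (15.0 - 1.0) + 1.0
w_emb = guidance_scale_embedding(w)
```

The embedded `w` lets a single student network emulate classifier-free guidance at arbitrary scales without the teacher's two forward passes.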

Training Algorithm

  1. Sample timestep from DDIM schedule
  2. Add noise to latents and sample guidance scale $w \in [w_{min}, w_{max}]$
  3. Student makes single-step prediction from noisy latents
  4. Teacher performs multi-step DDIM prediction with CFG
  5. Target network (EMA of student) generates stable training target
  6. Compute loss between student and target predictions
  7. Update the student parameters, then EMA-update the target network
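The steps above can be condensed into a minimal, self-contained sketch. Toy linear modules stand in for the SDXL U-Nets, one additive update stands in for the teacher's multi-step DDIM solve, and the pseudo-Huber loss follows the LCM recipe; all names, shapes, and hyperparameters here are illustrative, not the PR's API.

```python
import copy
import torch

torch.manual_seed(0)
student = torch.nn.Linear(8, 8)    # stands in for the student U-Net
target = copy.deepcopy(student)    # EMA target network (step 5)
teacher = torch.nn.Linear(8, 8)    # frozen teacher
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

x = torch.randn(16, 8)             # toy stand-in for the noisy 8-channel input

with torch.no_grad():
    # step 4: teacher prediction with classifier-free guidance
    # (the real script runs a multi-step DDIM solve here)
    cond, uncond = teacher(x), teacher(torch.zeros_like(x))
    w = 7.5
    teacher_pred = uncond + w * (cond - uncond)
    x_prev = x + 0.1 * teacher_pred          # toy one-step DDIM move toward t-1
    target_pred = target(x_prev)             # step 5: stable training target

# step 3 + 6: single-step student prediction, pseudo-Huber loss vs. target
student_pred = student(x)
huber_c = 0.001
loss = torch.sqrt((student_pred - target_pred) ** 2 + huber_c**2).mean() - huber_c

# step 7: update the student, then EMA-update the target network
opt.zero_grad()
loss.backward()
opt.step()
ema_decay = 0.95
with torch.no_grad():
    for p_t, p_s in zip(target.parameters(), student.parameters()):
        p_t.mul_(ema_decay).add_(p_s, alpha=1 - ema_decay)
```

The EMA target (rather than the student itself) is what makes the consistency objective stable: the target moves slowly, so the student is regressed toward a nearly fixed function at each step.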

Use Case

This script allows researchers and practitioners to create fast InstructPix2Pix SDXL models that perform high-quality image editing in as few as 4 inference steps instead of 50+, making real-time image editing applications feasible.

Who can review?

@yiyixuxu
