UPSTREAM PR #1273: fix: avoid black images if using an invalid VAE (for SDXL) by loci-dev · Pull Request #62 · auroralabs-loci/stable-diffusion.cpp

loci-dev · 2026-02-19T04:20:41Z

Note

Source pull request: leejet/stable-diffusion.cpp#1273

If an invalid VAE file is passed by mistake (for example --vae sdxl_invalid_vae.sft), the output is a black image, and only after several U-Net loops have already run. This can happen because of a typo, a broken symlink, and so on.
With this change, that case is handled as if the option --force-sdxl-vae-conv-scale had been given.
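The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the helper names vae_file_readable and use_sdxl_vae_conv_scale are hypothetical, and "invalid" is assumed here to mean only "cannot be opened for reading", which covers the typo and broken-symlink cases.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not from the PR): true if `path` is non-empty and the
// file can be opened for reading. A typo or a dangling symlink fails here.
bool vae_file_readable(const std::string& path) {
    if (path.empty()) {
        return false;
    }
    std::ifstream file(path, std::ios::binary);
    return file.is_open();
}

// Hypothetical decision function: should the SDXL VAE conv scale be applied?
// `force_conv_scale` stands in for the --force-sdxl-vae-conv-scale option.
bool use_sdxl_vae_conv_scale(const std::string& vae_path, bool force_conv_scale) {
    if (force_conv_scale) {
        return true;  // the user asked for it explicitly
    }
    // Assumed fallback: an unusable VAE file is treated like the forced case.
    return !vae_file_readable(vae_path);
}
```

In this sketch an unreadable path yields the same result as passing the option explicitly, which is the behaviour the description asks for.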

loci-review · 2026-02-19T05:29:40Z

Overview

Analysis of 48,313 functions across two binaries reveals mixed performance impact from a single commit implementing VAE validation for SDXL models. Modified functions: 58 (0.12%). New: 0. Removed: 0. Unchanged: 48,255.

Binaries analyzed:

build.bin.sd-server: 515,491 nJ → 518,784 nJ (+0.64%)
build.bin.sd-cli: 480,110 nJ → 483,568 nJ (+0.72%)
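The percentages above follow from the usual relative-change formula. A small sketch for checking them, where percent_change is a name introduced here for illustration:

```cpp
#include <cassert>

// Relative change from `before` to `after`, expressed in percent.
double percent_change(double before, double after) {
    assert(before != 0.0);  // the ratio is undefined for a zero baseline
    return (after - before) / before * 100.0;
}
```

Calling percent_change(515491, 518784) gives roughly 0.64, and percent_change(480110, 483568) gives roughly 0.72, matching the reported figures for sd-server and sd-cli.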

Function Analysis

Critical improvements:

ggml_vec_dot_f32 (build.bin.sd-cli): Response time 1966ns → 1771ns (-10.0%, -196ns), throughput time 1950ns → 1755ns (-10.0%, -196ns). This ARM NEON vectorized function is called millions of times per inference, providing substantial cumulative performance gains across all matrix operations.

Concerning regressions:

forward_mul_mat (build.bin.sd-server): Response time 15,028ns → 15,736ns (+4.7%, +708ns), throughput time stable at 2,377ns (-0.016%). Because throughput time is unchanged, the regression comes from child functions rather than from the function's own code. The function performs quantized matrix multiplication in UNet layers and is called thousands of times per inference.

Initialization regressions:

std::vector<gguf_kv>::begin() (build.bin.sd-server): Throughput time 61ns → 243ns (+297%, +182ns)
std::shared_ptr::_M_destroy for T5CLIPEmbedder (build.bin.sd-cli): Throughput time 105ns → 294ns (+180%, +189ns)
make_block_q4_Kx8 (build.bin.sd-server): Response time 8,126ns → 8,768ns (+7.9%, +642ns)

Initialization improvements:

std::make_move_iterator (build.bin.sd-server): Throughput time 246ns → 78ns (-68.4%, -169ns)
Darts::AutoPool::resize_buf (build.bin.sd-cli): Throughput time 300ns → 247ns (-17.5%, -53ns)

Other analyzed functions showed minor changes in STL operations, swap functions, and container management with negligible cumulative impact.

Additional Findings

The commit modified only src/stable-diffusion.cpp to add VAE validation logic preventing black images in SDXL. Most performance changes stem from compiler optimization variations rather than source modifications. The 10% improvement in ggml_vec_dot_f32 (extremely high call frequency) likely outweighs the 4.7% regression in forward_mul_mat, resulting in net positive inference performance. Initialization regressions add microseconds to multi-second model loading, representing negligible user impact. The correctness improvement justifies minor performance trade-offs.
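The "likely outweighs" claim depends on relative call counts, which the review does not report. As a rough check, the break-even point can be computed from the per-call deltas alone. The sketch below assumes the two deltas are independent and comparable, although they were measured in different binaries and the forward_mul_mat figure is a response time that includes its children; break_even_call_ratio is a name introduced here for illustration.

```cpp
#include <cassert>

// Number of calls to the improved function needed per call to the regressed
// function for the total time to break even, assuming independent deltas.
double break_even_call_ratio(double regression_ns, double improvement_ns) {
    assert(improvement_ns > 0.0);  // a zero improvement can never break even
    return regression_ns / improvement_ns;
}
```

With the reported figures, break_even_call_ratio(708, 196) is about 3.6, so under these assumptions the net effect is positive once ggml_vec_dot_f32 runs more than about 3.6 times per forward_mul_mat call.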

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.

fix: avoid black images if using an invalid VAE (for SDXL)

0ad1d1b

loci-dev temporarily deployed to stable-diffusion-cpp-prod February 19, 2026 04:20 — with GitHub Actions Inactive

loci-dev wants to merge 1 commit into main from loci/pr-1273-leejet_reorg
