Optimize LCC reader for large scenes (~5.6x faster)#188

Merged
slimbuck merged 3 commits into playcanvas:main from slimbuck:lcc-dev on Mar 21, 2026
@slimbuck
Member

Summary

Optimizes the LCC (XGrids) reader to handle large scenes efficiently. On a 139M-splat test scene (9GB on disk), load time drops from 200s to 36s (~5.6x speedup), with significantly reduced peak memory usage.

Changes

  • Eliminate per-splat allocations: decodeRotation no longer creates a temporary array per call; SH decoding no longer creates 15 Vec3 objects per splat. New decodeRotationInto writes directly to output arrays.
  • Remove double-buffering: processUnit writes directly into the shared output arrays instead of allocating per-unit intermediate typed arrays and copying with .set().
  • Concurrent I/O: Units are dispatched with a bounded worker pool instead of sequential await per unit. No measurable impact for local disk, but benefits network-based file systems (browser URL loading).
  • Pre-combine LODs: All selected LODs are decoded directly into a single pre-allocated DataTable, eliminating the expensive post-read combine() step, which allocated ~35GB of intermediate buffers and copies for large scenes.
  • Replace DataView with typed array views: The hot decode loop uses Float32Array, Uint16Array, and Uint8Array views over the input buffer instead of DataView.get*() calls, avoiding per-access bounds checks and endianness handling overhead.
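
The DataView-to-typed-array change in the last bullet can be illustrated with a minimal sketch. The 16-byte record layout, the function names, and the output layout below are hypothetical and are not the actual LCC format. Note that typed array views read in the platform's byte order and need element-aligned offsets, so this approach assumes little-endian data on a little-endian host.

```typescript
// Hypothetical 16-byte record (NOT the real LCC layout):
//   bytes 0..11  : position as 3 x float32 (little-endian)
//   bytes 12..13 : scale as uint16, normalized to [0, 1]
//   byte  14     : opacity as uint8, normalized to [0, 1]
//   byte  15     : padding
const STRIDE = 16;
const FIELDS = 5; // floats written per record: x, y, z, scale, opacity

// Baseline: every access goes through DataView with an explicit endianness flag.
function decodeWithDataView(buf: ArrayBuffer, count: number, out: Float32Array): void {
    const dv = new DataView(buf);
    for (let i = 0; i < count; i++) {
        const o = i * STRIDE;
        const d = i * FIELDS;
        out[d + 0] = dv.getFloat32(o, true);
        out[d + 1] = dv.getFloat32(o + 4, true);
        out[d + 2] = dv.getFloat32(o + 8, true);
        out[d + 3] = dv.getUint16(o + 12, true) / 65535;
        out[d + 4] = dv.getUint8(o + 14) / 255;
    }
}

// Same decode using three views over the same buffer.
// Indices are element indices: byteOffset / 4 for f32, byteOffset / 2 for u16.
function decodeWithViews(buf: ArrayBuffer, count: number, out: Float32Array): void {
    const f32 = new Float32Array(buf);
    const u16 = new Uint16Array(buf);
    const u8 = new Uint8Array(buf);
    for (let i = 0; i < count; i++) {
        const f = i * (STRIDE / 4);      // float index of byte offset 16 * i
        const h = i * (STRIDE / 2) + 6;  // uint16 index of byte offset 16 * i + 12
        const b = i * STRIDE + 14;       // byte offset of the opacity field
        const d = i * FIELDS;
        out[d + 0] = f32[f];
        out[d + 1] = f32[f + 1];
        out[d + 2] = f32[f + 2];
        out[d + 3] = u16[h] / 65535;
        out[d + 4] = u8[b] / 255;
    }
}
```

Both functions produce the same output for the same input; only the access path differs. The view-based version works here because the hypothetical record keeps every field aligned to its own size.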

Performance

Benchmarked on a 139,722,883 splat LCC scene (Grow-House.lcc):

| Version | Time | Speedup |
|---|---|---|
| Before | 200s | 1x |
| + No allocations, no double-buffering | 81s | 2.5x |
| + Pre-combined LODs (no combine step) | 39s | 5.1x |
| + Typed array views | 36s | 5.6x |
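
The "pre-combined LODs" step depends on knowing where each unit's splats land in the final table before any decoding starts. A minimal sketch of that planning pass follows; `UnitInfo`, `planOffsets`, and the `-1` marker for skipped units are illustrative assumptions, not names from the PR.

```typescript
// Hypothetical per-unit metadata: which LOD the unit belongs to and how many splats it holds.
interface UnitInfo {
    lod: number;
    count: number;
}

// Assign each selected unit a start offset in the combined output (a running prefix sum).
// Units whose LOD is not selected get offset -1 and contribute nothing to the total.
function planOffsets(
    units: readonly UnitInfo[],
    selectedLods: ReadonlySet<number>
): { offsets: number[]; total: number } {
    const offsets: number[] = [];
    let total = 0;
    for (const unit of units) {
        if (selectedLods.has(unit.lod)) {
            offsets.push(total);
            total += unit.count;
        } else {
            offsets.push(-1);
        }
    }
    return { offsets, total };
}

// With the total known, each output column can be allocated exactly once, e.g.:
//   const x = new Float32Array(total);
// and every unit decodes straight into x[offset .. offset + count).
```

Because each column is sized once from `total`, no per-LOD tables need to be built and merged afterwards.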

Memory

  • Eliminated ~35GB of transient allocations from the combine() step (per-unit intermediate arrays + final column merge)
  • Per-unit temporary Float32Array allocations removed (previously allocated and copied per unit, now writes directly to global arrays)
  • Billions of per-splat temporary object allocations eliminated (quaternion arrays, Vec3 instances), reducing GC pressure
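
The difference between an allocating decoder and an "into" decoder can be sketched as follows. The 4-byte quaternion encoding here is a stand-in chosen for brevity and is an assumption; it is not the rotation encoding the LCC reader uses, and `decodeQuat` / `decodeQuatInto` are hypothetical names.

```typescript
// Allocating style: one new 4-element array per splat, which becomes garbage immediately.
function decodeQuat(packed: Uint8Array, i: number): number[] {
    // Map each byte from [0, 255] to [-1, 1], then normalize.
    const x = packed[i * 4 + 0] / 127.5 - 1;
    const y = packed[i * 4 + 1] / 127.5 - 1;
    const z = packed[i * 4 + 2] / 127.5 - 1;
    const w = packed[i * 4 + 3] / 127.5 - 1;
    const len = Math.hypot(x, y, z, w) || 1; // guard against a zero-length quaternion
    return [x / len, y / len, z / len, w / len];
}

// "Into" style: same math, but results go straight into the output columns at index dst.
// Nothing is allocated per call.
function decodeQuatInto(
    packed: Uint8Array,
    i: number,
    rx: Float32Array,
    ry: Float32Array,
    rz: Float32Array,
    rw: Float32Array,
    dst: number
): void {
    const x = packed[i * 4 + 0] / 127.5 - 1;
    const y = packed[i * 4 + 1] / 127.5 - 1;
    const z = packed[i * 4 + 2] / 127.5 - 1;
    const w = packed[i * 4 + 3] / 127.5 - 1;
    const len = Math.hypot(x, y, z, w) || 1;
    rx[dst] = x / len;
    ry[dst] = y / len;
    rz[dst] = z / len;
    rw[dst] = w / len;
}
```

At roughly 140M splats, the allocating form creates one short-lived array per splat for rotations alone, which is the kind of GC pressure the list above describes.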

slimbuck requested a review from Copilot on March 21, 2026 18:08
slimbuck self-assigned this on Mar 21, 2026
slimbuck added the enhancement (New feature or request) label on Mar 21, 2026
Contributor

Copilot AI left a comment

Pull request overview

Optimizes the XGrids LCC reader hot path to significantly improve load time and memory usage on very large scenes by decoding directly into preallocated output buffers, using typed-array views for faster access, and concurrently dispatching unit reads.

Changes:

  • Decode rotations/SH coefficients without per-splat temporary allocations (write directly into output arrays).
  • Remove per-unit intermediate buffers by writing unit data straight into shared global arrays.
  • Add bounded-concurrency unit decoding and pre-combine selected LODs into a single DataTable (plus environment table).
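
Bounded-concurrency dispatch of this kind can be sketched with a small worker pool. This is a generic pattern under stated assumptions, not the code from the PR: `runBounded` and its parameters are hypothetical names, and the real reader may structure its pool differently.

```typescript
// Run `task` over all items with at most `limit` tasks in flight at once.
// Each worker repeatedly claims the next unprocessed index until none remain.
async function runBounded<T>(
    items: readonly T[],
    limit: number,
    task: (item: T, index: number) => Promise<void>
): Promise<void> {
    let next = 0; // shared cursor; safe because JavaScript runs workers on one thread

    const worker = async (): Promise<void> => {
        while (next < items.length) {
            const index = next++;
            await task(items[index], index);
        }
    };

    // Never start more workers than there are items, and always allow at least one.
    const workerCount = Math.min(Math.max(1, limit), items.length);
    const workers: Promise<void>[] = [];
    for (let i = 0; i < workerCount; i++) {
        workers.push(worker());
    }
    await Promise.all(workers);
}
```

When each task writes only to its own disjoint range of the shared output arrays, completion order does not matter, so units can finish in any order without extra synchronization. If any task rejects, `Promise.all` rejects with that error while the other workers keep draining the queue; a production version might add cancellation.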

slimbuck marked this pull request as ready for review on March 21, 2026 20:42
slimbuck merged commit 8f44e54 into playcanvas:main on Mar 21, 2026
3 checks passed
slimbuck deleted the lcc-dev branch on March 21, 2026 20:42