Telegram sync performance pass + AC4 codec support by Amonoman · Pull Request #2482 · PixelPlayerHQ/PixelPlayer

Amonoman · 2026-06-27T10:30:50Z

Improves Telegram sync performance and adds AC4 audio codec detection for cast.

Telegram sync performance

Telegram sync was taking about 10 minutes for a 100k-song channel. Three targeted fixes:

Parallelize per-song metadata refinement. File IO and ID3 tag parsing in syncTelegramData ran sequentially, one song at a time. Now runs concurrently (bounded semaphore, concurrency = 8) using the same pattern already used elsewhere in SyncWorker. Also removes a redundant duplicate resolveAlbumArtUri() call per song.
Skip redundant orphan-cleanup scans. incrementalSyncMusicData() ran two full-table NOT EXISTS scans on every chunk flush (about 200 times for 100k songs). Insert-only chunks can never produce an orphan, so cleanup is now skipped per-chunk (opt-out parameter, default unchanged for all other callers) and run once at the end instead.
Hoist a per-call regex in AudioMetadataReader.parseReplayGainDb() to a compiled-once constant, since it ran up to twice per song.

No behavior or output changes. Same songs, albums, artists, and cross-refs end up in the DB, just computed faster.

Tested: fresh install, 100k-song Telegram channel (6kbps MP3s with metadata, release build). Sync completed in about 50 seconds, down from about 10 minutes before these changes.

Cast

Adds AC4 audio codec detection for ISO BMFF containers in IsoBmffAudioCodecDetector, with unit tests.

Support detecting AC4 audio (used in iOS/MOV containers) in IsoBmffAudioCodecDetector, with corresponding unit tests.

Telegram sync was reading each song's file (file.exists() + ID3 tag parse via AudioMetadataReader) one at a time in a sequential forEach, inside syncTelegramData's chunk loop. For large channels (~100k songs) this serial per-song file IO was the dominant cost, taking on the order of 10 minutes. Extract that IO/CPU-bound step and run it concurrently per chunk using the same Semaphore + async/awaitAll pattern already used elsewhere in this file (see the MediaStore processing phase), bounded by a new TELEGRAM_METADATA_READ_CONCURRENCY = 8 to avoid saturating disk IO. The sequential artist/album dedup maps that follow remain untouched, since they mutate shared per-chunk state and aren't safe to parallelize as-is. Also drops a redundant resolveAlbumArtUri() call that was happening twice per song when a local file was found (value doesn't change between the two call sites). No behavior change to the resulting song/album/artist data — only the ordering/concurrency of how it's computed. Co-Authored-By: Claude <noreply@anthropic.com>

incrementalSyncMusicData() ran deleteOrphanedAlbums() and deleteOrphanedArtists() unconditionally at the end of every call. Both are full-table NOT EXISTS scans against songs / cross-refs. syncTelegramData() calls incrementalSyncMusicData() once per chunk (~200 times for a 100k-song channel at the current chunk size), so these scans ran ~200 times per sync even though pure-insert chunks can never produce an orphaned album or artist — only the dedicated deletion path can. Add an optional cleanupOrphans parameter (default true, preserving existing behavior for every other caller) and a new cleanupOrphanedMusicData() convenience method. syncTelegramData() now passes cleanupOrphans = false on every chunk flush and the batched deletion loop, then calls cleanupOrphanedMusicData() once after both finish. Other incrementalSyncMusicData call sites (MediaStore sync, Netease sync) are single-shot, not chunked loops, so they're unaffected and keep the default. Co-Authored-By: Claude <noreply@anthropic.com>

parseReplayGainDb() constructed a new Regex("(?i)[dD][bB]") on every invocation. It's called via extractReplayGainDb() up to twice per song read (track gain + album gain, each checking up to 3 property key fallbacks), so a 100k-song sync could compile this same pattern hundreds of thousands of times. Hoist it to a private val compiled once at class load. Investigated whether the two separate TagLib calls in read() (getAudioProperties() + getMetadata(), each on a dup()'d fd) were redundant and could be merged — confirmed against the upstream Kyant0/taglib test suite that this is the library's required usage pattern, not duplicated work, so left unchanged.

…or for SetTdlibParameters TdApi.SetTdlibParameters was being constructed with a 15-argument positional constructor, with a comment admitting the argument order was guessed ("the order varies by version... I will try the most common flat signature"). Verified against TDLib's official Java example and documentation: SetTdlibParameters has no flat multi-arg constructor at all, only a default constructor and one taking a single TdlibParameters object. The previous code was very likely silently passing values into the wrong fields on every app launch (e.g. apiId potentially landing in systemLanguageCode's slot), since boolean/string parameters of the wrong type would not fail at compile time given how the bindings are laid out. Replace with the documented pattern: construct TdApi.TdlibParameters() and assign each field by name, which cannot misalign the way a wrong guessed positional order can. Also explicitly sets enableStorageOptimizer = true, which was not reliably set by the old code (its effective value depended on exactly how the broken positional mapping landed). This matches TDLib's documented recommended default; flag for review if a different value is preferred.

…solution ForumTopicInfo's thread ID is resolved via reflection (field name isn't confirmed for this tdlibx fork from available documentation). Two debug logs ran on every topic during every forum sync: one dumping all reflected field names, one logging the resolved field name and value. Both were one-time discovery aids per their own comments, not meant to run indefinitely. Removed both. Kept the Timber.w fallback-path warning and Timber.e failure log, since those flag situations actually worth knowing about. Did not change the reflection logic itself or attempt to replace it with direct property access: multiple TDLib Java bindings (official tdlib/td, the tdlight fork, and TDLib's own GetForumTopic/ SearchChatMessages fields) consistently use "messageThreadId", which is already the first name this code tries, but this exact tdlibx 1.8.56 fork's ForumTopicInfo source wasn't directly inspectable to confirm a direct property reference would compile. If a past Logcat capture from this debug log is available, it would let this be replaced with a confirmed direct field reference instead.

TELEGRAM_METADATA_READ_CONCURRENCY was a flat constant (8) regardless of device. Each concurrent read does a real native TagLib JNI call, which is genuine CPU-bound work, not just blocking IO wait. Kotlin dispatchers schedule threads, they don't reserve CPU time, so heavy parallel CPU-bound work here can still starve the Main thread and cause visible UI jank on a device with few cores, while being needlessly conservative on a high-core device. Replace the flat constant with telegramMetadataReadConcurrency(), which scales with Runtime.getRuntime().availableProcessors() / 2, capped to [2, 4]. This is an engineering default reasoned from how coroutine dispatchers and native JNI calls interact with real CPU scheduling, not a profiled or measured-optimal value; it has not been verified against real device traces. Prompted by a field report of heavy UI lag during a large Telegram sync on a device that did not OOM (separate from the earlier OOM investigation). Lowering concurrency trades some of the previously measured sync-time improvement for reduced peak CPU contention; on the device used for the 50-second/100k-song benchmark, this will likely land at concurrency 4 (down from 8) and increase sync time accordingly. Also tightens the artist/album preload in syncTelegramData to build both derived maps in a single pass with pre-sized capacity, instead of two separate default-capacity .associate{} calls; this does not change the preload's fundamental scaling with existing-library size, which remains a known, separate limitation.

Three small, repeated costs found during a broader pass over SyncWorker.kt, each scaling with library size: - fetchMusicFromMediaStore: cursor.getString(dataCol) and cursor.getInt(trackCol) were each read twice per row (once for the directory filter / disc-vs-track split, once for the RawSongData fields). Cursor getters cross into native CursorWindow code, so this was a real, avoidable JNI call per row. Read once, reuse. - fetchMediaStoreIds: allocated two File objects per row (File(data).parent, then File(parentPath).absolutePath) to arrive back at a string MediaStore's DATA column already provides as an absolute path. Replaced with the same string-slicing approach already used for the identical check in fetchMusicFromMediaStore. - syncNeteaseData: toUnifiedNeteaseArtistId(name), a lowercase() + hashCode() computation, ran up to three times for the same artist name within the same song iteration (primary artist, the cross-ref loop, and the JSON artist refs). Computed once per song into artistIds and reused across all three. No behavior change in any of the three; all are pure redundant-work elimination, verified by re-deriving the same values via the same inputs.

A previous commit replaced this call with TdApi.TdlibParameters() field assignment, based on TDLib's official docs/example. That approach does not compile against this exact tdlibx:td:1.8.56 build: the real compiler error confirms TdApi.SetTdlibParameters here has only a no-arg constructor and a flat 14-argument one, no constructor taking a TdlibParameters object exists in this build at all. Compared the original (pre-revert) flat-constructor call against the compiler-confirmed real signature, argument type and position both match exactly, so the original code was correct for this build. The "argument order was guessed" comment that motivated the previous change was based on uncertainty that, in retrospect, wasn't actually a problem here. Reverted to the flat constructor, now with the real, compiler- verified signature documented in a comment instead of assumed from docs for a different TDLib binding, and named local variables in place of bare positional literals for review clarity.

fetchMusicFromMediaStore's incremental path queried musicDao.getSongsByIdsListSimple() twice for every song that turned out to have changed: once in Phase 2 to fetch its existing entity and decide whether it changed, and once again in Phase 3 for the exact same IDs to get that same entity for the user-edit-preserving merge step. Phase 2 already has the existing entity in hand right before discarding it. Carry it forward as part of songsToProcess (now List<Pair<RawSongData, SongEntity?>> instead of List<RawSongData>) instead of re-fetching it in Phase 3. Removes one DB round-trip per batch of changed songs; impact scales with how many songs actually changed in a given sync (near-zero for a quiet incremental sync, real for a deep/force-metadata rescan or any sync touching many files at once). No behavior change: same merge logic, same data, just sourced from the pair instead of a second query.

Amonoman and others added 12 commits June 22, 2026 18:57

feat(cast): add AC4 audio codec detection for ISO BMFF containers

76aa05c

Support detecting AC4 audio (used in iOS/MOV containers) in IsoBmffAudioCodecDetector, with corresponding unit tests.

Merge branch 'PixelPlayerHQ:master' into master

9758cf7

Merge branch 'PixelPlayerHQ:master' into master

b0d12f2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Telegram sync performance pass + AC4 codec support#2482

Telegram sync performance pass + AC4 codec support#2482
Amonoman wants to merge 12 commits into
PixelPlayerHQ:masterfrom
Amonoman:master

Amonoman commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Amonoman commented Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant