
rsv-ben: fix fps detection for CFR reference clips #286

Open
kevinthelobster wants to merge 1 commit into anthwlock:master from kevinthelobster:fix-rsv-ben-cfr-fps

@kevinthelobster

Problem

For NTSC-fractional footage (23.976 / 29.97 / 59.94 fps), -rsv-ben recovery of Sony recording-in-progress (.RSV) files produces:

  • one corrupt video frame per GOP — visible as a hitch roughly every half-second, and
  • audio that ends short of the video (~4.5 s over a 75-minute clip).

This looks like an inherent limitation of recovering an unfinalized file, so it is easy to misattribute to unavoidable data loss. It is not: the frame data is intact in the .RSV and is merely mis-spliced.

Root cause

repairRsvBen() derives the video frame duration only from Track::times_:

if (!video_track.times_.empty())
    video_duration_per_sample = video_track.times_[0];

But Track::getSampleTimes() collapses a single-entry (constant-rate) stts into constant_duration_ and leaves times_ empty:

if (entries == 1 && nsamples1 > 500) {
    constant_duration_ = stts->readInt(12);   // times_ stays empty
}

Every Sony camera records CFR, so the reference clip's times_ is empty and video_duration_per_sample keeps its 1000 default. fps is then computed as 24000 / 1000 = 24 instead of the true 24000 / 1001 ≈ 23.976.

That feeds the per-GOP audio chunk size:

audio_samples_per_chunk = gop_duration_sec * audio_sample_rate

giving 12 / 24 * 48000 = 24000 samples (96000 bytes) instead of the correct 12 * 48000 * 1001 / 24000 = 24024 samples (96096 bytes), a shortfall of 24 samples, or 96 bytes, per GOP. audio_boundary = next_rtmd_start - total_audio_chunk_size then lands 96 bytes too late. As a result, the last video frame of every GOP swallows 96 bytes of PCM (the corrupt frame), and every audio chunk comes up 96 bytes short (the cumulative drift).
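As a sanity check on the drift figure, here is a minimal sketch that turns the per-GOP byte shortfall into seconds of missing audio. The function and constant names are hypothetical and not part of the codebase; the 4 bytes per sample frame is an assumption inferred from the numbers above (96096 / 24024).

```cpp
#include <cassert>
#include <cstdint>

// Chunk sizes quoted in this PR description.
constexpr int64_t kCorrectChunkBytes = 96096;
constexpr int64_t kWrongChunkBytes = 96000;

// Assumption: bytes per PCM sample frame, inferred from 96096 / 24024.
constexpr int64_t kBytesPerSampleFrame = 4;

// Seconds of audio lost over a clip of clip_seconds, where one GOP lasts
// gop_frames * frame_duration / timescale seconds.
double audioDriftSeconds(double clip_seconds, int64_t gop_frames,
                         int64_t frame_duration, int64_t timescale,
                         int64_t sample_rate) {
    assert(timescale > 0 && sample_rate > 0);
    const double gop_seconds =
        static_cast<double>(gop_frames * frame_duration) / timescale;
    const double gop_count = clip_seconds / gop_seconds;
    const double missing_samples_per_gop =
        static_cast<double>(kCorrectChunkBytes - kWrongChunkBytes) /
        kBytesPerSampleFrame;
    return gop_count * missing_samples_per_gop / sample_rate;
}
```

For a 75-minute clip (4500 s) with 12-frame GOPs at 24000/1001 fps and 48 kHz audio, this comes out just under 4.5 s, in line with the observed audio shortfall.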

Empirically confirmed on a 58 GB FX30 XAVC S .RSV: at the 96096-byte boundary the bytes before are high-entropy H.264 and after are low-amplitude 16-bit PCM; at 96000 both sides are PCM.
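One way to reproduce this kind of check is to compare the byte entropy of a window on each side of a candidate boundary. The sketch below is illustrative only: the helper name is hypothetical, and this description does not state how the entropy was actually measured. Compressed H.264 payload should score close to 8 bits per byte, while low-amplitude 16-bit PCM scores noticeably lower because its high-order bytes cluster around a few values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shannon entropy of a byte window, in bits per byte
// (0 = every byte identical, 8 = all 256 values equally likely).
double byteEntropy(const std::vector<uint8_t>& window) {
    assert(!window.empty());
    std::size_t counts[256] = {};
    for (uint8_t b : window) ++counts[b];

    double entropy = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;  // skip unused byte values (avoids log2(0))
        const double p = static_cast<double>(c) / window.size();
        entropy -= p * std::log2(p);
    }
    return entropy;
}
```

Reading, say, a few kilobytes before and after offset next_rtmd_start - 96096 and comparing the two scores would show the H.264-to-PCM transition; the window size is a free choice.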

Fix

  • Read constant_duration_ when times_ is empty (the actual fix).
  • Compute audio_samples_per_chunk with exact integer rounding so float truncation cannot drop a sample at other rates.

Verification

Patched build, same 58 GB FX30 .RSV + a same-codec CFR reference:

  • derived parameters: fps=23.976, GOP duration=0.5005s, audio chunk=96096 bytes (was fps=24 … 96000)
  • Duration of avc1 and Duration of twos now identical (were 4.5 s apart)
  • continuous decode: 0 per-GOP errors (was ~1 corrupt frame per 12-frame GOP); only the genuinely truncated final ~2 s (point of recording interruption) is damaged.

repairRsvBen() read the video frame duration only from Track::times_,
but getSampleTimes() collapses a constant-rate stts into
constant_duration_ and leaves times_ empty. Every Sony camera records
CFR, so times_ was empty and video_duration_per_sample kept its 1000
default. fps was therefore computed as 24 instead of 23.976
(24000/1001), making the per-GOP audio chunk 96000 bytes instead of
96096.

That ~96-byte/GOP error appended PCM to each GOP's last video frame
(one corrupt frame per GOP -> ~0.5s judder) and truncated audio by the
same amount (~4.5s short over 75 min) for NTSC-fractional footage
(23.976/29.97/59.94).

Fix: use constant_duration_ when times_ is empty. Also compute
audio_samples_per_chunk with exact integer rounding so float
truncation can't drop a sample at other rates.

Verified on a 58GB Sony FX30 XAVC S .RSV: audio/video durations now
match exactly and the file decodes with zero per-GOP errors.