- Rebase onto main (includes remote segment_audio / 5c5557a)
- asr_chunks_to_text + model/client injection; _build_output_rows, _infer_remote
- Parakeet long-audio split; media_interface ffprobe robustness
- inprocess: _load_doc_to_df / _iter_doc_chunks; asr_chunks_to_text with model
- stage: asr_chunks_to_text; tests; no apply_asr_to_df

Made-with: Cursor
Description
Refactors ASR so all “chunk rows → transcript rows” logic goes through `asr_chunks_to_text`, with `ASRActor` as a thin wrapper.

What changed
- `asr_chunks_to_text(batch_df, model=..., client=..., asr_params=...)`: Single batch entry point for ASR. `ASRActor` only constructs `model` / `client` from `ASRParams` and delegates here.
- Injectable `model` / `client`: Inprocess and the GPU pool can pass a `ParakeetCTC1B1ASR` (or remote client) so the same code path runs inside and outside Ray `map_batches`.
- Long audio (local Parakeet): `ParakeetCTC1B1ASR` splits inputs that exceed the model length budget and concatenates transcripts.
- Media probing: `media_interface` has more robust ffprobe handling when `duration` or `bit_rate` is missing (e.g. VBR / bad probes).
- Inprocess: `_load_doc_to_df` / `_iter_doc_chunks` unify loading for pdf / html / image / audio / txt in the ingest loop.
- API: Removed `apply_asr_to_df`; use `asr_chunks_to_text` directly (tests updated).

Why
- Injectable `model` supports the GPU pool and avoids duplicate model setup where the caller already holds the model.
- Preserves `main` behavior for remote `segment_audio` while keeping the refactored structure.

Testing
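As a rough illustration of why an injectable `model` helps testing, the sketch below passes a fake model into a simplified stand-in for the batch entry point. Everything here is an assumption made for illustration: `FakeASRModel`, its `transcribe` method, `chunks_to_text_sketch`, and the row keys (`path`, `audio`, `text`) are hypothetical and are not taken from this PR's code, which operates on a DataFrame and also accepts `client` and `asr_params`.

```python
from typing import Any, Dict, List, Optional


class FakeASRModel:
    """Test double standing in for a real ASR model (hypothetical interface)."""

    def __init__(self) -> None:
        self.calls: List[bytes] = []

    def transcribe(self, audio: bytes) -> str:
        # Record the call and return a deterministic fake transcript.
        self.calls.append(audio)
        return f"transcript-{len(audio)}"


def chunks_to_text_sketch(
    rows: List[Dict[str, Any]],
    model: Optional[Any] = None,
) -> List[Dict[str, Any]]:
    """Simplified stand-in for the batch entry point, NOT the PR's code.

    Uses the injected model when one is given. A real implementation
    would otherwise build a model or client from its parameters; this
    sketch simply refuses to run without one.
    """
    if model is None:
        raise ValueError("this sketch requires an injected model")
    # One output row per input chunk row, with the transcript attached.
    return [{**row, "text": model.transcribe(row["audio"])} for row in rows]


def test_injected_model_is_used() -> None:
    fake = FakeASRModel()
    rows = [
        {"path": "a.wav", "audio": b"\x00" * 4},
        {"path": "b.wav", "audio": b"\x00" * 2},
    ]
    out = chunks_to_text_sketch(rows, model=fake)
    assert [r["text"] for r in out] == ["transcript-4", "transcript-2"]
    assert len(fake.calls) == 2  # one transcribe call per chunk row
```

Because the caller supplies the model, a test in this style needs no GPU and no model weights, and the same function can be called from a Ray actor or from an in-process loop.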
pytest nemo_retriever/tests/test_asr_actor.py

Checklist