init whisper#6
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces Whisper-based audio feature extraction (log-mel spectrograms) and wires the new extractors into the model loader, along with example YAML configs to run the pipeline with these features.
Changes:
- Add
Whisper80ExtractorandWhisper128Extractorto compute 80/128-bin Whisper log-mel spectrogram features. - Register Whisper extractors in
models/utils.pyfor dynamic model loading. - Add example pipeline configs for running Whisper80/Whisper128 audio extraction.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
| models/whisper_80.py | New Whisper 80-mel extractor implementation. |
| models/whisper_128.py | New Whisper 128-mel extractor implementation. |
| models/utils.py | Registers new model(s) in the global model_map and imports extractors. |
| config/whisper80_audio.yaml | Adds an example config to run Whisper80 mel extraction. |
| config/whisper128_audio.yaml | Adds an example config to run Whisper128 mel extraction. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from models.open_smile import OpenSmileExtractor | ||
| from models.whisper_80 import Whisper80Extractor |
There was a problem hiding this comment.
models.open_smile is imported and registered, but there is no corresponding models/open_smile.py in the repo. This will raise ModuleNotFoundError at import time and prevent the pipeline from starting. Either add/commit the missing extractor module or remove the import and model_map entry.
| from models.whisper_80 import Whisper80Extractor | ||
| from models.whisper_128 import Whisper128Extractor |
There was a problem hiding this comment.
Importing Whisper extractors at module import time makes openai-whisper a hard dependency for any run (since worker.py imports models.utils), even if Whisper isn’t selected. Consider lazy-loading these optional backends (e.g., wrap the imports/model_map registration in try/except ImportError or import inside load_model only when model_name matches) so missing optional deps don’t break unrelated models.
| from models.base import BaseModel | ||
| try: | ||
| import whisper | ||
| except : |
There was a problem hiding this comment.
Bare except: will also catch unrelated errors (e.g., runtime errors inside the dependency) and can mask the real failure reason. Catch ImportError explicitly here so only missing-dependency cases trigger this message.
| except : | |
| except ImportError: |
| if output == "mel_features": | ||
|
|
||
| return { | ||
| "mels": mel.cpu().numpy() , # shape [num_patches, hidden_dim] |
There was a problem hiding this comment.
The inline shape comment is incorrect for Whisper mel spectrograms: whisper.log_mel_spectrogram returns something shaped like [n_mels, n_frames] (not [num_patches, hidden_dim]). This can confuse downstream users when inspecting saved Arrow outputs; update the comment to match the actual tensor shape.
| "mels": mel.cpu().numpy() , # shape [num_patches, hidden_dim] | |
| "mels": mel.cpu().numpy() , # shape [n_mels, n_frames] |
| audiopath (str, optional): Path to the input audio file. | ||
|
|
||
| Returns: | ||
| torch.Tensor: A tuple containing the audio tensor and the sample rate. |
There was a problem hiding this comment.
This docstring claims the method returns a (tensor, sample_rate) tuple, but the implementation returns only the audio array from whisper.load_audio. Please correct the return type/description (and the audiopath arg name) to avoid misleading callers.
| audiopath (str, optional): Path to the input audio file. | |
| Returns: | |
| torch.Tensor: A tuple containing the audio tensor and the sample rate. | |
| audio_path (str, optional): Path to the input audio file. | |
| Returns: | |
| numpy.ndarray: Audio samples returned by ``whisper.load_audio``, | |
| resampled to ``self.sampling_rate``. |
| if output == "mel_features": | ||
|
|
||
| return { | ||
| "mels": mel.cpu().numpy() , # shape [num_patches, hidden_dim] |
There was a problem hiding this comment.
The inline shape comment is incorrect for Whisper mel spectrograms: whisper.log_mel_spectrogram returns something shaped like [n_mels, n_frames] (not [num_patches, hidden_dim]). This can confuse downstream users when inspecting saved Arrow outputs; update the comment to match the actual tensor shape.
| "mels": mel.cpu().numpy() , # shape [num_patches, hidden_dim] | |
| "mels": mel.cpu().numpy() , # shape [n_mels, n_frames] |
| Load an audio file and resample if necessary. | ||
|
|
||
| Args: | ||
| audiopath (str, optional): Path to the input audio file. | ||
|
|
||
| Returns: | ||
| torch.Tensor: A tuple containing the audio tensor and the sample rate. |
There was a problem hiding this comment.
This docstring claims the method returns a (tensor, sample_rate) tuple, but the implementation returns only the audio array from whisper.load_audio. Please correct the return type/description (and the audiopath arg name) to avoid misleading callers.
| Load an audio file and resample if necessary. | |
| Args: | |
| audiopath (str, optional): Path to the input audio file. | |
| Returns: | |
| torch.Tensor: A tuple containing the audio tensor and the sample rate. | |
| Load an audio file and resample it to the configured sampling rate. | |
| Args: | |
| audio_path (str, optional): Path to the input audio file. | |
| Returns: | |
| np.ndarray: Audio samples returned by ``whisper.load_audio``, | |
| resampled to ``self.sampling_rate``. |
| if "speech_mels_features" in self.feature_list: | ||
| mel_features = self.extract_mels(audio_data) | ||
| return [("speech_mels_features", mel_features)] | ||
|
|
There was a problem hiding this comment.
If speech_mels_features is not in feature_list, this method returns None, which will raise a TypeError in worker.py where it blindly iterates for feature_name, feature_value in extractor.extract_features(...). Return an empty list (or yield nothing) instead so the extractor is safe under misconfiguration.
| return [] |
|
|
||
| phases: | ||
| - model: | ||
| name: "Whisper80" |
There was a problem hiding this comment.
main.launch_workers_for_phase() expects phase["model"]["base_dir"] to exist when building buffer_root. This config omits base_dir, so running with it will raise KeyError. Add a base_dir (consistent with other configs like first_batch_pyarrow.yaml).
| name: "Whisper80" | |
| name: "Whisper80" | |
| base_dir: "Whisper80" |
| phases: | ||
| - model: | ||
| name: "Whisper128" | ||
| path: "models/Whisper128" |
There was a problem hiding this comment.
main.launch_workers_for_phase() expects phase["model"]["base_dir"] to exist when building buffer_root. This config omits base_dir, so running with it will raise KeyError. Add a base_dir (consistent with other configs like first_batch_pyarrow.yaml).
| path: "models/Whisper128" | |
| path: "models/Whisper128" | |
| base_dir: "models/Whisper128" |
Add whisper feature