Skip to content

init whisper#6

Open
Stanwang1210 wants to merge 1 commit into
EverydayRespect:mp_storagefrom
Stanwang1210:whisper_mp_storage
Open

init whisper#6
Stanwang1210 wants to merge 1 commit into
EverydayRespect:mp_storagefrom
Stanwang1210:whisper_mp_storage

Conversation

@Stanwang1210

Copy link
Copy Markdown

Add whisper feature

Copilot AI review requested due to automatic review settings April 3, 2026 01:17

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces Whisper-based audio feature extraction (log-mel spectrograms) and wires the new extractors into the model loader, along with example YAML configs to run the pipeline with these features.

Changes:

  • Add Whisper80Extractor and Whisper128Extractor to compute 80/128-bin Whisper log-mel spectrogram features.
  • Register Whisper extractors in models/utils.py for dynamic model loading.
  • Add example pipeline configs for running Whisper80/Whisper128 audio extraction.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
models/whisper_80.py New Whisper 80-mel extractor implementation.
models/whisper_128.py New Whisper 128-mel extractor implementation.
models/utils.py Registers new model(s) in the global model_map and imports extractors.
config/whisper80_audio.yaml Adds an example config to run Whisper80 mel extraction.
config/whisper128_audio.yaml Adds an example config to run Whisper128 mel extraction.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread models/utils.py
Comment on lines +11 to +12
from models.open_smile import OpenSmileExtractor
from models.whisper_80 import Whisper80Extractor

Copilot AI Apr 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

models.open_smile is imported and registered, but there is no corresponding models/open_smile.py in the repo. This will raise ModuleNotFoundError at import time and prevent the pipeline from starting. Either add/commit the missing extractor module or remove the import and model_map entry.

Copilot uses AI. Check for mistakes.
Comment thread models/utils.py
Comment on lines +12 to +13
from models.whisper_80 import Whisper80Extractor
from models.whisper_128 import Whisper128Extractor

Copilot AI Apr 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Importing Whisper extractors at module import time makes openai-whisper a hard dependency for any run (since worker.py imports models.utils), even if Whisper isn’t selected. Consider lazy-loading these optional backends (e.g., wrap the imports/model_map registration in try/except ImportError or import inside load_model only when model_name matches) so missing optional deps don’t break unrelated models.

Copilot uses AI. Check for mistakes.
Comment thread models/whisper_80.py
from models.base import BaseModel
try:
import whisper
except :

Copilot AI Apr 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bare except: will also catch unrelated errors (e.g., runtime errors inside the dependency) and can mask the real failure reason. Catch ImportError explicitly here so only missing-dependency cases trigger this message.

Suggested change
except :
except ImportError:

Copilot uses AI. Check for mistakes.
Comment thread models/whisper_80.py
if output == "mel_features":

return {
"mels": mel.cpu().numpy() , # shape [num_patches, hidden_dim]

Copilot AI Apr 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline shape comment is incorrect for Whisper mel spectrograms: whisper.log_mel_spectrogram returns something shaped like [n_mels, n_frames] (not [num_patches, hidden_dim]). This can confuse downstream users when inspecting saved Arrow outputs; update the comment to match the actual tensor shape.

Suggested change
"mels": mel.cpu().numpy() , # shape [num_patches, hidden_dim]
"mels": mel.cpu().numpy() , # shape [n_mels, n_frames]

Copilot uses AI. Check for mistakes.
Comment thread models/whisper_80.py
Comment on lines +53 to +56
audiopath (str, optional): Path to the input audio file.

Returns:
torch.Tensor: A tuple containing the audio tensor and the sample rate.

Copilot AI Apr 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring claims the method returns a (tensor, sample_rate) tuple, but the implementation returns only the audio array from whisper.load_audio. Please correct the return type/description (and the audiopath arg name) to avoid misleading callers.

Suggested change
audiopath (str, optional): Path to the input audio file.
Returns:
torch.Tensor: A tuple containing the audio tensor and the sample rate.
audio_path (str, optional): Path to the input audio file.
Returns:
numpy.ndarray: Audio samples returned by ``whisper.load_audio``,
resampled to ``self.sampling_rate``.

Copilot uses AI. Check for mistakes.
Comment thread models/whisper_128.py
if output == "mel_features":

return {
"mels": mel.cpu().numpy() , # shape [num_patches, hidden_dim]

Copilot AI Apr 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline shape comment is incorrect for Whisper mel spectrograms: whisper.log_mel_spectrogram returns something shaped like [n_mels, n_frames] (not [num_patches, hidden_dim]). This can confuse downstream users when inspecting saved Arrow outputs; update the comment to match the actual tensor shape.

Suggested change
"mels": mel.cpu().numpy() , # shape [num_patches, hidden_dim]
"mels": mel.cpu().numpy() , # shape [n_mels, n_frames]

Copilot uses AI. Check for mistakes.
Comment thread models/whisper_128.py
Comment on lines +50 to +56
Load an audio file and resample if necessary.

Args:
audiopath (str, optional): Path to the input audio file.

Returns:
torch.Tensor: A tuple containing the audio tensor and the sample rate.

Copilot AI Apr 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring claims the method returns a (tensor, sample_rate) tuple, but the implementation returns only the audio array from whisper.load_audio. Please correct the return type/description (and the audiopath arg name) to avoid misleading callers.

Suggested change
Load an audio file and resample if necessary.
Args:
audiopath (str, optional): Path to the input audio file.
Returns:
torch.Tensor: A tuple containing the audio tensor and the sample rate.
Load an audio file and resample it to the configured sampling rate.
Args:
audio_path (str, optional): Path to the input audio file.
Returns:
np.ndarray: Audio samples returned by ``whisper.load_audio``,
resampled to ``self.sampling_rate``.

Copilot uses AI. Check for mistakes.
Comment thread models/whisper_128.py
if "speech_mels_features" in self.feature_list:
mel_features = self.extract_mels(audio_data)
return [("speech_mels_features", mel_features)]

Copilot AI Apr 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If speech_mels_features is not in feature_list, this method returns None, which will raise a TypeError in worker.py where it blindly iterates for feature_name, feature_value in extractor.extract_features(...). Return an empty list (or yield nothing) instead so the extractor is safe under misconfiguration.

Suggested change
return []

Copilot uses AI. Check for mistakes.

phases:
- model:
name: "Whisper80"

Copilot AI Apr 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main.launch_workers_for_phase() expects phase["model"]["base_dir"] to exist when building buffer_root. This config omits base_dir, so running with it will raise KeyError. Add a base_dir (consistent with other configs like first_batch_pyarrow.yaml).

Suggested change
name: "Whisper80"
name: "Whisper80"
base_dir: "Whisper80"

Copilot uses AI. Check for mistakes.
phases:
- model:
name: "Whisper128"
path: "models/Whisper128"

Copilot AI Apr 3, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main.launch_workers_for_phase() expects phase["model"]["base_dir"] to exist when building buffer_root. This config omits base_dir, so running with it will raise KeyError. Add a base_dir (consistent with other configs like first_batch_pyarrow.yaml).

Suggested change
path: "models/Whisper128"
path: "models/Whisper128"
base_dir: "models/Whisper128"

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants