
Canary streamatt #34

Open
azziko wants to merge 13 commits into hlt-mt:main from azziko:canary-streamatt

Conversation

azziko commented Apr 29, 2026

Changes:

  1. Add a flag to the base streamatt processor that determines whether the audio history is stored as raw audio or as features
  2. Implement Canary with streamatt
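To make the effect of such a flag concrete, here is a minimal sketch. The names `AudioHistory`, `units_per_encoder_frame` and `discard_encoder_frames` are hypothetical and not taken from the PR. It assumes exactly `mel_hop_samples` raw samples per mel frame and `audio_subsampling_factor` mel frames per encoder frame, ignoring STFT padding and edge frames. The point it illustrates is that the same "drop N encoder frames" operation maps to different amounts of stored data depending on how the history is kept.

```python
from typing import Any, List


class AudioHistory:
    """Hypothetical sketch: keep the audio history either as raw waveform
    samples or as feature frames, and trim it in encoder-frame units."""

    def __init__(self, use_raw_audio_history: bool,
                 mel_hop_samples: int = 160,
                 audio_subsampling_factor: int = 1) -> None:
        self.use_raw_audio_history = use_raw_audio_history
        self.mel_hop_samples = mel_hop_samples
        self.audio_subsampling_factor = audio_subsampling_factor
        # Raw mode: one float per waveform sample.
        # Feature mode: one feature vector (list of floats) per mel frame.
        self._items: List[Any] = []

    def append(self, chunk: List[Any]) -> None:
        self._items.extend(chunk)

    def __len__(self) -> int:
        return len(self._items)

    def units_per_encoder_frame(self) -> int:
        # An encoder frame spans `audio_subsampling_factor` mel frames,
        # and each mel frame advances by `mel_hop_samples` raw samples.
        if self.use_raw_audio_history:
            return self.mel_hop_samples * self.audio_subsampling_factor
        return self.audio_subsampling_factor

    def discard_encoder_frames(self, n_frames: int) -> None:
        # Drop the oldest part of the history covering `n_frames` encoder frames.
        del self._items[:n_frames * self.units_per_encoder_frame()]
```

For example, with a subsampling factor of 8, discarding 2 encoder frames removes 2 × 8 × 160 = 2,560 samples from a raw history, but only 16 mel frames from a feature history.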

Resolves: #28

azziko (Author) commented Apr 29, 2026

I'll fix the checks and run the unit tests; to be honest, I forgot about them.

Let me know if the overall idea is fine.

mgaido91 (Contributor) left a comment

Thank you very much for your contribution @azziko! The approach looks great to me and the code is very clean, thanks. My only concern is the leading EOS, which I do not understand.

Just a couple of final points:

  • Can we please add a couple of unit tests for the audio history management? Just to make sure everything works as we expect and that future changes won't break things.
  • This code relies on recent contributions to NeMo (thanks for them as well!), but our dependencies currently pin nemo_toolkit[asr]==2.4.0 for Canary. I think we have to update that.

Thanks!

Comment thread config/canary_streamatt.yaml
Comment thread simulstream/server/speech_processors/base_streamatt.py Outdated
Comment thread simulstream/server/speech_processors/canary_streamatt.py Outdated
Comment thread simulstream/server/speech_processors/canary_streamatt.py Outdated
Comment thread simulstream/server/speech_processors/canary_streamatt.py
Comment thread simulstream/server/speech_processors/canary_streamatt.py Outdated
Comment thread simulstream/server/speech_processors/canary_streamatt.py Outdated
Comment thread simulstream/server/speech_processors/canary_streamatt.py Outdated
Comment thread simulstream/server/speech_processors/canary_streamatt.py

return replace(self.transcription_cfg, prompt={"turns": turns})

def _remove_eos_tokens(self, token_ids: List[int]) -> List[int]:
Contributor

I do not understand: how and when can this happen? Isn't it a problem for the attention to have these extra tokens?

azziko (Author), May 2, 2026

When we were testing our system with Canary for IWSLT, EOS tokens occasionally appeared at the beginning of the hypothesis. We haven't traced the exact cause, but I suspect it is the forced prefix. In our system we solved it this way, though the fix would probably be better placed on the NeMo side. I will look into that.

> isn't it a problem for the attention to have these extra tokens?

In our tests they were output together with the rest of the prediction, so, again, I assume they don't disrupt the attention scores.
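The body of `_remove_eos_tokens` is not shown in this thread, so the following is only a guess at the behaviour being discussed: a hypothetical helper that strips EOS tokens appearing before any real token while leaving later ones (for example a genuine trailing EOS) in place. The `eos_id` argument is an assumption; the actual method may take the ID from the tokenizer or remove EOS tokens differently.

```python
from typing import List


def remove_leading_eos(token_ids: List[int], eos_id: int) -> List[int]:
    """Hypothetical helper: drop EOS tokens that precede any other token."""
    start = 0
    # Advance past the run of EOS tokens at the start of the hypothesis.
    while start < len(token_ids) and token_ids[start] == eos_id:
        start += 1
    # Everything after the first non-EOS token is kept unchanged.
    return token_ids[start:]
```

Stripping only the leading run keeps a legitimate end-of-sequence marker intact. It also means the spurious tokens are hidden rather than explained, which is why the root cause is worth tracing.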

Contributor

If you have a repro, I can also try to debug this, thanks. I would like to make sure we do not have issues here.

azziko (Author) commented May 2, 2026

Thanks for the review, @mgaido91.

I pushed quick fixes for most of the points; I will add some unit tests later too.

Regarding the EOS, I replied in the related conversation.

> This code relies on recent contribs to NeMo (thanks for them as well!), but currently we have in our dependencies nemo_toolkit[asr]==2.4.0 for canary. I think we have to update that.

The contributions do not seem to be included in any release yet. I'm installing the NeMo toolkit from the latest commit in the repo, like so:

pip install "nemo_toolkit[asr] @ git+https://github.com/NVIDIA/NeMo.git"

mgaido91 (Contributor) left a comment

Mostly LGTM, thanks; just a few minor comments. The main thing that worries me is the EOS stripping, which I would like to investigate more.

Regarding the version, the next NeMo release will be 2.8.0, so we can use that as the dependency. This might mean we have to wait for that release before merging this, but that may be fine if they stick to their scheduled release (June, so ~1 month from now). Otherwise, we can put "@ git+https://github.com/NVIDIA/NeMo.git@main" as a dependency in the pyproject (actually, it would be better to pin a commit hash rather than main, to avoid flaky issues from newer commits coming in). We would then need another PR to switch to the release once it is out.
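While the dependency points at a Git commit rather than a release, a processor could also fail fast when the installed NeMo is too old. What follows is a minimal sketch using only the standard library. The helper names are hypothetical, the distribution name `nemo_toolkit` is assumed to match the installed package metadata, and pre-release suffixes such as `rc1` are deliberately ignored, so 2.8.0rc1 counts as 2.8.0.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Sequence, Tuple


def parse_release(v: str) -> Tuple[int, ...]:
    """Leading numeric release segment, e.g. '2.8.0rc1' -> (2, 8, 0)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unparseable version: {v!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def at_least(v: str, minimum: Sequence[int] = (2, 8, 0)) -> bool:
    """True if the release segment of `v` is >= `minimum` (missing parts count as 0)."""
    rel = parse_release(v)
    n = max(len(rel), len(minimum))
    rel_padded = rel + (0,) * (n - len(rel))
    min_padded = tuple(minimum) + (0,) * (n - len(minimum))
    return rel_padded >= min_padded


def nemo_is_recent_enough(minimum: Sequence[int] = (2, 8, 0)) -> bool:
    """Hypothetical guard: False if NeMo is not installed or older than `minimum`."""
    try:
        return at_least(version("nemo_toolkit"), minimum)
    except PackageNotFoundError:
        return False
```

Comparing padded integer tuples avoids the string-comparison trap where "2.10" would sort before "2.8".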

- **audio_subsampling_factor (int)**: Subsampling factor of the model, if any.
Defaults to 1.
- **mel_hop_samples (int)**: Number of raw waveform samples per mel frame.
Defaults to 1.
Contributor

Suggested change
Defaults to 1.
Defaults to 160, i.e. 10ms at 16kHz.
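The arithmetic behind the suggested default, as a small sketch. The function name is hypothetical, and a 16 kHz sample rate is assumed as in the suggestion. One mel frame of 160 samples lasts 10 ms. With the subsampling factor of 8 that the Canary processor reads as its default in this PR, one encoder frame therefore covers 80 ms of audio.

```python
def encoder_frame_ms(mel_hop_samples: int = 160,
                     audio_subsampling_factor: int = 1,
                     sample_rate: int = 16_000) -> float:
    """Duration in milliseconds of one encoder output frame."""
    # Raw samples consumed per encoder frame = hop size x subsampling factor.
    samples_per_frame = mel_hop_samples * audio_subsampling_factor
    return 1000.0 * samples_per_frame / sample_rate


print(encoder_frame_ms())                            # 10.0 (one mel frame)
print(encoder_frame_ms(audio_subsampling_factor=8))  # 80.0 (one encoder frame)
```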

Comment thread config/canary_streamatt.yaml
Comment on lines +53 to +55
self.use_raw_audio_history = True
self.mel_hop_samples = getattr(self.config, "mel_hop_samples", 160)
self.audio_subsampling_factor = getattr(self.config, "audio_subsampling_factor", 8)
Contributor

All these attributes are already set in the parent class, so there is no need to set them here.

azziko (Author) commented May 5, 2026

I agree on the version; I changed it to 2.8.0.

Regarding the EOS problem, I looked into the logs I had: it was a problem specific to our system, so I removed the EOS trimming in the latest commit. It's still probably a good idea to run the processor on a small test set; I will try that when I have time.

Comment thread uts/speech_processors/test_streamatt.py Outdated
mgaido91 (Contributor) left a comment

LGTM, only one comment regarding the unit test. I agree on testing this more thoroughly; I'll also do that when I find the time.

Since we have to wait for NeMo 2.8.0, please ping me when it is released in case I do not notice, so that we can merge this.

Thanks!

Comment thread simulstream/server/speech_processors/canary_streamatt.py Outdated
azziko and others added 3 commits May 6, 2026 10:36
Co-authored-by: Marco Gaido <marcogaido91@gmail.com>
Co-authored-by: Marco Gaido <marcogaido91@gmail.com>
Comment thread uts/speech_processors/test_streamatt.py Outdated
Comment thread simulstream/server/speech_processors/canary_streamatt.py Outdated
Comment thread simulstream/server/speech_processors/canary_streamatt.py Outdated

Development

Successfully merging this pull request may close these issues.

Canary-v2 streamatt speech processor

2 participants