Canary streamatt by azziko · Pull Request #34 · hlt-mt/simulstream

azziko · 2026-04-29T07:33:09Z

Changes:

Add a flag to the base streamatt that determines whether the audio history is stored as raw audio or as features
Implement Canary with streamatt

Resolves: #28
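The flag changes what one element of the stored history represents. Below is a minimal sketch, not the PR's implementation, of discarding the part of the history that corresponds to the first few encoder frames in both modes. `trim_history` and `encoder_frames_to_drop` are hypothetical names. The defaults follow values discussed in this PR (`mel_hop_samples=160`, `audio_subsampling_factor=8` for Canary). STFT window and padding edge effects are ignored.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def trim_history(
    history: Sequence[T],
    encoder_frames_to_drop: int,
    use_raw_audio_history: bool,
    mel_hop_samples: int = 160,
    audio_subsampling_factor: int = 8,
) -> List[T]:
    """Drop the oldest history covering `encoder_frames_to_drop` encoder frames.

    Hypothetical helper: `history` holds raw waveform samples when
    `use_raw_audio_history` is True, otherwise one mel feature frame per element.
    """
    if encoder_frames_to_drop < 0:
        raise ValueError("encoder_frames_to_drop must be non-negative")
    # Each encoder frame covers `audio_subsampling_factor` mel frames...
    units_per_frame = audio_subsampling_factor
    if use_raw_audio_history:
        # ...and each mel frame covers `mel_hop_samples` raw samples.
        units_per_frame *= mel_hop_samples
    return list(history[encoder_frames_to_drop * units_per_frame:])


# Two encoder frames: 2 * 8 * 160 = 2560 raw samples, or 2 * 8 = 16 mel frames.
raw = list(range(4000))
feats = [[0.0] * 128 for _ in range(50)]  # 128 mel bins is an arbitrary choice here
print(len(trim_history(raw, 2, use_raw_audio_history=True)))     # 1440
print(len(trim_history(feats, 2, use_raw_audio_history=False)))  # 34
```

A plausible reason for having both modes is that a raw history must be re-featurized at each step, while a feature history can be reused directly but is tied to the feature extractor's frame grid.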

azziko · 2026-04-29T07:38:34Z

I'll fix the checks and run the unit tests; to be honest, I forgot about them.

Let me know if the overall idea is fine.

mgaido91

Thank you very much for your contribution @azziko! The approach looks great to me and the code is very clean, thanks. I am only concerned about the leading EOS, which I do not understand.

Just a couple of final points:

Can we please add a couple of unit tests for the audio history management? Only to ensure everything works as we expect and that future changes won't break things.
This code relies on recent contribs to NeMo (thanks for them as well!), but currently we have nemo_toolkit[asr]==2.4.0 in our dependencies for Canary. I think we have to update that.

Thanks!

mgaido91 · 2026-04-30T15:45:06Z

+
+        return replace(self.transcription_cfg, prompt={"turns": turns})
+
+    def _remove_eos_tokens(self, token_ids: List[int]) -> List[int]:


I do not understand: how and when can this happen? Isn't it a problem for the attention to have these extra tokens?

When we were testing our system with Canary for IWSLT, EOS tokens occasionally appeared at the beginning of the hypothesis. We haven't traced the exact cause, but I suspect it is due to the forced prefix. In our system we solved it this way, although the fix would probably be better done on the NeMo side. I will look into that.

isn't it a problem for the attention to have these extra tokens?

In our tests they were output together with the rest of the prediction, so, again, I assume they don't disrupt the attention scores.
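For context, here is a minimal sketch of what stripping leading EOS tokens could look like. The standalone function name `remove_leading_eos` and its `eos_id` parameter are assumptions; only the `_remove_eos_tokens` method name appears in the diff, and this workaround was later dropped from the PR.

```python
from typing import List


def remove_leading_eos(token_ids: List[int], eos_id: int) -> List[int]:
    """Strip EOS tokens that appear before any other token (hypothetical helper)."""
    start = 0
    # Advance past the run of EOS ids at the start of the hypothesis only;
    # EOS tokens later in the sequence are left untouched.
    while start < len(token_ids) and token_ids[start] == eos_id:
        start += 1
    return token_ids[start:]


print(remove_leading_eos([2, 2, 15, 7, 2], eos_id=2))  # [15, 7, 2]
```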

If you have a repro, I can also try to debug this, thanks. I would like to make sure we do not have issues here.

azziko · 2026-05-02T14:09:45Z

thanks for the review @mgaido91,

I pushed the quick fixes for most of the points, I will add some unit tests later too.

Regarding the EOS, I replied in the related conversation.

This code relies on recent contribs to NeMo (thanks for them as well!), but currently we have in our dependencies nemo_toolkit[asr]==2.4.0 for canary. I think we have to update that.

It does not seem that the contributions have been included in any release yet. I'm installing the NeMo toolkit from the latest commit of the repo, as follows:

pip install "nemo_toolkit[asr] @ git+https://github.com/NVIDIA/NeMo.git"

mgaido91

Mostly LGTM, thanks; just a few minor comments. The main thing that worries me is the EOS stripping, which I would like to investigate further.

Regarding the version, the next release will be 2.8.0, so we can put that as the dependency. This may mean we have to wait for that release before merging this, which should be fine if they stick to their release schedule (June, so about a month from now). Otherwise, we can put "@ git+https://github.com/NVIDIA/NeMo.git@main" as a dependency in the pyproject (actually, it would be better to use a commit hash rather than main, to ensure we do not get flaky issues from newer commits). Then we will need another PR to switch to the release once it is out.
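The pinning advice can be checked mechanically. Below is a minimal sketch built around a hypothetical `is_pinned_to_commit` helper. It parses a PEP 508 direct reference of the form used in this thread (`name @ git+URL@ref`) and accepts it only when the ref is a full 40-character commit SHA rather than a branch such as `main`. It assumes lowercase hex hashes and does not handle every URL form pip accepts.

```python
import re

# A full SHA-1 commit id: 40 lowercase hex characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_pinned_to_commit(requirement: str) -> bool:
    """Return True if a `name @ git+URL@ref` requirement pins a full commit hash."""
    if "@" not in requirement:
        return False  # plain specifier such as "nemo_toolkit[asr]==2.8.0"
    # Package names and extras cannot contain "@", so the first one starts the URL.
    parts = requirement.split("@", 1)[1].split()
    if not parts:
        return False
    url = parts[0]  # anything after whitespace (e.g. "; marker") is ignored
    if not url.startswith("git+"):
        return False
    path = url.split("://", 1)[-1].split("#", 1)[0]  # drop scheme and "#egg=" fragment
    if "@" not in path:
        return False  # no ref: installs whatever the default branch points to today
    ref = path.rsplit("@", 1)[1]
    return _FULL_SHA.fullmatch(ref) is not None


base = "nemo_toolkit[asr] @ git+https://github.com/NVIDIA/NeMo.git"
print(is_pinned_to_commit(base))               # False
print(is_pinned_to_commit(base + "@main"))     # False
print(is_pinned_to_commit(base + "@" + "0" * 40))  # True (placeholder, not a real NeMo commit)
```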

mgaido91 · 2026-05-04T07:48:09Z

           - **audio_subsampling_factor (int)**: Subsampling factor of the model, if any.
             Defaults to 1.
+           - **mel_hop_samples (int)**: Number of raw waveform samples per mel frame.
+             Defaults to 1.


Suggested change: replace "Defaults to 1." with "Defaults to 160, i.e. 10ms at 16kHz."
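The suggested default follows from a 16 kHz sampling rate and a 10 ms mel hop. Combined with the `audio_subsampling_factor` of 8 that the Canary processor uses, each encoder frame covers 80 ms of audio. Below is a small worked sketch of that arithmetic; the helper names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000


def mel_hop_ms(mel_hop_samples: int = 160, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Duration of one mel frame hop in milliseconds."""
    return 1000 * mel_hop_samples / sample_rate


def encoder_frame_samples(mel_hop_samples: int = 160, audio_subsampling_factor: int = 8) -> int:
    """Raw waveform samples covered by one encoder output frame."""
    return mel_hop_samples * audio_subsampling_factor


print(mel_hop_ms())                                     # 10.0
print(encoder_frame_samples())                          # 1280
print(1000 * encoder_frame_samples() / SAMPLE_RATE_HZ)  # 80.0
```

With the old default of 1, every sample-based offset derived from mel frames would be 160 times too small, which is why the default matters for a raw-audio history.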

mgaido91 · 2026-05-04T07:53:25Z

+        self.use_raw_audio_history = True
+        self.mel_hop_samples = getattr(self.config, "mel_hop_samples", 160)
+        self.audio_subsampling_factor = getattr(self.config, "audio_subsampling_factor", 8)


All these things are already set in the parent, so there is no need to set them here.
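Attributes read from the config in the base constructor are inherited, so the subclass only needs Canary-specific setup. Below is a simplified sketch with hypothetical class names, where a `SimpleNamespace` stands in for the config. It assumes the parent reads these values with `getattr` defaults as in the diff, including the flag (an assumption); the real processors have different constructors.

```python
from types import SimpleNamespace


class BaseStreamAttSketch:
    def __init__(self, config):
        self.config = config
        # Read once here; subclasses inherit these attributes.
        self.use_raw_audio_history = getattr(config, "use_raw_audio_history", False)
        self.mel_hop_samples = getattr(config, "mel_hop_samples", 160)
        self.audio_subsampling_factor = getattr(config, "audio_subsampling_factor", 1)


class CanaryStreamAttSketch(BaseStreamAttSketch):
    def __init__(self, config):
        super().__init__(config)
        # Only Canary-specific setup belongs here; shared values are not re-read.


cfg = SimpleNamespace(use_raw_audio_history=True, audio_subsampling_factor=8)
proc = CanaryStreamAttSketch(cfg)
print(proc.mel_hop_samples, proc.audio_subsampling_factor)  # 160 8
```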

mgaido91 · 2026-05-04T08:00:16Z

+
+        return replace(self.transcription_cfg, prompt={"turns": turns})
+
+    def _remove_eos_tokens(self, token_ids: List[int]) -> List[int]:


If you have a repro, I can also try to debug this, thanks. I would like to make sure we do not have issues here.

azziko · 2026-05-05T21:15:04Z

I agree on the version; I changed it to 2.8.0.

Regarding the EOS problem, I looked into the logs I had: it was a problem specific to our system, so I removed the EOS trimming in the latest commit. It's still probably a good idea to run the processor on a small test set. I will try it out when I have time.

mgaido91

LGTM, only one comment regarding the UT. I agree on testing this more thoroughly; I'll also do that when I find the time.

Since we have to wait for NeMo 2.8.0 to be released, please ping me in case I don't notice, so that we can merge this once NeMo 2.8.0 is out.

Thanks!

Co-authored-by: Marco Gaido <marcogaido91@gmail.com>

azziko added 2 commits April 29, 2026 07:30

Add canary streamatt

15b6a00

Add audio history type flag to the base streamatt

4279681

Add stylistic fixes addressing the linter

53afb67

mgaido91 reviewed Apr 30, 2026

azziko added 2 commits May 2, 2026 13:56

Add minor fixes

b9ec9ba

Fix linter issues

056ec4e

mgaido91 reviewed May 4, 2026

azziko added 4 commits May 4, 2026 14:07

Add minor fixes

39f1380

Delete removing eos in the beginning

3f9a6eb

Add unit test for audio trimming in update history

6b23ddf

Change the canary dependency version

52803f4

mgaido91 reviewed May 6, 2026

Comment thread uts/speech_processors/test_streamatt.py Outdated

mgaido91 reviewed May 6, 2026

Comment thread simulstream/server/speech_processors/canary_streamatt.py Outdated

azziko and others added 3 commits May 6, 2026 10:36

Update simulstream/server/speech_processors/canary_streamatt.py

cbc895e

Co-authored-by: Marco Gaido <marcogaido91@gmail.com>

Update uts/speech_processors/test_streamatt.py

59bf6f3

Co-authored-by: Marco Gaido <marcogaido91@gmail.com>

Fix linter

076ed37

mgaido91 reviewed May 6, 2026

Comment thread uts/speech_processors/test_streamatt.py Outdated

mgaido91 reviewed May 6, 2026

Comment thread simulstream/server/speech_processors/canary_streamatt.py Outdated

Comment thread simulstream/server/speech_processors/canary_streamatt.py Outdated

Add minor fixes

a1fea18



azziko commented Apr 29, 2026

azziko commented Apr 29, 2026

mgaido91 left a comment

mgaido91 Apr 30, 2026

azziko May 2, 2026

mgaido91 May 4, 2026

azziko commented May 2, 2026

mgaido91 left a comment

mgaido91 May 4, 2026

mgaido91 May 4, 2026

mgaido91 May 4, 2026

azziko commented May 5, 2026

mgaido91 left a comment

2 participants
