feat: add inline timestamps STT output format #364

Open
brittain9 wants to merge 1 commit into mkiol:main from brittain9:feature/inline-timestamps

@brittain9

Summary

Adds a new Inline Timestamps text format option for speech-to-text output, allowing timestamps to be embedded directly within transcribed text as an alternative to SRT subtitle format.

Closes #222

Screenshots

[Images: timestamps, timestamp2]

Note the TTS output at the bottom, where timestamps matching the current template are stripped.

Motivation

When transcribing audio (podcasts, meetings, interviews), I want timestamps inline with text for:

  • Easier human readability vs. SRT's rigid block format
  • LLM post-processing (summarization, Q&A) where inline context is more natural
  • Quick reference without the overhead of subtitle parsing

Changes

New Settings (Settings → Speech to Text)

| Setting | Description |
| --- | --- |
| Text format | New "Inline Timestamps" option |
| Timestamp template | Customizable format with {hh}, {mm}, {ss}, {ms}, {text} tokens |
| Minimum interval | Minimum spacing between inserted timestamps (prevents timestamp spam) |

Example output: [00:05] Hello world [00:12] This is a test
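To illustrate how a template with these tokens could expand into the example output above, here is a minimal sketch. It is not the PR's code: `expand_timestamp_template`, `replace_all`, and `pad` are hypothetical names, and the choice to let `{mm}` absorb whole hours when the template has no `{hh}` token is an assumption made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical helper: replaces every occurrence of `token` in `s` with `value`.
static void replace_all(std::string& s, std::string_view token, std::string_view value) {
    assert(!token.empty());  // an empty token would match everywhere
    for (std::size_t pos = s.find(token); pos != std::string::npos;
         pos = s.find(token, pos + value.size())) {
        s.replace(pos, token.size(), value);
    }
}

// Hypothetical helper: zero-pads `v` to at least `width` digits.
static std::string pad(std::uint64_t v, int width) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%0*llu", width, static_cast<unsigned long long>(v));
    return buf;
}

// Expands {hh}, {mm}, {ss}, {ms} and {text} for one segment starting at `t_ms`.
std::string expand_timestamp_template(std::string tmpl, std::uint64_t t_ms,
                                      std::string_view text) {
    const bool has_hours = tmpl.find("{hh}") != std::string::npos;
    const std::uint64_t total_s = t_ms / 1000;
    // Assumption: without {hh}, minutes keep counting past 59.
    const std::uint64_t mm = has_hours ? (total_s / 60) % 60 : total_s / 60;
    replace_all(tmpl, "{hh}", pad(total_s / 3600, 2));
    replace_all(tmpl, "{mm}", pad(mm, 2));
    replace_all(tmpl, "{ss}", pad(total_s % 60, 2));
    replace_all(tmpl, "{ms}", pad(t_ms % 1000, 3));
    // {text} goes last so token-like strings in the transcript stay untouched.
    replace_all(tmpl, "{text}", text);
    return tmpl;
}
```

With the template `[{mm}:{ss}] {text}`, a segment starting at 5000 ms with the text `Hello world` expands to `[00:05] Hello world`.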

Implementation

  • text_tools.cpp: Core functions for formatting, regex compilation, and stripping timestamps
  • STT Engines: Integrated into Vosk, Whisper, FasterWhisper, April, and DeepSpeech
  • TTS Integration: Auto-strips inline timestamps before speaking (seamless text-to-speech of transcribed content)
  • Bug fix: Corrected text format resetting to "Plain Text" when clearing notepad
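The stripping step can be pictured as deriving a regular expression from the timestamp template and deleting every match before the text reaches TTS. The sketch below is one possible way to do this with `std::regex`. The function names carry a `_sketch` suffix because they are hypothetical stand-ins, not the PR's `compile_inline_timestamp_regex` and `strip_inline_timestamps`, whose actual behavior is not shown in this conversation.

```cpp
#include <regex>
#include <string>
#include <string_view>

// Builds a regex for the timestamp part of a template such as "[{mm}:{ss}] {text}".
// Numeric tokens become \d+, {text} is dropped, spaces become \s*, and any other
// regex metacharacter is escaped so it matches literally.
std::regex compile_timestamp_regex_sketch(std::string_view tmpl) {
    static constexpr std::string_view numeric_tokens[] = {"{hh}", "{mm}", "{ss}", "{ms}"};
    static constexpr std::string_view text_token = "{text}";
    static constexpr std::string_view meta = "\\^$.|?*+()[]{}";

    std::string pattern;
    for (std::size_t i = 0; i < tmpl.size();) {
        const std::string_view rest = tmpl.substr(i);
        if (rest.substr(0, text_token.size()) == text_token) {
            i += text_token.size();  // transcript text is not part of the pattern
            continue;
        }
        bool matched = false;
        for (const auto tok : numeric_tokens) {
            if (rest.substr(0, tok.size()) == tok) {
                pattern += "\\d+";
                i += tok.size();
                matched = true;
                break;
            }
        }
        if (matched) continue;

        const char c = tmpl[i++];
        if (c == ' ') {
            pattern += "\\s*";  // tolerate missing or repeated whitespace
            continue;
        }
        if (meta.find(c) != std::string_view::npos) pattern += '\\';
        pattern += c;
    }
    return std::regex(pattern);
}

// Removes every timestamp matched by `re` from `text`.
std::string strip_timestamps_sketch(const std::string& text, const std::regex& re) {
    return std::regex_replace(text, re, "");
}
```

Compiling the regex once per template and reusing it for each request avoids repeated regex construction, which is relatively expensive with `std::regex`.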

Tests

  • Unit tests covering format_segments_inline, compile_inline_timestamp_regex, strip_inline_timestamps
  • Edge cases: interval state across batches, complex template formats, whitespace handling
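The "interval state across batches" edge case arises because engines deliver segments in several batches, so the time of the last emitted timestamp has to survive between calls. A rough sketch of that idea follows. All names are hypothetical, the `[mm:ss]` format is hard-coded to keep the example short, and treating the interval in milliseconds is an assumption.

```cpp
#include <cstdint>
#include <cstdio>
#include <optional>
#include <string>
#include <vector>

// Hypothetical segment type: start time plus transcribed text.
struct segment_sketch {
    std::uint64_t start_ms;
    std::string text;
};

// State carried from one batch to the next.
struct interval_state_sketch {
    std::optional<std::uint64_t> last_stamp_ms;
};

// Formats one batch, emitting a timestamp only when at least `min_interval_ms`
// has passed since the previously emitted one (possibly in an earlier batch).
std::string format_batch_sketch(const std::vector<segment_sketch>& segs,
                                std::uint64_t min_interval_ms,
                                interval_state_sketch& state) {
    std::string out;
    for (const auto& seg : segs) {
        if (seg.text.empty()) continue;
        if (!out.empty()) out += ' ';

        const bool due = !state.last_stamp_ms ||
                         seg.start_ms >= *state.last_stamp_ms + min_interval_ms;
        if (due) {
            const unsigned long long total_s = seg.start_ms / 1000;
            char stamp[32];
            std::snprintf(stamp, sizeof(stamp), "[%02llu:%02llu] ",
                          total_s / 60, total_s % 60);
            out += stamp;
            state.last_stamp_ms = seg.start_ms;
        }
        out += seg.text;
    }
    return out;
}
```

With a 5000 ms interval, a first batch starting at 5 s receives a timestamp, a later batch starting at 9 s does not, and the next timestamp appears only once a segment starts at 10 s or later.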

Testing Done

  • Unit tests pass (text_tools_test)
  • Manual testing with the Vosk, April, and Whisper engines
  • TTS strips timestamps before speaking
  • Settings UI functional with presets and custom templates

- Implement configurable timestamp templates ({hh}, {mm}, {ss}, {text})
- Support for all STT engines
- Auto-strip timestamps during TTS playback
- Fix bug where clearing text reset the format to Plain Text
@mkiol
Owner

mkiol commented Jan 20, 2026

Sorry for the late reply. It looks fantastic :)

I'm a bit busy at the moment and need a few more days to look at the code and test it. Thank you for your understanding.

@mkiol mkiol left a comment

Thank you very much for your great work. Love it! I am impressed that you managed to understand the entire wiring. I like your programming style and learned a lot during the review :)

PR is almost ready for merging. The only obstacle is options that are transferred between "app" and "service". The "service" should not read settings that can be changed by the user (the only exception is settings that require a restart). For more info on this problem, see the comments.

Comment on lines +917 to +920
std::string_view text_view = seg.text;
if (!text_view.empty()) {
    if (text_view.front() == ' ') text_view.remove_prefix(1);
    os << ' ' << text_view;

Nice use of string_view 👍🏿

Comment on lines +1294 to +1297
config.inline_timestamp_template =
    settings::instance()->inline_timestamp_template().toStdString();
config.inline_timestamp_min_interval =
    settings::instance()->inline_timestamp_min_interval();

It works, but there is a reason why the config options are not taken directly from the settings object and are instead passed from dsnote_app to stt_engine via the "options" map. Speech Note is split into two separate processes under SFOS, which communicate via DBus: dsnote_app has a UI and manages the settings, while speech_service is a daemon. When the settings change in the "app" process, the "service" process does not detect this. To work around this problem, the settings are passed (via DBus) together with the request type in the "options" map object.

My suggestions:

  • Move inline_timestamp_template and inline_timestamp_min_interval to config.sub_config (or create a new structure).
  • Retrieve values from the options object in stt_sub_config_from_options (or in a new helper function).
  • Add inline_timestamp_template and inline_timestamp_min_interval to the options obj in dsnote_app::transcribe_file and dsnote_app::listen_internal.
  • Add inline_timestamp_template to the options in dsnote_app::play_speech_internal and dsnote_app::speech_to_file_internal.
  • Do not call update_inline_timestamp_regex when the setting changes, but instead call it in speech_service::tts_play_speech and speech_service::tts_speech_to_file based on options obj.
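The suggested flow can be sketched as follows: the "app" side attaches the current setting values to each request, and the "service" side builds its config from that map instead of reading the settings object. This is an illustration only. It uses `std::map` and `std::variant` so that it compiles without Qt, whereas the real options object and the helper `stt_sub_config_from_options` are not shown in this conversation. Every name ending in `_sketch` and both default values are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>

// Stand-in for the "options" map that travels with each request over DBus.
using options_sketch = std::map<std::string, std::variant<int, std::string>>;

// Stand-in for the sub-config that would hold the inline timestamp settings.
struct inline_ts_config_sketch {
    std::string tmpl = "[{mm}:{ss}] {text}";  // assumed default
    int min_interval = 0;                     // assumed default (0 = no limit)
};

// "app" side: copy the current settings into the request options.
options_sketch make_request_options_sketch(const std::string& tmpl, int min_interval) {
    assert(min_interval >= 0);
    options_sketch opts;
    opts["inline_timestamp_template"] = tmpl;
    opts["inline_timestamp_min_interval"] = min_interval;
    return opts;
}

// "service" side: read values from the options only, falling back to defaults
// when a key is missing or has an unexpected type.
inline_ts_config_sketch config_from_options_sketch(const options_sketch& opts) {
    inline_ts_config_sketch cfg;
    if (auto it = opts.find("inline_timestamp_template"); it != opts.end()) {
        if (const auto* s = std::get_if<std::string>(&it->second)) cfg.tmpl = *s;
    }
    if (auto it = opts.find("inline_timestamp_min_interval"); it != opts.end()) {
        if (const auto* n = std::get_if<int>(&it->second)) cfg.min_interval = *n;
    }
    return cfg;
}
```

Because each request carries its own values, the service never holds a stale copy of a user-changeable setting, and no change signal has to cross the process boundary.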

Comment on lines +348 to +351
connect(
    settings::instance(), &settings::inline_timestamp_template_changed,
    this, &speech_service::update_inline_timestamp_regex);
update_inline_timestamp_regex();

Timestamp regex is only used to remove inline timestamps for TTS. Am I correct?

Do not call update_inline_timestamp_regex when the setting changes, but instead call it in speech_service::tts_play_speech and speech_service::tts_speech_to_file based on options obj. I added more on this in the comment below.

@@ -78,19 +78,19 @@
<context>
<name>AddTextDialog</name>
<message>
<location filename="../sfos/qml/AddTextDialog.qml" line="33"/>

If you are not translating anything, please remove all changes to the TS files from this PR. I will update these files before releasing the new version.

Development

Successfully merging this pull request may close these issues.

Feature Request: Add timestamps to speech to text from audio file
