feat: add inline timestamps STT output format #364
brittain9 wants to merge 1 commit into mkiol:main
- Implement configurable timestamp templates ({hh}, {mm}, {ss}, {text})
- Support for all STT engines
- Auto-strip timestamps during TTS playback
- Fix bug where clearing the text reset the format to Plain Text
Sorry for the late reply. It looks fantastic :) I'm a bit busy at the moment and need a few more days to look at the code and test it. Thank you for your understanding.
mkiol
left a comment
Thank you very much for your great work. Love it! I am impressed that you managed to understand the entire wiring. I like your programming style and learned a lot during the review :)
PR is almost ready for merging. The only obstacle is options that are transferred between "app" and "service". The "service" should not read settings that can be changed by the user (the only exception is settings that require a restart). For more info on this problem, see the comments.
    std::string_view text_view = seg.text;
    if (!text_view.empty()) {
        if (text_view.front() == ' ') text_view.remove_prefix(1);
        os << ' ' << text_view;
    config.inline_timestamp_template =
        settings::instance()->inline_timestamp_template().toStdString();
    config.inline_timestamp_min_interval =
        settings::instance()->inline_timestamp_min_interval();
It works, but there is a reason why the config options are not taken directly from the settings object and are instead passed from dsnote_app to stt_engine via the "options" map. Under SFOS, Speech Note is split into two separate processes that communicate via DBus: dsnote_app has the UI and manages the settings, while speech_service is a daemon. When the settings change in the "app" process, the "service" process does not detect this. To work around this problem, the settings are passed (via DBus) together with the request type in the "options" map object.
My suggestions:
- Move inline_timestamp_template and inline_timestamp_min_interval to config.sub_config (or create a new structure).
- Retrieve values from the options object in stt_sub_config_from_options (or in a new helper function).
- Add inline_timestamp_template and inline_timestamp_min_interval to the options obj in dsnote_app::transcribe_file and dsnote_app::listen_internal.
- Add inline_timestamp_template to the options in dsnote_app::play_speech_internal and dsnote_app::speech_to_file_internal.
- Do not call update_inline_timestamp_regex when the setting changes, but instead call it in speech_service::tts_play_speech and speech_service::tts_speech_to_file based on the options obj.
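The suggestions above amount to reading the two values from the per-request options rather than from the settings singleton. A minimal sketch of that idea follows; it is not the project's code. The project passes a Qt variant map over DBus, so the plain `std::map` of strings used here is only a stand-in, and `options_t`, `inline_timestamp_config`, `inline_timestamp_config_from_options`, the key names, and the default values are all hypothetical.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the "options" map sent with each request.
// (Plain strings keep the sketch self-contained; the real map holds variants.)
using options_t = std::map<std::string, std::string>;

// Hypothetical per-request structure; the defaults are assumptions.
struct inline_timestamp_config {
    std::string tmpl = "[{mm}:{ss}] {text}";
    int min_interval = 0;
};

// Builds the config only from what the caller sent, never from global
// settings, so the service process cannot see stale values.
inline_timestamp_config inline_timestamp_config_from_options(
    const options_t& options) {
    inline_timestamp_config config;

    const auto tmpl_it = options.find("inline_timestamp_template");
    if (tmpl_it != options.end()) config.tmpl = tmpl_it->second;

    const auto interval_it = options.find("inline_timestamp_min_interval");
    if (interval_it != options.end()) {
        try {
            config.min_interval = std::stoi(interval_it->second);
        } catch (const std::exception&) {
            // Malformed value: keep the default.
        }
    }
    return config;
}
```

With this shape, the app side fills the map in each request (for example in transcribe_file or listen_internal), and the service side never needs to observe setting changes.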
    connect(
        settings::instance(), &settings::inline_timestamp_template_changed,
        this, &speech_service::update_inline_timestamp_regex);
    update_inline_timestamp_regex();
Timestamp regex is only used to remove inline timestamps for TTS. Am I correct?
Do not call update_inline_timestamp_regex when the setting changes, but instead call it in speech_service::tts_play_speech and speech_service::tts_speech_to_file based on options obj. I added more on this in the comment below.
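To illustrate building the regex per TTS request from a template that arrives in the options, here is a rough sketch using only the standard library. It is not the PR's implementation: the `_sketch` function names are hypothetical, and it assumes the numeric tokens are `{hh}`, `{mm}`, `{ss}`, `{ms}` and that the timestamp part of the template sits on one side of `{text}`.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Converts a template such as "[{mm}:{ss}] {text}" into a regex that matches
// only its timestamp part. Numeric tokens become digit runs, {text} is
// dropped, and every other character is matched literally.
std::regex compile_timestamp_regex_sketch(const std::string& tmpl) {
    static const std::string meta = R"(\^$.|?*+()[]{})";
    std::string pattern;
    for (std::size_t i = 0; i < tmpl.size();) {
        bool is_numeric_token = false;
        for (const char* token : {"{hh}", "{mm}", "{ss}", "{ms}"}) {
            if (tmpl.compare(i, 4, token) == 0) {
                is_numeric_token = true;
                break;
            }
        }
        if (is_numeric_token) {
            pattern += "[0-9]+";
            i += 4;
        } else if (tmpl.compare(i, 6, "{text}") == 0) {
            i += 6;  // the spoken text itself is not part of the timestamp
        } else {
            if (meta.find(tmpl[i]) != std::string::npos) pattern += '\\';
            pattern += tmpl[i];
            ++i;
        }
    }
    return std::regex(pattern);
}

// Removes every timestamp that matches the template before the text is
// handed to TTS. The regex is compiled per call from the request's template.
std::string strip_timestamps_sketch(const std::string& text,
                                    const std::string& tmpl) {
    return std::regex_replace(text, compile_timestamp_regex_sketch(tmpl), "");
}
```

Compiling per request costs a little more than caching the regex, but it removes the need for the service to react to setting changes at all.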
    @@ -78,19 +78,19 @@
    <context>
        <name>AddTextDialog</name>
        <message>
            <location filename="../sfos/qml/AddTextDialog.qml" line="33"/>
If you are not translating anything, please remove all changes to the TS files from this PR. I will update these files before releasing the new version.
Summary
Adds a new Inline Timestamps text format option for speech-to-text output, allowing timestamps to be embedded directly within transcribed text as an alternative to SRT subtitle format.
Closes #222
Screenshots
Note the TTS output at the bottom, which strips the current timestamp template.
Motivation
When transcribing audio (podcasts, meetings, interviews), I want timestamps inline with text for:
Changes
New Settings (Settings → Speech to Text)
{hh}, {mm}, {ss}, {ms}, {text} tokens
Example output:
[00:05] Hello world [00:12] This is a test
Implementation
text_tools.cpp: Core functions for formatting, regex compilation, and stripping timestamps
Tests
format_segments_inline, compile_inline_timestamp_regex, strip_inline_timestamps
Testing Done
text_tools_test)
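As a rough illustration of how the template tokens and the minimum interval described under Changes could produce the example output, here is a standalone sketch. It is not the code from text_tools.cpp: `segment_sketch`, `pad`, `replace_all`, and `format_segments_inline_sketch` are hypothetical names, the interval is assumed to be in milliseconds, and `{mm}` is assumed to wrap at 60 even when `{hh}` is absent from the template.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical segment type: start time in milliseconds plus recognized text.
struct segment_sketch {
    long long start_ms;
    std::string text;
};

// Left-pads a number with zeros to the given width.
std::string pad(long long value, std::size_t width) {
    std::string s = std::to_string(value);
    if (s.size() < width) s.insert(0, width - s.size(), '0');
    return s;
}

// Replaces every occurrence of `from` (must be non-empty) with `to`.
std::string replace_all(std::string s, const std::string& from,
                        const std::string& to) {
    for (std::size_t pos = s.find(from); pos != std::string::npos;
         pos = s.find(from, pos + to.size()))
        s.replace(pos, from.size(), to);
    return s;
}

// Joins segments with spaces. A segment gets a timestamp only if it is the
// first one or at least min_interval_ms after the last stamped segment.
std::string format_segments_inline_sketch(
    const std::vector<segment_sketch>& segments, const std::string& tmpl,
    long long min_interval_ms) {
    std::string out;
    bool have_stamp = false;
    long long last_stamp_ms = 0;
    for (const auto& seg : segments) {
        std::string text = seg.text;
        if (!text.empty() && text.front() == ' ') text.erase(0, 1);
        if (text.empty()) continue;
        if (!out.empty()) out += ' ';
        if (!have_stamp || seg.start_ms - last_stamp_ms >= min_interval_ms) {
            const long long total_s = seg.start_ms / 1000;
            std::string piece = tmpl;
            piece = replace_all(piece, "{hh}", pad(total_s / 3600, 2));
            piece = replace_all(piece, "{mm}", pad((total_s / 60) % 60, 2));
            piece = replace_all(piece, "{ss}", pad(total_s % 60, 2));
            piece = replace_all(piece, "{ms}", pad(seg.start_ms % 1000, 3));
            // {text} goes last so token-like text is not substituted again.
            piece = replace_all(piece, "{text}", text);
            out += piece;
            have_stamp = true;
            last_stamp_ms = seg.start_ms;
        } else {
            out += text;
        }
    }
    return out;
}
```

Under these assumptions, two segments starting at 5 s and 12 s with the template `[{mm}:{ss}] {text}` and no minimum interval give the example output shown above; with a 10-second interval the second timestamp is omitted.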