Prism's speech backends on some platforms #12
Replies: 25 comments
Hi there, Nah, thank you for opening the discussion and asking your questions! There is no such thing as a stupid question, so feel free to ask away! To answer yours:
Yes, this is intended. UIA is technically not a TTS/screen reader backend in the typical sense. The UIA backend uses UI Automation to communicate with the screen reader, which effectively makes it screen reader agnostic: it works on anything that can speak UIA, including Narrator. The downside of UIA (and the reason the other backends are always preferred over it) is cost: speaking through the other backends is much cheaper, and they make it possible to communicate with your users even when your app is not focused. UIA is a lot more expensive because it needs to instantiate multiple COM objects, create a separate worker thread and window, etc. That said, UIA is pretty good and is used by things like OSARA.
This is very concerning to me. It may be that once I merge PR #8 this issue will get resolved. I haven't tested ZDSR all that much (I don't have a license for it) so I wasn't aware it was dysfunctional, and I apologize.
Yeah, someone else privately reported this to me, and I'm honestly not sure why it happens. I didn't fix it because that person provided little detail and I wasn't sure whether it was their code or Prism. I may have used the wrong values when converting the ranges to [0.0, 1.0]; what makes it incredibly annoying is that Microsoft is extremely vague and contradictory as to the actual valid values, so I had to improvise. All they say about it is:
But, of course, they then go on to say:
To say this made the OneCore backend implementation frustrating would be quite the understatement. So fixing this may be one of those long-term "let's just tinker with it" bug-fix projects, since Microsoft's own words are contradictory.
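A minimal sketch of the kind of range conversion discussed here, assuming a simple linear mapping between a parameter's native range and [0.0, 1.0]. The `ParamRange` class and the example bounds are hypothetical and are not taken from Prism or from Microsoft's documentation; the point is only that the conversion is correct exactly when the assumed native bounds are.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamRange:
    """Native range of one speech parameter (hypothetical helper)."""

    minimum: float
    maximum: float

    def to_unit(self, native: float) -> float:
        """Map a native value into [0.0, 1.0], clamping out-of-range input."""
        clamped = min(max(native, self.minimum), self.maximum)
        return (clamped - self.minimum) / (self.maximum - self.minimum)

    def from_unit(self, unit: float) -> float:
        """Map a [0.0, 1.0] value back to the native range, clamping first."""
        clamped = min(max(unit, 0.0), 1.0)
        return self.minimum + clamped * (self.maximum - self.minimum)


# Assumed example bounds for illustration only, not a documented value.
RATE = ParamRange(minimum=0.5, maximum=6.0)
```

If the assumed bounds differ from what the engine really accepts, the round trip still looks fine in isolation, but the engine may clamp or reject the native value, which is one way a wrong guess could show up as a user-visible bug.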
Yeah, SAPI has a lot of thread locking stuff that goes on internally to make it reasonably thread-safe. I hoped that this wouldn't cause any issues, but I may need to revisit it. Can you provide more detail as to when the hang happens? Or is it random? A small test program would also be appreciated.
Linux has only Orca and Speech Dispatcher backends available. If you use |
Hi. Thanks so much for your answers. I personally made it so that UIA is filtered and excluded from speech backends, because it would be confusing for people, but thank you for your explanation. For SAPI, it feels like the program is waiting for the speech to start before processing the next event. Like I said, it doesn't happen to me with any other backend, so that's strange to me. Projects that implement SAPI support, such as NVGT, I think do not have this thread locking problem. It is not actually random for SAPI. It's almost consistent, but the interesting part is that the program does not hang indefinitely when SAPI is speaking. It only happens when it tries to speak the next phrase.
Unfortunately I haven't done much testing on Linux, but I will write a small program that logs the available backends and tries each of them to confirm this.
I will definitely look into this because, although SAPI is written to be thread-safe, it should definitely not be causing this level of lock contention. I've already initiated the 0.11.2 release workflow so we can test these changes incrementally; once that's out, can you see if most of your issues have been resolved? I will dig into SAPI and see what I can do about it.
@diamondStar35 I may have a fix for SAPI, but I would like to test it with your setup first. Can you provide me with some kind of MRE project I can play with?
I use it in one of my projects, called Top Speed. It is a racing audio game written in DotNet that uses this library. Unfortunately I do not have a minimal program yet, but if you would like one I could try to put it together. In the meantime, if you want to test how SAPI behaves in my project, you can grab it from the following link: https://github.com/diamondStar35/top_speed/releases/download/release-build/TopSpeed-windows-x64-Release-v-2026.4.9.3.zip
Also, speaking of Linux and Prism: it seems the Linux build has to be revisited, because I've identified the issue. I've created a minimal program that uses Prism, enumerates and logs all supported backends, and tries to use each of them. However, the program couldn't even be started because of the following:
2026-04-09 18:24:39.556 UTC 2026-04-09 18:24:39.607 UTC 2026-04-09 18:24:39.609 UTC 2026-04-09 18:24:39.610 UTC 2026-04-09 18:24:39.611 UTC 2026-04-09 18:24:39.645 UTC 2026-04-09 18:24:39.717 UTC at TopSpeed.Speech.Prism.LinuxMethods.prism_config_init()
@diamondStar35 The reason Prism isn't working is right there in the error: your glib version is too old (or glib is somehow not installed). You need GCC 13 or later. I could maybe try to get Prism building against an older version of Ubuntu (right now it uses 24.04) but that may be a problem (especially for ARM builds).
Yes, I am aware of that, and that's why I've shared it. It requires a newer version of Ubuntu, but this is a problem for many people, because many still use Ubuntu 22.04 or similar.
@diamondStar35 I'm not sure about lowering the Ubuntu version. Ubuntu 22.04 has GCC 4:11.2.0-1ubuntu1 while 24.04 has 4:13.2.0-7ubuntu1, and Prism requires C++23 to compile.
But if I'm not wrong, you can actually install GCC 13 on Ubuntu 22.04 if that is required. Also, I am trying to compile it on an actual Ubuntu 22.04 machine and will share the details soon if it works.
@ethindp
cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_C_COMPILER=gcc-13 -DCMAKE_CXX_COMPILER=g++-13
cmake --build build -j
2026-04-09 19:27:16.461 UTC 2026-04-09 19:27:16.567 UTC 2026-04-09 19:27:16.569 UTC 2026-04-09 19:27:16.572 UTC 2026-04-09 19:27:16.572 UTC 2026-04-09 19:27:16.572 UTC 2026-04-09 19:27:16.579 UTC 2026-04-09 19:27:16.580 UTC 2026-04-09 19:27:16.580 UTC 2026-04-09 19:27:16.581 UTC 2026-04-09 19:27:16.581 UTC 2026-04-09 19:27:16.581 UTC 2026-04-09 19:27:16.585 UTC 2026-04-09 19:27:16.585 UTC
The same happens with Speech Dispatcher. The Prism exception does not give any details on why it failed to speak.
@diamondStar35 So, acquire_best returns an already initialized backend, and create/create_best does not. The docs explicitly call this out, and you should treat an "already initialized" error as a sane condition: it does not indicate an actual problem, and backend initialization is done in such a way that it can only be done once unless the backend is explicitly freed first. With respect to your other two problems, Orca and Speech Dispatcher both require that the respective engine be installed and running. It's odd that Orca is initializing, because it shouldn't be if Orca is not running or does not provide the respective D-Bus service (which in 22.04 it would not). Do you check the As for Speech Dispatcher, this will take some looking into, as I pretty much completely defer to libspeechd for the actual handling and the backend is quite primitive in terms of actual work.
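The acquire/create distinction described above can be illustrated with a small sketch. Everything below is a hypothetical stand-in written for this example (the `Backend` class, `AlreadyInitializedError`, and the function bodies); it is not Prism's real API or its bindings, only the calling pattern: a backend from an acquire-style call is already initialized, and a second initialization attempt is treated as benign rather than as a failure.

```python
class AlreadyInitializedError(Exception):
    """Hypothetical error raised when initialize() is called twice."""


class Backend:
    """Hypothetical stand-in for a speech backend handle."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.initialized = False

    def initialize(self) -> None:
        if self.initialized:
            raise AlreadyInitializedError(self.name)
        self.initialized = True


def create_best() -> Backend:
    """Return a fresh backend that the caller still has to initialize."""
    return Backend("hypothetical-best")


def acquire_best() -> Backend:
    """Return a backend that has already been initialized."""
    backend = create_best()
    backend.initialize()
    return backend


def ensure_initialized(backend: Backend) -> Backend:
    """Initialize if needed; treat 'already initialized' as success."""
    try:
        backend.initialize()
    except AlreadyInitializedError:
        pass  # Benign: the backend is already ready to use.
    return backend
```

With this shape, calling code does not need to remember which factory produced a backend: `ensure_initialized` is safe for both.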
@ethindp |
@diamondStar35 Can you try with the latest commit? I think I've fixed the libspeechd backend. If it still dies (i.e., libspeechd says nothing), this very well could be a bug with your setup. As for Orca, I've tried to make it more rigorous with respect to actually detecting that this is Orca we're talking to. But without a full XML parser (which I am hesitant to pull in) and performing D-Bus introspection, something can always "fake" the interface and Prism would be none the wiser.
@ethindp So another question. Does that mean that using Prism on Ubuntu 22.04 with Orca is not possible?
Then do not use it. If that bit is clear, it explicitly signals that whatever component is needed for the backend to even function is not present. That bit is the only dynamic bit of that entire bitfield. It does not guarantee the backend will even initialize, but it determines whether it is even available. |
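As a rough illustration of checking such an availability bit before using a backend, here is a minimal sketch. The `Features` flag names and values are invented for this example and do not correspond to Prism's actual bitfield; only the pattern of filtering on one runtime bit is the point.

```python
from enum import IntFlag


class Features(IntFlag):
    """Hypothetical feature bits; names and values are illustrative only."""

    SUPPORTED_AT_RUNTIME = 1 << 0  # the one bit assumed to change at run time
    SPEAK = 1 << 1
    SPEAK_TO_MEMORY = 1 << 2


def usable_backends(candidates: dict[str, Features]) -> list[str]:
    """Keep only backends whose runtime-availability bit is set."""
    return [
        name
        for name, features in candidates.items()
        if Features.SUPPORTED_AT_RUNTIME in features
    ]
```

Since a set bit only says the backend is available and not that it will initialize, a caller would still want to handle initialization failures for the backends that pass this filter.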
If you do not have Orca 49 or later (I think 49 is the earliest that actually implements it), then no. Whether it is or is not available depends on your Orca version.
Thank you for your help. The version of Orca in my distribution was too old and I can't even update it. It was 42, so I had to delete the entire distribution and try on Ubuntu 24.04. So this issue will probably be resolved by using a later version of Ubuntu. I hope the SAPI issue can be looked into though. Oh my bad, sorry. I'll try with the latest commit and SAPI.
I believe I have already solved SAPI in the latest commit. It does not seem to be reproducible in my tests anymore. It may still lag slightly when navigating certain menus, for example, but that is more down to the SAPI speech engine used than SAPI itself, and if you speak to memory and play that you will get silence trimming. As usual, use get features for backends to figure out what you are and aren't allowed to do for any given backend.
Yes, sorry, my bad. I edited my last comment when I saw the description of the latest commit. Apologies. Speaking of SpeakToMemory, why does normal speaking produce silence while speaking to memory (as you said) does not? You mentioned that when speaking to memory you get silence trimming.
@diamondStar35 Normal speaking gets leading/trailing silence because that is what the speech engine produces and what is sent to the audio device. SAPI does all that internally and Prism doesn't control it. I could make Prism do all of that and get silence trimming, but then I would also need to pull in Miniaudio and a bunch of other things that would dramatically complicate the (already complicated) SAPI backend, and my back-of-the-envelope cost-benefit analysis is that the cost of actually implementing that would outweigh any benefit that may exist.
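For readers unfamiliar with the term, here is a minimal sketch of what silence trimming on in-memory audio might look like, assuming the audio is a list of signed integer PCM samples. The `trim_silence` function and its `threshold` parameter are hypothetical and are not Prism's implementation; a real trimmer would likely work on raw byte buffers and use a small non-zero threshold.

```python
def trim_silence(samples: list[int], threshold: int = 0) -> list[int]:
    """Drop leading and trailing samples whose magnitude is <= threshold."""
    start = 0
    end = len(samples)
    # Skip quiet samples at the front.
    while start < end and abs(samples[start]) <= threshold:
        start += 1
    # Skip quiet samples at the back, never crossing the start index.
    while end > start and abs(samples[end - 1]) <= threshold:
        end -= 1
    return samples[start:end]
```

Quiet samples in the middle of the buffer are kept, so pauses inside the utterance are unaffected; only the padding at either end is removed.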
@ethindp Thanks for your explanation. I looked for the latest release because I do not want to build it on Windows (I don't have the proper tools for that), but the latest release doesn't include your latest commit. Doesn't the project use workflows to generate release assets?
It is, yes, but I manually initiate it due to PyPI. I didn't want to publish a new release until we could confirm that the issues were solved.
Alright, thanks so much for your help. I will be able to test it on my end with SAPI when the new release is published, sooner or later, but I'm not worried about it if it's already fixed, so feel free to delay it if you feel that's necessary.
It has been published.
Hello.
First of all, I'm so sorry for this ambiguous title. I just couldn't find anything better to use as a title for the issues and questions I have.
I've started using Prism recently on DotNet, and it's absolutely awesome! Especially because it's cross-platform and supports a lot of backends.
One note before describing my issues: I am not using the Prismatoid wrapper, for one reason. The wrapper only targets DotNet 10, and my project targets both DotNet 10 and DotNet 4.7.2, so I had to create a minimal wrapper that has the things I need.
My experience with Prism on Windows is awesome. There are just a few minor issues and questions here.
On Linux, the situation is more interesting. I mentioned before that my project by default tries to acquire the best available backend, but with that setup, people also get no speech on Linux. I don't know what the reason for that is or whether there's something to be done on Linux. I haven't tried changing the backend though, because I don't even know what other backends are available, and because there is no speech.
I'm just putting it here so if anyone has faced a similar issue or something needs to be done.
I apologize if this is too long. I appreciate any help with this.
Regards.