
feat: add kokoro-onnx as lightweight TTS backend #261

Open
DiarmuidKelly wants to merge 3 commits into mbailey:master from DiarmuidKelly:feat/kokoro-onnx-backend

Conversation


@DiarmuidKelly DiarmuidKelly commented Feb 13, 2026

Summary

Closes #262

  • Add kokoro-onnx as alternative TTS backend using ONNX Runtime
  • 4.9x faster time-to-first-audio vs PyTorch (1.6s vs 7.8s)
  • 88MB int8 model vs 310MB PyTorch model
  • Uses all CPU cores vs single-core PyTorch inference

Changes

  • New voice_mode/services/kokoro_onnx/ package with FastAPI server
  • Add voicemode service install kokoro-onnx command
  • Add installer with automatic dependency and model download
  • 32 unit tests for server, installer, and config
  • Documentation with real-world benchmarks

Configuration

# Install and start
voicemode service install kokoro-onnx
voicemode service start kokoro-onnx

# Use as primary TTS (port 8881)
VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8881/v1,http://127.0.0.1:8880/v1
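The value above is a comma-separated list in priority order: the kokoro-onnx endpoint on port 8881 first, with the 8880 endpoint as fallback. As a rough illustration of how such a value can be split into an ordered failover list, here is a minimal sketch. `parse_base_urls` is a hypothetical helper written for this example; the PR does not show how voicemode itself parses the variable.

```python
def parse_base_urls(raw: str) -> list[str]:
    """Split a comma-separated URL list, keeping order and dropping blanks.

    Hypothetical helper for illustration; not taken from voicemode.
    """
    urls = []
    for part in raw.split(","):
        url = part.strip().rstrip("/")  # tolerate spaces and trailing slashes
        if url:
            urls.append(url)
    return urls


# The value from this PR: ONNX service first, 8880 endpoint second.
urls = parse_base_urls("http://127.0.0.1:8881/v1,http://127.0.0.1:8880/v1")
primary, fallbacks = urls[0], urls[1:]
```

With this ordering a client would try `primary` first and move on to `fallbacks` only if that endpoint is unreachable.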

Test plan

  • Unit tests pass (32 tests)
  • Manual testing with Claude Code voice mode
  • Verified TTFA improvement vs PyTorch backend
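For readers who want to reproduce a TTFA (time-to-first-audio) comparison, one possible way to measure it is to time how long a streamed response takes to yield its first non-empty chunk. The sketch below assumes the audio arrives as an iterable of byte chunks; `time_to_first_audio` and `fake_stream` are hypothetical names, and this is not necessarily how the figures in this PR were obtained.

```python
import time
from typing import Callable, Iterable, Iterator


def time_to_first_audio(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return the seconds elapsed until the first non-empty chunk arrives.

    `chunks` should be lazy (for example a generator wrapping a streaming
    HTTP response) so that the synthesis work happens while we iterate.
    """
    start = clock()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            return clock() - start
    raise ValueError("stream ended before any audio arrived")


def fake_stream(delay_s: float) -> Iterator[bytes]:
    """Stand-in for a TTS stream: waits, then yields two audio chunks."""
    time.sleep(delay_s)
    yield b"\x00\x01"
    yield b"\x02\x03"


ttfa = time_to_first_audio(fake_stream(0.05))
print(f"TTFA: {ttfa:.3f}s")
```

The injectable `clock` parameter is there only to make the function easy to test deterministically.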

@DiarmuidKelly
Author

======================================================================================================================== tests coverage =========================================================================================================================
_______________________________________________________________________________________________________ coverage: platform linux, python 3.10.19-final-0 ________________________________________________________________________________________________________

Name                                                Stmts   Miss Branch BrPart   Cover   Missing
------------------------------------------------------------------------------------------------
voice_mode/__main__.py                                  1      1      0      0   0.00%   5
voice_mode/audio_player.py                             81     42     30      6  42.34%   53-92, 111, 117, 141->exit, 144-147, 160, 163->169, 170, 174-186
voice_mode/auth.py                                    252     70     60      4  68.59%   138, 142-191, 228, 235->237, 237->exit, 284-286, 326-328, 458-459, 514-592, 612
voice_mode/cli.py                                    1638   1036    478     31  31.05%   37->52, 67-71, 75, 77, 82-83, 88, 134, 149-151, 166-168, 183-185, 210-223, 242-244, 263-265, 284-286, 301-335, 356-405, 418, 435-438, 444-447, 453-456, 462-465, 471-474, 480-483, 491-494, 500-521, 534-561, 571-598, 606, 612-614, 620-622, 628-630, 636-638, 644-646, 652-654, 662-664, 670-691, 705-750, 760-787, 803-804, 810-811, 817-818, 824-825, 831-832, 838-839, 847-848, 854-855, 869-870, 881-882, 1367-1369, 1377-1407, 1416-1418, 1435-1484, 1507-1551, 1559, 1565-1567, 1573-1575, 1581-1583, 1589-1630, 1641, 1717-1848, 1946->1948, 1948->1950, 1950->1952, 1954, 1956, 1958, 1962->1964, 1965, 1967, 1969, 1990, 2005->2007, 2008, 2010, 2012, 2015, 2018->2022, 2023, 2062, 2065->2069, 2084-2088, 2095-2118, 2131-2295, 2312-2403, 2422, 2427-2434, 2439-2441, 2451-2484, 2502-2512, 2525-2558, 2565-2572, 2579-2585, 2592-2598, 2605-2617, 2624-2636, 2653-2664, 2681, 2699-2734, 2751-2782, 2801-2812, 2832-2856, 2871, 2889-2912, 2925-2945, 2961-2986, 3000-3043, 3103, 3130, 3148-3151, 3174->exit, 3272-3273, 3288-3325, 3329-3339, 3351, 3356-3450, 3453-3459, 3521-3522, 3531-3573
voice_mode/cli_commands/agent.py                      332     76    150      9  75.31%   106-115, 127, 135-136, 189->192, 192->195, 195->198, 198->201, 224->217, 289-290, 313-314, 344-345, 430-431, 485-486, 503-504, 758-801, 826-866, 892-897
voice_mode/cli_commands/claude.py                     193      9     80      6  93.04%   54-55, 125->124, 158-159, 178, 184, 189->192, 390-392
voice_mode/cli_commands/exchanges.py                  283    218    144      0  15.22%   26, 47-87, 107-142, 169-224, 241-337, 356-437
voice_mode/cli_commands/history.py                     99     75     26      0  19.20%   16, 47-79, 118-170, 183-210, 220-236
voice_mode/cli_commands/status.py                     394    331    144      0  11.71%   75-98, 104-140, 145-164, 169-178, 183-186, 191-289, 300-369, 380-381, 392-405, 413-418, 427-445, 507-624, 629-693, 699-700, 728-745
voice_mode/cli_commands/transcribe.py                  45     20      8      0  47.17%   13, 57-101, 134
voice_mode/conch.py                                   125     20     30      8  81.94%   141->147, 144-145, 162, 171, 178-187, 204-205, 212-213, 240, 247->256, 249->256, 254, 273-274
voice_mode/config.py                                  455    140    150     35  66.78%   39, 52-53, 76-369, 373->372, 386->417, 397, 402->411, 550, 563-565, 570-582, 696, 708-709, 716-717, 769-814, 818-822, 827-829, 848, 851->855, 881-891, 899-902, 909->914, 911-912, 928->989, 939, 949, 955-961, 962->941, 966, 989->exit, 993-1002, 1016-1017, 1022-1036, 1050-1080, 1097->1099, 1103->1111, 1115-1116, 1121-1125, 1157, 1215-1216, 1224->1223, 1229-1231, 1280, 1284->1288, 1334->1333
voice_mode/connect_registry.py                        198     81     48      5  61.79%   86-89, 136-139, 147, 151-241, 245-253, 276, 300, 309-313
voice_mode/conversation_logger.py                      91     16     18      8  77.98%   57-61, 74-77, 82, 98-101, 105-107, 150->154, 164->exit, 174-176
voice_mode/core.py                                    363    244     94      5  29.76%   49-68, 91, 108-130, 186-543, 573->586, 578-579, 583-584, 659-661, 693-695, 733, 769->768, 772-773
voice_mode/dj/chapters.py                              66      2     30      3  94.79%   82->62, 88->62, 94->62, 167-168
voice_mode/dj/controller.py                            97     31     36      2  66.17%   49, 74-118, 126-131, 195, 200->204
voice_mode/dj/library.py                              192      9     34      4  93.36%   91, 234->244, 243, 269-270, 283, 322-323, 328-329
voice_mode/dj/mfp.py                                  286     56     84     20  77.30%   45, 71-75, 139, 146-159, 192, 217, 226->214, 271, 307->310, 369, 375-376, 391, 460, 464-465, 482-484, 492-498, 506-511, 546, 549->544, 585-586, 591-592, 609, 643-644, 647->651, 654-658
voice_mode/dj/player.py                               101     34      6      0  62.62%   35, 43, 60, 71-112, 116-117
voice_mode/exchanges/conversations.py                 103     91     54      0   7.64%   24, 38-66, 85-112, 126-156, 167-185, 202-256
voice_mode/exchanges/filters.py                        77     55     22      0  22.22%   17, 28-34, 47-58, 69-75, 86-92, 103-109, 120-126, 137-139, 150-154, 166-171, 179-181, 189-193, 204-216, 228-233, 244-247, 255-256, 260
voice_mode/exchanges/formatters.py                    144    124     54      0  10.10%   44-99, 113-175, 188, 201-230, 235, 248-259, 271-355
voice_mode/exchanges/models.py                        115     48     24      0  48.20%   37, 41, 60-75, 89-107, 111, 116, 121, 126, 131-142, 157, 162, 167, 172, 176-187, 191
voice_mode/exchanges/reader.py                        114     92     46      0  13.75%   29-33, 37-41, 52-58, 70-79, 93-101, 113-146, 157-161, 172-184, 195-213, 222-225, 237-259
voice_mode/exchanges/stats.py                         199    179    108      0   6.51%   22-26, 34-58, 62-93, 97-139, 147-155, 163-176, 187-195, 203-211, 219-226, 234-240, 249-292, 300-317, 333-371, 379-425
voice_mode/history/database.py                         53     39      2      0  25.45%   18-25, 29-105, 133-159, 170-173, 182-190, 198-200, 204, 208, 212
voice_mode/history/loader.py                           71     55     18      0  17.98%   26-27, 42-44, 55-60, 80-118, 129-146, 157-177
voice_mode/history/search.py                           69     55     14      0  16.87%   21-30, 41-63, 82, 102-136, 147-160, 176-200
voice_mode/openai_error_parser.py                      91     17     56      9  78.23%   118->120, 120->122, 122->129, 125-126, 134->136, 136->139, 152-163, 165-168, 173, 185
voice_mode/prompts/converse.py                          4      1      0      0  75.00%   9
voice_mode/prompts/release_notes.py                    37      1     22      2  94.92%   20, 53->32
voice_mode/prompts/services.py                         19      3      8      3  77.78%   18, 38, 41
voice_mode/pronounce.py                               129     32     42      5  72.51%   35-37, 42, 48-50, 87-91, 119, 146->144, 168-169, 189, 209, 223-239, 243-248, 252-253
voice_mode/provider_discovery.py                      155     76     68      6  47.09%   30, 42, 44, 89, 118-128, 139-210, 222-241, 253->251, 260, 264-267, 271-274, 278, 306->exit
voice_mode/providers.py                               138     75     60      8  41.92%   50-70, 78->80, 87->108, 90, 92->87, 113->109, 116, 135-157, 178-215, 221-230, 257-276, 282-292, 303-320
voice_mode/resources/audio_files.py                    29     20     12      0  21.95%   16-31, 43-53
voice_mode/resources/configuration.py                 162    145     26      0   9.04%   34-37, 56-135, 151-170, 186-205, 210-230, 247-328, 342-404
voice_mode/resources/docs_resources.py                 33     20     10      0  30.23%   13-16, 22-25, 31-34, 40-43, 49-52
voice_mode/resources/statistics.py                     41     30      4      0  24.44%   24-101, 117-154, 170-178
voice_mode/resources/version.py                        28     21      8      0  19.44%   16-47
voice_mode/resources/whisper_models.py                 31     23     12      0  18.60%   26-79
voice_mode/serve_middleware.py                        125     30     36      5  75.78%   100, 149, 161-192, 244-245, 329-330, 351, 426-427
voice_mode/server.py                                   46     36     10      1  19.64%   15-19, 38-95
voice_mode/services/kokoro_onnx/installer.py          120     55     40      3  52.50%   32-62, 105-107, 174-203, 215, 218-225, 230-233
voice_mode/services/kokoro_onnx/server.py              94     23     14      3  72.22%   73-84, 163-187, 194-196, 201-207
voice_mode/shared.py                                   51     38     14      0  20.00%   37-83, 88-103
voice_mode/simple_failover.py                         134      8     42     11  89.20%   115, 121->47, 131->137, 133->137, 192->199, 194-196, 202, 236, 275->281, 277->281, 305->307, 314-315
voice_mode/statistics.py                              195    106     66      4  36.40%   82-89, 108, 114, 119->118, 121->118, 124-125, 166-246, 250-251, 255-257, 261-262, 270-345, 354
voice_mode/statistics_tracking.py                       9      2      0      0  77.78%   40-41
voice_mode/streaming.py                               284    284     78      0   0.00%   8-567
voice_mode/tools/__init__.py                           89     14     40      7  83.72%   12->15, 96, 111, 145-150, 166-171, 183
voice_mode/tools/configuration_management.py          227     66     98     13  69.54%   46->74, 60->70, 89-90, 134->144, 138->144, 153->159, 169, 197->201, 206-210, 213-217, 219->226, 252, 257-258, 360-389, 404-446
voice_mode/tools/converse.py                          905    378    368     84  56.17%   23-25, 121-133, 145-150, 163-172, 186-189, 194-195, 275-311, 319-344, 367, 370->373, 401->406, 452->456, 460, 464-470, 574-576, 601-602, 622-623, 654, 657->661, 662, 671->exit, 676-677, 685-691, 715-718, 731-762, 779-780, 786, 788, 790, 866-873, 877-888, 907-909, 930-935, 943, 957, 967-968, 971-972, 978, 988-990, 994, 998-999, 1007-1058, 1063, 1065, 1067-1073, 1150, 1152, 1154, 1156, 1158, 1162-1167, 1203-1204, 1211, 1219-1226, 1251-1254, 1261, 1269-1271, 1279->1331, 1284-1321, 1325, 1400-1401, 1416->1418, 1418->1420, 1428-1436, 1443->1447, 1448->1459, 1450->1452, 1452->1454, 1454->1456, 1473, 1497, 1508, 1528-1529, 1536-1545, 1550, 1562-1565, 1579-1587, 1598->1614, 1606->1614, 1610-1611, 1615-1708, 1712-1764, 1768-1784, 1790->1792, 1792->1794, 1818-1819, 1838->1840, 1840->1843, 1844, 1853->1855, 1873, 1878-1888, 1895, 1898-1906, 1912, 1918-1944, 1951-1956, 1960, 1971-1979
voice_mode/tools/dependencies.py                       80     14     34      7  81.58%   54-55, 91-92, 108-109, 112, 113->118, 116, 149-150, 158, 165-167
voice_mode/tools/devices.py                           152     86     34      2  43.01%   43-45, 74-75, 89-94, 119-120, 124-126, 139-249
voice_mode/tools/diagnostics.py                        39      2      6      0  95.56%   52-53
voice_mode/tools/providers.py                          86     52     34      4  35.00%   39-41, 54-57, 62-73, 84-88, 96-98, 111-156
voice_mode/tools/service.py                           653    380    318     57  39.55%   56, 82-84, 90-92, 107, 120-147, 160-187, 199-226, 238-247, 260, 265, 315-357, 362-373, 378, 382-384, 387, 407, 411, 420->427, 425, 435, 437, 441-448, 453, 455, 457-508, 511->515, 523-525, 532-534, 540-545, 557-559, 563-590, 592->616, 597-613, 625, 630-687, 693, 707-708, 712-715, 720-724, 731, 736-741, 753-765, 767->784, 772-781, 785, 798-801, 805-807, 839, 844-879, 898-899, 919-924, 953-987, 1003, 1005, 1026->1033, 1029->1033, 1033->1042, 1036->1042, 1037->1039, 1045, 1061-1065, 1102-1106, 1129-1161, 1166-1191
voice_mode/tools/sound_fonts/audio_player.py           34     34     18      0   0.00%   8-87
voice_mode/tools/sound_fonts/hook_handler.py           48     48     20      0   0.00%   8-127
voice_mode/tools/sound_fonts/player.py                 90     90     30      0   0.00%   8-177
voice_mode/tools/statistics.py                        117    117     38      0   0.00%   3-219
voice_mode/tools/transcription/__init__.py              3      3      0      0   0.00%   3-6
voice_mode/tools/transcription/backends.py             85     85     26      0   0.00%   3-287
voice_mode/tools/transcription/core.py                 35     35     14      0   0.00%   3-129
voice_mode/tools/transcription/formats.py              78     78     34      0   0.00%   3-144
voice_mode/tools/transcription/types.py                35     35      0      0   0.00%   3-52
voice_mode/tools/voice_registry.py                     31      2     12      4  86.05%   41, 43->33, 57, 59->50
voice_mode/tools/whisper/install.py                   332    309    126      0   5.02%   46-307, 345-773
voice_mode/tools/whisper/list_models.py                17     14      2      0  15.79%   22-66
voice_mode/tools/whisper/model_active.py               12      9      6      0  16.67%   21-50
voice_mode/tools/whisper/model_benchmark.py            69     66     50      0   2.52%   28-149
voice_mode/tools/whisper/model_install.py              82     68     42      0  11.29%   66-211
voice_mode/tools/whisper/model_remove.py               10      7      4      0  21.43%   22-36
voice_mode/tools/whisper/models.py                    175    111     62      9  31.65%   101->106, 104, 116, 124, 145-156, 164, 177-185, 190-194, 199-200, 205-206, 234-266, 269->276, 271, 290-320, 338-415, 424-429, 434-439, 444-449, 454-459, 464-469
voice_mode/tools/whisper/uninstall.py                 102     89     38      0   9.29%   39-199
voice_mode/utils/audio_diagnostics.py                  94     35     30     10  58.87%   21, 25->30, 39, 54-55, 74-92, 95-100, 103-107, 134, 137-138, 154->160, 157, 160->173, 178-179
voice_mode/utils/dependencies/cache.py                 15      1      2      0  94.12%   47
voice_mode/utils/dependencies/checker.py              143    112     46      3  20.11%   30-49, 69-76, 84-85, 98-134, 150-186, 191-200, 218-264
voice_mode/utils/dependencies/package_managers.py      74     36     10      0  50.00%   46-48, 51-60, 78-80, 84-93, 111-113, 117-126
voice_mode/utils/download.py                          104     92     48      0   7.89%   15-21, 26-30, 53-168, 183-184
voice_mode/utils/event_logger.py                      201    136     56      0  25.29%   32, 89-114, 124-142, 154-163, 172-185, 189-234, 238-239, 243-251, 256-268, 272-280, 306-308, 314-316, 325-327, 332-334, 339-346, 351-353, 358-360, 365-373, 378-380
voice_mode/utils/ffmpeg_check.py                       68      3     26      5  91.49%   44->51, 47->51, 84->83, 87-88, 152, 178->184
voice_mode/utils/format_migration.py                   38     38     18      0   0.00%   8-82
voice_mode/utils/gpu_detection.py                      52     27     16      4  42.65%   25-44, 54->61, 70-99, 118-119
voice_mode/utils/migration_helpers.py                  82     73     44      0   7.14%   14-35, 40-56, 66-131
voice_mode/utils/services/common.py                    58     49     22      0  11.25%   16-43, 51-58, 66-75, 95-110
voice_mode/utils/services/kokoro_helpers.py            41     28     22      1  25.40%   24-43, 61-90
voice_mode/utils/services/whisper_helpers.py          191    174     82      0   6.23%   22-40, 45-80, 101-183, 206-280, 304-305, 323-530, 540
voice_mode/utils/services/whisper_version.py           52     39     28      2  18.75%   29->32, 35-124, 136-138
voice_mode/utils/symlinks.py                           45      2     10      0  96.36%   113-114
voice_mode/utils/version_helpers.py                    89     77     36      0   9.60%   14-37, 47-88, 93-97, 102-140, 145-165, 170-184
voice_mode/version.py                                  40      7     10      2  82.00%   27-28, 41-43, 54->75, 71-73
voice_mode/whisper_model_unified.py                   124     47     44     12  60.12%   11-12, 74, 76, 104, 116-120, 131-153, 157, 166->170, 175-198, 211->223, 214, 226-228
------------------------------------------------------------------------------------------------
TOTAL                                               13400   7463   4640    432  40.64%

28 files skipped due to complete coverage.
Coverage HTML written to dir htmlcov
Coverage XML written to file coverage.xml
==================================================================================================================== short test summary info ====================================================================================================================
SKIPPED [1] tests/test_conversation_browser_playback.py:96: Flask not installed (install with 'pip install voice-mode[scripts]')
SKIPPED [1] tests/test_diagnostics.py:249: get_voice_mode_version doesn't exist in current implementation
SKIPPED [1] tests/test_installers.py:89: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:109: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:128: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:146: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:169: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:190: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:224: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:241: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:282: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:308: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:326: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:341: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:356: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:378: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:396: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:414: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:440: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:462: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:484: Skipping installer tests to prevent killing running voice services. Stop services before running these tests.
SKIPPED [1] tests/test_installers.py:522: Integration tests disabled by default. Set RUN_INTEGRATION_TESTS=1 to enable.
SKIPPED [1] tests/test_installers.py:537: Integration tests disabled by default. Set RUN_INTEGRATION_TESTS=1 to enable.
SKIPPED [1] tests/test_makefile_portability.py:37: macOS-specific test
SKIPPED [1] tests/test_selective_tool_loading.py:155: Flaky test - passes individually but fails in full suite due to environment pollution from other tests
SKIPPED [1] tests/test_silence_detection.py:71: Mock sounddevice.rec() causing test to hang
SKIPPED [1] tests/test_silence_detection.py:90: Mock sounddevice.rec() causing test to hang
SKIPPED [1] tests/test_silence_detection.py:108: Mock sounddevice.rec() causing test to hang
SKIPPED [1] tests/test_silence_detection.py:158: Mock sounddevice.rec() causing test to hang
SKIPPED [1] tests/test_silence_detection.py:241: Test requires real audio device interaction
SKIPPED [1] tests/test_tts_error_handling.py:61: Test needs refactoring - local services may be available that don't require API key
SKIPPED [1] tests/test_tts_stability.py:65: Need to refactor for lazy imports
SKIPPED [1] tests/test_tts_stability.py:127: httpx.Timeout API changed
SKIPPED [1] tests/test_tts_stability.py:155: Complex async context manager mocking - error handling tested elsewhere
SKIPPED [1] tests/test_tts_stability.py:163: Need to refactor for lazy imports
SKIPPED [1] tests/test_tts_stability.py:244: Missing fixture - need to refactor
SKIPPED [1] tests/test_tts_stability.py:297: Requires real OPENAI_API_KEY for integration test
SKIPPED [1] tests/test_voice_mode.py:125: Complex mocking of script-based MCP server not yet implemented
SKIPPED [1] tests/test_voice_mode.py:132: Complex mocking of script-based MCP server not yet implemented
SKIPPED [1] tests/test_voice_mode.py:139: Complex mocking of script-based MCP server not yet implemented
SKIPPED [1] tests/test_voice_mode.py:153: Complex mocking of script-based MCP server not yet implemented
SKIPPED [1] tests/test_voice_mode.py:168: Complex mocking of script-based MCP server not yet implemented
SKIPPED [1] tests/test_voice_mode.py:178: Complex mocking of script-based MCP server not yet implemented
SKIPPED [1] tests/test_voice_mode.py:188: Complex mocking of script-based MCP server not yet implemented
SKIPPED [1] tests/test_voice_mode.py:199: Module import issues with script format
SKIPPED [1] tests/test_voice_mode.py:258: Module import issues with script format
SKIPPED [1] tests/test_voice_mode.py:313: Complex mocking of script-based MCP server not yet implemented
SKIPPED [1] tests/test_voice_mode.py:343: Complex mocking of script-based MCP server not yet implemented
SKIPPED [1] tests/test_voice_mode.py:369: Complex mocking of script-based MCP server not yet implemented
SKIPPED [1] tests/test_voice_mode.py:396: Complex mocking of script-based MCP server not yet implemented
SKIPPED [1] tests/test_whisper_model_cli.py:68: Test needs refactoring after services directory removal
SKIPPED [1] tests/test_whisper_model_cli.py:81: Test needs refactoring after services directory removal
SKIPPED [1] tests/test_whisper_model_cli.py:94: Test needs refactoring after services directory removal
SKIPPED [1] tests/test_whisper_model_cli.py:112: Remove command has been removed from CLI
SKIPPED [1] tests/test_whisper_model_cli.py:117: Remove command has been removed from CLI
SKIPPED [1] tests/test_whisper_model_cli.py:122: Remove command has been removed from CLI
===================================================================================================== 876 passed, 56 skipped, 1 warning in 76.06s (0:01:16) =====================================================================================================
Tests completed!

Add kokoro-onnx service as an alternative to PyTorch-based Kokoro:

- New service on port 8881 with OpenAI-compatible /v1/audio/speech endpoint
- ONNX Runtime parallelises across all CPU cores (vs single-core PyTorch)
- int8 quantised model uses 88MB vs 310MB full model
- Lower memory footprint, faster CPU inference
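Because the endpoint is described as OpenAI-compatible, a request to it would be expected to follow the shape of the OpenAI speech API (`model`, `input`, `voice` in a JSON body). The following is a minimal sketch using only the standard library. The model name `kokoro` and the voice `af_sky` are assumptions for illustration; the PR does not state which values the server accepts.

```python
import json
import urllib.request


def build_speech_request(base_url: str, text: str, voice: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for <base_url>/audio/speech."""
    payload = {
        "model": "kokoro",  # assumption: placeholder model name
        "input": text,
        "voice": voice,     # assumption: the voice must exist on the server
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_speech_request(
    "http://127.0.0.1:8881/v1", "Hello from kokoro-onnx", "af_sky"
)

# Sending requires the service to be running, so it is left commented out:
# with urllib.request.urlopen(request, timeout=30) as response:
#     audio_bytes = response.read()
```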

Files added:
- voice_mode/services/kokoro_onnx/server.py - FastAPI server
- voice_mode/templates/scripts/start-kokoro-onnx.sh - Service script
- tests/test_kokoro_onnx.py - 18 unit tests

Usage:
  voicemode service start kokoro-onnx
  # Configure: VOICEMODE_TTS_BASE_URLS=http://127.0.0.1:8881/v1

- Add voicemode service install kokoro-onnx command
- Create installer.py with automatic dep install and model download
- Add kokoro-onnx health check support in CLI
- Update docs with automatic installation option
- Add inference time comparison (PyTorch ~30s+ vs ONNX <10s)
- Add GPU acceleration notes (CUDA, DirectML, ROCm)
- Add installer tests

- Add voicemode service install kokoro-onnx command
- Create installer.py with automatic dep install and model download
- Add comprehensive unit tests for installer (32 tests total)
- Pin kokoro-onnx to ~=0.5.0 (compatible release)
- Add real TTFA benchmarks: ONNX 1.6s vs PyTorch 7.8s
- Add GPU acceleration notes (CUDA, DirectML, MIGraphX)
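The GPU options named in the commit notes correspond to ONNX Runtime execution providers. As a hedged sketch of how a preference order could be resolved against whatever a given install supports, the helper below filters a preferred list down to the available providers. `choose_providers` and the ordering are assumptions for illustration; the PR does not show whether or how the service selects a provider.

```python
# ONNX Runtime provider names for the back ends mentioned above, with CPU last.
PREFERRED_PROVIDERS = [
    "CUDAExecutionProvider",      # CUDA
    "DmlExecutionProvider",       # DirectML
    "MIGraphXExecutionProvider",  # MIGraphX
    "CPUExecutionProvider",
]


def choose_providers(available: list[str]) -> list[str]:
    """Return the preferred providers that are available, in preference order.

    Falls back to CPU alone if none of the preferred providers is reported.
    """
    chosen = [name for name in PREFERRED_PROVIDERS if name in available]
    return chosen or ["CPUExecutionProvider"]


# Example with a hand-written availability list (no onnxruntime import needed).
print(choose_providers(["CPUExecutionProvider", "CUDAExecutionProvider"]))
```

With onnxruntime installed, the `available` list would typically come from `onnxruntime.get_available_providers()`, and the result can be passed as the `providers` argument of `onnxruntime.InferenceSession`.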
@DiarmuidKelly DiarmuidKelly force-pushed the feat/kokoro-onnx-backend branch from efd3384 to 2b67539 on February 17, 2026 14:25