feat: audio ingestion via Whisper API#31

Open
RaviTharuma wants to merge 3 commits into AVIDS2:main from RaviTharuma:feature/memorix-ek1-audio-ingestion

Conversation

@RaviTharuma
Contributor

Summary

Adds a memorix_ingest_audio MCP tool that transcribes audio files via the Whisper API and stores the transcript as a memorix observation.

What's included

  • src/multimodal/audio-loader.ts — Whisper API client with:
    • Configurable model (MEMORIX_WHISPER_MODEL, default: whisper-1)
    • Language hint and prompt support
    • Chunked file reading for large files
    • Automatic format detection from extension (mp3, wav, m4a, ogg, flac, webm, mp4)
  • src/multimodal/index.ts — shared multimodal utilities (MIME detection, file validation)
  • src/server.ts — MCP tool registration (memorix_ingest_audio)
  • tests/multimodal/audio-loader.test.ts — 8 tests covering:
    • Transcript extraction and observation storage
    • Unsupported format rejection
    • Missing file handling
    • Language and prompt passthrough
    • API error propagation
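
The extension-based format detection can be sketched roughly as below; the names used here (detectAudioMime, SUPPORTED_AUDIO) are illustrative, not necessarily those in src/multimodal/audio-loader.ts:

```typescript
// Hypothetical sketch of extension-based audio format detection.
// Maps the supported extensions listed above to MIME types and rejects
// anything else, matching the "unsupported format rejection" test case.
const SUPPORTED_AUDIO: Record<string, string> = {
  mp3: "audio/mpeg",
  wav: "audio/wav",
  m4a: "audio/mp4",
  ogg: "audio/ogg",
  flac: "audio/flac",
  webm: "audio/webm",
  mp4: "audio/mp4",
};

function detectAudioMime(filePath: string): string {
  const ext = filePath.split(".").pop()?.toLowerCase() ?? "";
  const mime = SUPPORTED_AUDIO[ext];
  if (!mime) {
    throw new Error(`Unsupported audio format: .${ext}`);
  }
  return mime;
}
```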

Design decisions

  • LLM-optional: requires MEMORIX_WHISPER_API_KEY (falls back to OPENAI_API_KEY). The tool returns a clear error if unconfigured.
  • Stores transcript as how-it-works observation type with audio metadata in facts.
  • No new npm dependencies — uses native fetch for API calls and fs for file reading.
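
For illustration, the native-fetch call to a Whisper-style /audio/transcriptions endpoint might be assembled as follows; the helper name and its signature are assumptions, not the PR's actual implementation (the real loader also reads large files in chunks, which is omitted here):

```typescript
// Sketch: build the multipart request for OpenAI's /v1/audio/transcriptions
// endpoint using only globals available in Node 18+ (FormData, Blob).
// The request is returned rather than sent, so no network access is needed.
function buildTranscriptionRequest(
  apiKey: string,
  audio: Blob,
  fileName: string,
  opts: { model?: string; language?: string; prompt?: string } = {},
): { url: string; init: { method: string; headers: Record<string, string>; body: FormData } } {
  const form = new FormData();
  form.append("file", audio, fileName);
  form.append("model", opts.model ?? "whisper-1"); // MEMORIX_WHISPER_MODEL default
  if (opts.language) form.append("language", opts.language); // language hint passthrough
  if (opts.prompt) form.append("prompt", opts.prompt); // prompt passthrough
  return {
    url: "https://api.openai.com/v1/audio/transcriptions",
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  };
}
```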

Tests

8 pass, 0 fail


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6e1369a556


src/server.ts Outdated
Comment on lines +3078 to +3079
markInternalWrite();
const result = await ingestAudio(


P2 Badge Move internal-write marker to the actual write point

markInternalWrite() is called before ingestAudio(), but audio ingestion performs a remote transcription that can run up to 120 seconds; the hot-reload skip window is only 10 seconds, so long transcriptions can expire the skip window before storeObservation writes. In that case the file watcher will treat this internal write as external and trigger unnecessary reload/reindex work, which can cause avoidable contention during ingest-heavy workflows.
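
One way to address this is to mark the internal write immediately before the observation is persisted rather than before the long-running transcription. The sketch below uses stand-in signatures for markInternalWrite/storeObservation/ingestAudio; the real code in src/server.ts may differ:

```typescript
// Sketch: defer markInternalWrite() to the actual write point by passing
// a store callback into ingestAudio. The transcription can then take its
// full 120 seconds without the 10-second hot-reload skip window expiring
// before storeObservation runs.
async function ingestAudioTool(
  filePath: string,
  ingestAudio: (path: string, store: (text: string) => Promise<void>) => Promise<string>,
  markInternalWrite: () => void,
  storeObservation: (text: string) => Promise<void>,
): Promise<string> {
  return ingestAudio(filePath, async (transcript) => {
    markInternalWrite(); // marked right before the write, inside the skip window
    await storeObservation(transcript);
  });
}
```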


@@ -0,0 +1,173 @@
import { describe, it, expect, afterEach, beforeEach } from 'bun:test';


P1 Badge Use Vitest APIs in the new test file

The repository’s test command runs Vitest and includes tests/**/*.test.ts, but this new test imports from bun:test. Under the declared test runner, that import is unresolved, so CI/local npm test will fail before executing these assertions. Switching this file to Vitest imports (or globals) is necessary to keep the test suite runnable.
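
Concretely, one possible minimal change is swapping the import at the top of the test file (assuming Vitest is not configured with globals):

```diff
-import { describe, it, expect, afterEach, beforeEach } from 'bun:test';
+import { describe, it, expect, afterEach, beforeEach } from 'vitest';
```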


@AVIDS2
Owner

AVIDS2 commented Mar 30, 2026

Thanks for pushing on multimodal ingestion — the capability is useful, but this one still has a couple of concrete blockers.

  1. The test file uses bun:test, while this repo runs vitest in CI. Right now that means the PR fails immediately in the standard test matrix.
  2. In src/server.ts, the object passed into storeObservation is inferred as type: string, but storeObservation expects a real ObservationType.
  3. More importantly, the config path is too loose for the implementation: transcribeAudio() pulls the generic Memorix LLM API key via getLLMApiKey(), but the code then talks specifically to OpenAI/Groq Whisper transcription endpoints. In a valid Memorix setup that is configured for Anthropic or another non-Whisper provider, this can end up sending the wrong credential/provider combination to an incompatible API.

So I don't think this is just a test-fix PR — it needs one more pass on provider/config semantics as well.

@RaviTharuma
Contributor Author

Thanks for the detailed breakdown — all three points are valid.

  1. bun:test → vitest: Will switch the test file imports to vitest globals.
  2. ObservationType inference: Will add an explicit cast / assertion so the object flowing into storeObservation satisfies the ObservationType constraint rather than widening to string.
  3. Provider/config semantics: This is the substantive one. The right fix is to add a Whisper-specific config path (e.g. MEMORIX_WHISPER_API_KEY / MEMORIX_WHISPER_BASE_URL) that is separate from the general LLM key, so audio ingestion doesn't assume the general LLM provider is Whisper-compatible. I'll update transcribeAudio() to use those dedicated config values and document them in the README section for audio ingestion.

…hisper config

Three changes:

1. storeFn parameter type: type: string -> type: ObservationType so the
   callback signature is compatible with storeObservation() without a cast.

2. Dedicated Whisper key resolution: transcribeAudio() now checks
   MEMORIX_WHISPER_API_KEY first, then falls back to OPENAI_API_KEY.
   Does NOT fall through to MEMORIX_LLM_API_KEY — that key may point
   to Anthropic or another non-Whisper provider, which would send the
   wrong credential to the /audio/transcriptions endpoint.

3. MEMORIX_WHISPER_BASE_URL: optional override for self-hosted or
   alternative Whisper-compatible endpoints (vLLM, LocalAI, etc.).

Tests: added coverage for MEMORIX_WHISPER_API_KEY priority and
MEMORIX_WHISPER_BASE_URL custom endpoint; cleanup ensures both vars
are cleared in afterEach.
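
The key-resolution behavior described in points 2–3 can be sketched as below; the helper name resolveWhisperConfig is illustrative:

```typescript
// Sketch of dedicated Whisper credential/endpoint resolution.
// MEMORIX_WHISPER_API_KEY wins; OPENAI_API_KEY is the only fallback.
// Deliberately does NOT consult MEMORIX_LLM_API_KEY, which may belong to
// a non-Whisper provider such as Anthropic.
function resolveWhisperConfig(env: Record<string, string | undefined>): {
  apiKey: string;
  baseUrl: string;
} {
  const apiKey = env.MEMORIX_WHISPER_API_KEY ?? env.OPENAI_API_KEY;
  if (!apiKey) {
    throw new Error(
      "Audio ingestion requires MEMORIX_WHISPER_API_KEY or OPENAI_API_KEY",
    );
  }
  // Optional override for self-hosted / Whisper-compatible endpoints.
  const baseUrl = env.MEMORIX_WHISPER_BASE_URL ?? "https://api.openai.com/v1";
  return { apiKey, baseUrl };
}
```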
Owner

@AVIDS2 left a comment


The test/type issues look improved, but there is still a concrete config/runtime mismatch before this is safe to merge: the PR advertises provider: groq, yet the Whisper credential resolver only falls back to MEMORIX_WHISPER_API_KEY or OPENAI_API_KEY. In the normal provider=groq case, a user with only GROQ_API_KEY configured will still fail, and a user with only OPENAI_API_KEY configured will send the wrong credential to the Groq endpoint. Please either add provider-correct key resolution for Groq, or narrow the feature so it no longer claims Groq preset support until that path is actually wired correctly.
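
A provider-correct resolution might look like the following sketch; the provider names and env-var mapping are assumptions about this repo, not its confirmed config surface:

```typescript
// Sketch: resolve the Whisper credential per provider, so a provider=groq
// setup uses GROQ_API_KEY and never silently sends an OpenAI credential
// to the Groq endpoint. MEMORIX_WHISPER_API_KEY still takes precedence.
function resolveKeyForProvider(
  provider: "openai" | "groq",
  env: Record<string, string | undefined>,
): string {
  const dedicated = env.MEMORIX_WHISPER_API_KEY;
  if (dedicated) return dedicated;
  const fallback = provider === "groq" ? env.GROQ_API_KEY : env.OPENAI_API_KEY;
  if (!fallback) {
    throw new Error(
      `No key for provider "${provider}": set MEMORIX_WHISPER_API_KEY or ` +
        (provider === "groq" ? "GROQ_API_KEY" : "OPENAI_API_KEY"),
    );
  }
  return fallback;
}
```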
