feat: audio ingestion via Whisper API#31
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6e1369a556
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/server.ts
Outdated
| markInternalWrite(); | ||
| const result = await ingestAudio( |
There was a problem hiding this comment.
Move internal-write marker to the actual write point
markInternalWrite() is called before ingestAudio(), but audio ingestion performs a remote transcription that can run up to 120 seconds; the hot-reload skip window is only 10 seconds, so long transcriptions can expire the skip window before storeObservation writes. In that case the file watcher will treat this internal write as external and trigger unnecessary reload/reindex work, which can cause avoidable contention during ingest-heavy workflows.
Useful? React with 👍 / 👎.
| @@ -0,0 +1,173 @@ | |||
| import { describe, it, expect, afterEach, beforeEach } from 'bun:test'; | |||
There was a problem hiding this comment.
Use Vitest APIs in the new test file
The repository’s test command runs Vitest and includes tests/**/*.test.ts, but this new test imports from bun:test. Under the declared test runner, that import is unresolved, so CI/local npm test will fail before executing these assertions. Switching this file to Vitest imports (or globals) is necessary to keep the test suite runnable.
Useful? React with 👍 / 👎.
|
Thanks for pushing on multimodal ingestion — the capability is useful, but this one still has a couple of concrete blockers.
So I don't think this is just a test-fix PR — it needs one more pass on provider/config semantics as well. |
|
Thanks for the detailed breakdown — all three points are valid.
|
…hisper config Three changes: 1. storeFn parameter type: type: string -> type: ObservationType so the callback signature is compatible with storeObservation() without a cast. 2. Dedicated Whisper key resolution: transcribeAudio() now checks MEMORIX_WHISPER_API_KEY first, then falls back to OPENAI_API_KEY. Does NOT fall through to MEMORIX_LLM_API_KEY — that key may point to Anthropic or another non-Whisper provider, which would send the wrong credential to the /audio/transcriptions endpoint. 3. MEMORIX_WHISPER_BASE_URL: optional override for self-hosted or alternative Whisper-compatible endpoints (vLLM, LocalAI, etc.). Tests: added coverage for MEMORIX_WHISPER_API_KEY priority and MEMORIX_WHISPER_BASE_URL custom endpoint; cleanup ensures both vars are cleared in afterEach.
AVIDS2
left a comment
There was a problem hiding this comment.
The test/type issues look improved, but there is still a concrete config/runtime mismatch before this is safe to merge: the PR advertises provider: groq, yet the Whisper credential resolver only falls back to MEMORIX_WHISPER_API_KEY or OPENAI_API_KEY. In the normal provider=groq case, a user with only GROQ_API_KEY configured will still fail, and a user with only OPENAI_API_KEY configured will send the wrong credential to the Groq endpoint. Please either add provider-correct key resolution for Groq, or narrow the feature so it no longer claims Groq preset support until that path is actually wired correctly.
Summary
Adds an
memorix_ingest_audioMCP tool that transcribes audio files using the Whisper API and stores the transcript as a memorix observation.What's included
src/multimodal/audio-loader.ts— Whisper API client with:MEMORIX_WHISPER_MODEL, default:whisper-1)src/multimodal/index.ts— shared multimodal utilities (MIME detection, file validation)src/server.ts— MCP tool registration (memorix_ingest_audio)tests/multimodal/audio-loader.test.ts— 8 tests covering:Design decisions
MEMORIX_WHISPER_API_KEY(falls back toOPENAI_API_KEY). Tool returns clear error if unconfigured.how-it-worksobservation type with audio metadata in facts.fetchfor API calls andfsfor file reading.Tests