CopilotKit · jpr5 · May 18, 2026
diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json
@@ -9,7 +9,7 @@
       "source": {
         "source": "npm",
         "package": "@copilotkit/aimock",
-        "version": "^1.24.0"
+        "version": "^1.25.0"
       },
       "description": "Fixture authoring skill for @copilotkit/aimock — LLM, multimedia (image/TTS/transcription/video), MCP, A2A, AG-UI, vector, embeddings, structured output, sequential responses, streaming physics, record/replay, agent loop patterns, and debugging"
     }

diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "aimock",
-  "version": "1.24.0",
+  "version": "1.25.0",
   "description": "Fixture authoring guidance for @copilotkit/aimock — LLM, multimedia, MCP, A2A, AG-UI, vector, and service mocking",
   "author": {
     "name": "CopilotKit"

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,31 @@
 
 ### Added
 
+- **Gemini `embedContent` endpoint** — `POST /v1beta/models/{model}:embedContent`
+  with deterministic fallback embeddings and fixture matching
+- **`/v1/images/edit` and `/v1/images/variations` endpoints** — multipart
+  form-data, same response format as generations. Closes #221
+- **`/v1/audio/translations` endpoint** — reuses transcription handler with
+  `endpoint: "translation"` and `task: "translate"` in verbose mode
+- **Ollama `/api/embeddings` endpoint** — single-embedding response, supports
+  both `prompt` and `input` (string or array) fields
+- **Cohere `/v2/embed` endpoint** — multi-text embedding with configurable
+  `embedding_types` (float, int8, etc.)
+- **ElevenLabs `/v1/text-to-speech/{voice_id}` endpoint** — binary audio
+  response with voice routing and `onElevenLabsTTS` helper
+- **Streaming usage chunks** — when `stream_options.include_usage` is set,
+  emits a final SSE chunk with token usage before `[DONE]`
+- **Automatic token usage estimation** — responses without explicit fixture
+  `usage` overrides now return estimated token counts (~4 chars/token)
+  instead of zeros
+- **Rate limiting headers on 429 responses** — `Retry-After`,
+  `x-ratelimit-limit-*`, `x-ratelimit-remaining-*`,
+  `x-ratelimit-reset-*` headers on all error fixtures with status 429.
+  Custom `retryAfter` override via fixture field
+- **`onTranslation` convenience method** — register fixtures that match only
+  the translation endpoint (`endpoint: "translation"`)
+- **`onElevenLabsTTS` convenience method** — register ElevenLabs TTS
+  fixtures
 - **Configurable proxy timeouts** — `RecordConfig` now accepts `upstreamTimeoutMs` (default 30s) and `bodyTimeoutMs` (default 30s). The body-idle timeout is the Node socket inactivity timer that fires `req.destroy()` mid-stream; under concurrent load against reasoning models (e.g. Grok 4.3 + structured output), token-emission gaps can routinely exceed 30s during the thinking phase, causing record-mode runs to truncate SSE responses mid-stream with no `[DONE]` and no `finish_reason`. Lift to e.g. `bodyTimeoutMs: 180_000` to record cleanly under that workload.
 
 ## [1.24.1] - 2026-05-14
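
The changelog entry on automatic token usage estimation describes a roughly 4-characters-per-token heuristic. A minimal sketch of that idea follows. The names `estimateTokens` and `estimateUsage` are hypothetical, and rounding up is an assumption; aimock's actual implementation may differ. The `usage` field names follow the OpenAI response shape.

```typescript
// Hypothetical helpers for illustration only; not aimock's actual code.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// Rough heuristic: about 4 characters per token.
// Rounding up is an assumption made for this sketch.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Build a usage object from the request prompt and the fixture's response text.
function estimateUsage(prompt: string, completion: string): Usage {
  const prompt_tokens = estimateTokens(prompt);
  const completion_tokens = estimateTokens(completion);
  return {
    prompt_tokens,
    completion_tokens,
    total_tokens: prompt_tokens + completion_tokens,
  };
}
```

A fixture that sets an explicit `usage` override would bypass an estimate like this entirely, per the entry above.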

diff --git a/DRIFT.md b/DRIFT.md
@@ -77,7 +77,13 @@ When a `critical` drift is detected:
    - OpenAI Responses API → `src/responses.ts` (`buildTextResponse`, `buildToolCallResponse`, `buildTextStreamEvents`, `buildToolCallStreamEvents`)
    - Anthropic Claude → `src/messages.ts` (`buildClaudeTextResponse`, `buildClaudeToolCallResponse`, `buildClaudeTextStreamEvents`, `buildClaudeToolCallStreamEvents`)
    - Google Gemini → `src/gemini.ts` (`buildGeminiTextResponse`, `buildGeminiToolCallResponse`, `buildGeminiTextStreamChunks`, `buildGeminiToolCallStreamChunks`)
+   - Gemini embedContent → `src/gemini.ts` (embedContent response builder)
    - Gemini Interactions → `src/gemini-interactions.ts` (`buildInteractionsTextResponse`, `buildInteractionsToolCallResponse`, `buildInteractionsTextSSEEvents`, `buildInteractionsToolCallSSEEvents`)
+   - OpenAI Image Edit → `src/images.ts` (multipart `/v1/images/edit` handler)
+   - OpenAI Audio Translation → `src/transcription.ts` (multipart `/v1/audio/translations` handler)
+   - Ollama Embeddings → `src/ollama.ts` (`/api/embeddings` response builder)
+   - Cohere Embed → `src/cohere.ts` (`/v2/embed` response builder)
+   - ElevenLabs TTS → `src/elevenlabs-audio.ts` (`/v1/text-to-speech/{voice_id}` response builder)
 
 2. **Update the builder** — add or modify the field to match the real API shape.
 
@@ -107,7 +113,22 @@ When a model is deprecated:
 
 ## WebSocket Drift Coverage
 
-In addition to the 23 existing drift tests (20 HTTP response-shape + 3 model deprecation), WebSocket drift tests cover aimock's WS protocols (6 verified + 2 canary = 8 WS tests):
+In addition to the 23 existing drift tests (20 HTTP response-shape + 3 model deprecation), the following new endpoint coverage has been added:
+
+### New Endpoint Drift Coverage
+
+| Endpoint                                 | Provider      | Type              | Status  |
+| ---------------------------------------- | ------------- | ----------------- | ------- |
+| POST /v1beta/models/{model}:embedContent | Gemini        | HTTP              | Covered |
+| POST /v1/images/edit                     | OpenAI        | HTTP (multipart)  | Covered |
+| POST /v1/audio/translations              | OpenAI        | HTTP (multipart)  | Covered |
+| POST /api/embeddings                     | Ollama        | HTTP              | Covered |
+| POST /v2/embed                           | Cohere        | HTTP              | Covered |
+| POST /v1/text-to-speech/{voice_id}       | ElevenLabs    | HTTP              | Covered |
+| stream_options.include_usage             | OpenAI        | Streaming feature | Covered |
+| x-ratelimit-\* / Retry-After 429         | All providers | Response headers  | Covered |
+
+WebSocket drift tests cover aimock's WS protocols (6 verified + 2 canary = 8 WS tests):
 
 ### Gemini Interactions API (Beta)
 

diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@
 
 https://github.com/user-attachments/assets/76815122-574a-48e1-b275-edae0a014667
 
-Mock infrastructure for AI application testing — LLM APIs, image generation, text-to-speech, transcription, audio generation, video generation, MCP tools, A2A agents, AG-UI event streams, vector databases, search, rerank, and moderation. One package, one port, zero dependencies.
+Mock infrastructure for AI application testing — LLM APIs, image generation, image editing, text-to-speech, transcription, audio translation, audio generation, video generation, embeddings, MCP tools, A2A agents, AG-UI event streams, vector databases, search, rerank, and moderation. One package, one port, zero dependencies.
 
 ## Quick Start
 
@@ -35,23 +35,23 @@ await mock.stop();
 
 aimock mocks everything your AI app talks to:
 
-| Tool           | What it mocks                                                                                                                | Docs                                                |
-| -------------- | ---------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
-| **LLMock**     | OpenAI (Chat/Responses/Realtime GA+Beta), Claude, Gemini (REST/Live/Interactions), Bedrock, Azure, Vertex AI, Ollama, Cohere | [Providers](https://aimock.copilotkit.dev/docs)     |
-| **MCPMock**    | MCP tools, resources, prompts with session management                                                                        | [MCP](https://aimock.copilotkit.dev/mcp-mock)       |
-| **A2AMock**    | Agent-to-agent protocol with SSE streaming                                                                                   | [A2A](https://aimock.copilotkit.dev/a2a-mock)       |
-| **AGUIMock**   | AG-UI agent-to-UI event streams for frontend testing                                                                         | [AG-UI](https://aimock.copilotkit.dev/agui-mock)    |
-| **VectorMock** | Pinecone, Qdrant, ChromaDB compatible endpoints                                                                              | [Vector](https://aimock.copilotkit.dev/vector-mock) |
-| **Services**   | Tavily search, Cohere rerank, OpenAI moderation                                                                              | [Services](https://aimock.copilotkit.dev/services)  |
+| Tool           | What it mocks                                                                                                                                                                          | Docs                                                |
+| -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------- |
+| **LLMock**     | OpenAI (Chat/Responses/Realtime GA+Beta), Claude, Gemini (REST/Live/Interactions/Embeddings), Bedrock, Azure, Vertex AI, Ollama (chat/embeddings), Cohere (chat/embed), ElevenLabs TTS | [Providers](https://aimock.copilotkit.dev/docs)     |
+| **MCPMock**    | MCP tools, resources, prompts with session management                                                                                                                                  | [MCP](https://aimock.copilotkit.dev/mcp-mock)       |
+| **A2AMock**    | Agent-to-agent protocol with SSE streaming                                                                                                                                             | [A2A](https://aimock.copilotkit.dev/a2a-mock)       |
+| **AGUIMock**   | AG-UI agent-to-UI event streams for frontend testing                                                                                                                                   | [AG-UI](https://aimock.copilotkit.dev/agui-mock)    |
+| **VectorMock** | Pinecone, Qdrant, ChromaDB compatible endpoints                                                                                                                                        | [Vector](https://aimock.copilotkit.dev/vector-mock) |
+| **Services**   | Tavily search, Cohere rerank, OpenAI moderation, ElevenLabs TTS                                                                                                                        | [Services](https://aimock.copilotkit.dev/services)  |
 
 Run them all on one port with `npx @copilotkit/aimock --config aimock.json`, or use the programmatic API to compose exactly what you need.
 
 ## Features
 
 - **[Record & Replay](https://aimock.copilotkit.dev/record-replay)** — Proxy real APIs, save as fixtures, replay deterministically forever
 - **[Multi-turn Conversations](https://aimock.copilotkit.dev/multi-turn)** — Record and replay multi-turn traces with tool rounds; match distinct turns via `turnIndex`, `hasToolResult`, `toolCallId`, `sequenceIndex`, `systemMessage` (gate on host-supplied agent context), or custom predicates
-- **[12 LLM Providers](https://aimock.copilotkit.dev/docs)** — OpenAI Chat, OpenAI Responses, OpenAI Realtime (GA + Beta shim), Claude, Gemini, Gemini Live, Gemini Interactions, Azure, Bedrock, Vertex AI, Ollama, Cohere — full streaming support
-- **Multimedia APIs** — [image generation](https://aimock.copilotkit.dev/images) (DALL-E, Imagen), [text-to-speech](https://aimock.copilotkit.dev/speech), [audio transcription](https://aimock.copilotkit.dev/transcription), [video generation](https://aimock.copilotkit.dev/video), [fal.ai](https://aimock.copilotkit.dev/fal-ai) (image / video / audio with queue lifecycle)
+- **[14 LLM Providers](https://aimock.copilotkit.dev/docs)** — OpenAI Chat, OpenAI Responses, OpenAI Realtime (GA + Beta shim), Claude, Gemini (REST + embedContent), Gemini Live, Gemini Interactions, Azure, Bedrock, Vertex AI, Ollama (chat + embeddings), Cohere (chat + embed), ElevenLabs TTS — full streaming support
+- **Multimedia APIs** — [image generation](https://aimock.copilotkit.dev/images) (DALL-E, Imagen), [image editing](https://aimock.copilotkit.dev/images) (/v1/images/edit), [text-to-speech](https://aimock.copilotkit.dev/speech) (OpenAI + ElevenLabs), [audio transcription](https://aimock.copilotkit.dev/transcription), [audio translation](https://aimock.copilotkit.dev/transcription) (/v1/audio/translations), [video generation](https://aimock.copilotkit.dev/video), [fal.ai](https://aimock.copilotkit.dev/fal-ai) (image / video / audio with queue lifecycle)
 - **[MCP](https://aimock.copilotkit.dev/mcp-mock) / [A2A](https://aimock.copilotkit.dev/a2a-mock) / [AG-UI](https://aimock.copilotkit.dev/agui-mock) / [Vector](https://aimock.copilotkit.dev/vector-mock)** — Mock every protocol your AI agents use
 - **[Chaos Testing](https://aimock.copilotkit.dev/chaos-testing)** — 500 errors, malformed JSON, mid-stream disconnects at any probability
 - **Per-Request Strict Mode** — `X-AIMock-Strict` header overrides the server-level `--strict` flag per request (`true`/`1` = strict, `false`/`0` = lenient)
@@ -62,6 +62,8 @@ Run them all on one port with `npx @copilotkit/aimock --config aimock.json`, or
 - **[Docker + Helm](https://aimock.copilotkit.dev/docker)** — Container image and Helm chart for CI/CD
 - **[Vitest & Jest Plugins](https://aimock.copilotkit.dev/test-plugins)** — Zero-config `useAimock()` with auto lifecycle and env patching
 - **[Response Overrides](https://aimock.copilotkit.dev/fixtures)** — Control `id`, `model`, `usage`, `finishReason` in fixture responses
+- **[Streaming Usage Chunks](https://aimock.copilotkit.dev/streaming-physics)** — `stream_options.include_usage` support emits a final chunk with token counts, matching OpenAI's streaming usage protocol
+- **[Rate Limiting Headers](https://aimock.copilotkit.dev/chaos-testing)** — `Retry-After` and `x-ratelimit-*` headers on 429 error responses for testing retry/backoff logic
 - **Zero dependencies** — Everything from Node.js builtins
 
 ## GitHub Action
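
The streaming usage feature in the README hunk above can be pictured as one extra SSE frame placed just before `[DONE]`. The sketch below assumes the OpenAI convention that the usage chunk carries an empty `choices` array. The function name `buildSseFrames` and the trimmed-down chunk shape are hypothetical and are not taken from aimock's source.

```typescript
// Hypothetical sketch; chunk objects are reduced to the fields that matter here.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

function buildSseFrames(deltas: string[], usage: Usage, includeUsage: boolean): string[] {
  // One content chunk per text delta.
  const frames = deltas.map((content) => {
    const chunk = {
      object: "chat.completion.chunk",
      choices: [{ index: 0, delta: { content } }],
    };
    return `data: ${JSON.stringify(chunk)}\n\n`;
  });

  // When stream_options.include_usage is set, append a usage-only chunk
  // (empty choices) before the terminator.
  if (includeUsage) {
    const usageChunk = { object: "chat.completion.chunk", choices: [], usage };
    frames.push(`data: ${JSON.stringify(usageChunk)}\n\n`);
  }

  frames.push("data: [DONE]\n\n");
  return frames;
}
```

A client test can then assert that the second-to-last frame parses to an object with a `usage` field only when the request opted in.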

diff --git a/charts/aimock/Chart.yaml b/charts/aimock/Chart.yaml
@@ -3,4 +3,4 @@ name: aimock
 description: Mock infrastructure for AI application testing (OpenAI, Anthropic, Gemini, MCP, A2A, vector)
 type: application
 version: 0.1.0
-appVersion: "1.23.0"
+appVersion: "1.25.0"
diff --git a/docs/index.html b/docs/index.html
@@ -1496,8 +1496,8 @@ <h2 class="fade-in">Everything you need</h2>
             <div class="feature-icon">&#128225;</div>
             <h3>Every Major LLM Provider</h3>
             <p>
-              OpenAI, Claude, Gemini, Gemini Interactions, Bedrock, Azure, Vertex AI, Ollama, Cohere
-              &mdash; full streaming and embeddings support for every provider.
+              OpenAI, Claude, Gemini, Gemini Interactions, Bedrock, Azure, Vertex AI, Ollama,
+              Cohere, ElevenLabs &mdash; full streaming and embeddings support for every provider.
             </p>
           </div>
 
@@ -1547,8 +1547,9 @@ <h3>Chaos Testing</h3>
             <div class="feature-icon">&#127912;</div>
             <h3>Multimedia APIs</h3>
             <p>
-              Image generation, text-to-speech, audio transcription, non-speech audio generation,
-              and video generation &mdash; mock every multimedia endpoint with fixtures.
+              Image generation and editing, text-to-speech (OpenAI + ElevenLabs), audio
+              transcription and translation, non-speech audio generation, and video generation
+              &mdash; mock every multimedia endpoint with fixtures.
             </p>
           </div>
 
@@ -1680,7 +1681,7 @@ <h2 class="fade-in">How aimock compares</h2>
               </tr>
               <tr>
                 <td>Multi-provider support</td>
-                <td class="col-aimock"><span class="yes">13 providers &#10003;</span></td>
+                <td class="col-aimock"><span class="yes">14 providers &#10003;</span></td>
                 <td><span class="manual">manual</span></td>
                 <td>12 providers</td>
                 <td>OpenAI only</td>
@@ -1714,6 +1715,15 @@ <h2 class="fade-in">How aimock compares</h2>
                 <td><span class="no">&#10007;</span></td>
                 <td><span class="no">&#10007;</span></td>
               </tr>
+              <tr>
+                <td>Image editing</td>
+                <td class="col-aimock"><span class="yes">Built-in &#10003;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+              </tr>
               <tr>
                 <td>Text-to-Speech</td>
                 <td class="col-aimock"><span class="yes">Built-in &#10003;</span></td>
@@ -1732,6 +1742,15 @@ <h2 class="fade-in">How aimock compares</h2>
                 <td><span class="no">&#10007;</span></td>
                 <td><span class="no">&#10007;</span></td>
               </tr>
+              <tr>
+                <td>Audio translation</td>
+                <td class="col-aimock"><span class="yes">Built-in &#10003;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+              </tr>
               <tr>
                 <td>Non-speech audio</td>
                 <td class="col-aimock"><span class="yes">Built-in &#10003;</span></td>
@@ -1921,6 +1940,24 @@ <h2 class="fade-in">How aimock compares</h2>
                 <td><span class="no">&#10007;</span></td>
                 <td><span class="no">&#10007;</span></td>
               </tr>
+              <tr>
+                <td>Streaming usage chunks</td>
+                <td class="col-aimock"><span class="yes">Built-in &#10003;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+              </tr>
+              <tr>
+                <td>Rate limiting headers</td>
+                <td class="col-aimock"><span class="yes">Built-in &#10003;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+                <td><span class="no">&#10007;</span></td>
+              </tr>
               <tr>
                 <td>Dependencies</td>
                 <td class="col-aimock"><span class="yes">Zero &#10003;</span></td>
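
The "Rate limiting headers" row added above exists so that client retry logic can be exercised against 429 fixtures. As a rough illustration of the client side, the sketch below turns a `Retry-After` header into a delay. The helper name `retryDelayMs` and its fallback default are hypothetical, and only the delta-seconds form of `Retry-After` is handled (the HTTP-date form is not).

```typescript
// Hypothetical client-side helper; not part of aimock.
// Returns null when the response is not a 429, otherwise a delay in milliseconds.
function retryDelayMs(
  status: number,
  headers: Record<string, string>,
  fallbackMs = 1000,
): number | null {
  if (status !== 429) return null;

  // Header names are case-insensitive, so normalise before comparing.
  const entry = Object.entries(headers).find(
    ([name]) => name.toLowerCase() === "retry-after",
  );
  if (!entry || entry[1].trim() === "") return fallbackMs;

  // Only the delta-seconds form is parsed; anything else uses the fallback.
  const seconds = Number(entry[1]);
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : fallbackMs;
}
```

A test could register a 429 error fixture with a custom `retryAfter` value and assert that the client waits for the delay computed this way.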

diff --git a/package.json b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@copilotkit/aimock",
-  "version": "1.24.1",
+  "version": "1.25.0",
   "description": "Mock infrastructure for AI application testing — LLM APIs, image generation, text-to-speech, transcription, audio generation, video generation, MCP tools, A2A agents, AG-UI event streams, vector databases, search, rerank, and moderation. One package, one port, zero dependencies.",
   "license": "MIT",
   "keywords": [