Fix Realtime API: correct model name, remove beta header, fix session flow#5
Draft
…rfacing Co-authored-by: hack-r <7491078+hack-r@users.noreply.github.com>
Copilot (AI) changed the title from "[WIP] Redo implementation of Realtime API for low-latency updates" to "Fix Realtime API: correct model name, remove beta header, fix session flow" on Mar 10, 2026
Previous implementation used a non-existent model (`gpt-realtime-1.5`) and included the beta-only `OpenAI-Beta: realtime=v1` header, causing OpenAI to accept the WebSocket upgrade then immediately close with code 1000.

Server (`server/index.ts`)

- `gpt-realtime-1.5` → `gpt-4o-realtime-preview`
- Removed the `OpenAI-Beta: realtime=v1` header (GA API rejects it)
- … `error` event before closing

Client (`src/services/providers/openai.ts`)

- `configureSession()` now fires on `session.created` (not `ws.onopen`), which is the correct Realtime API handshake sequence. This also eliminates the double `onConnected()` call that occurred when both `session.created` and `session.updated` resolved the same handler.
- `ws.onclose` during `connecting` state now rejects the promise and surfaces an error to the UI instead of silently transitioning to `idle`
- `sendImage`: `image` field renamed to `image_url`, the correct field name for `input_image` content in the GA API

Config

- `.env.example` updated to reflect `gpt-4o-realtime-preview`

Original prompt
This section details the original issue you should resolve
<issue_title>Redo: #3</issue_title>
<issue_description>#3
That was a bad PR. The client disconnects immediately. Did you follow the docs I shared with you? Ensure that you are using realtime-1.5 and that you use the correct syntax.
Make sure you surface or log any OpenAI errors. Understand that although this is optimized for speech-to-speech, we are more interested in its high speed and updates and will mostly use it for image-to-text (we show it screenshots and it produces notes).
Realtime API
Build low-latency, multimodal LLM applications with the Realtime API.
The OpenAI Realtime API enables low-latency communication with models that natively support speech-to-speech interactions as well as multimodal inputs (audio, images, and text) and outputs (audio and text). These APIs can also be used for realtime audio transcription.
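For the screenshot-to-notes use case described in this issue, an image input could be sent as a `conversation.item.create` client event. The sketch below is illustrative only: `buildScreenshotEvent` is a hypothetical helper, the PNG data-URL encoding is an assumption, and the `input_image` / `image_url` field names follow what this PR's description states for the GA API.

```typescript
// Hypothetical helper (not part of the repository): wraps a base64-encoded
// PNG screenshot, plus an optional text prompt, in a client event.
interface InputImagePart {
  type: "input_image";
  image_url: string; // assumed to be a data URL
}

interface InputTextPart {
  type: "input_text";
  text: string;
}

type ContentPart = InputImagePart | InputTextPart;

interface ConversationItemCreate {
  type: "conversation.item.create";
  item: { type: "message"; role: "user"; content: ContentPart[] };
}

function buildScreenshotEvent(
  base64Png: string,
  prompt?: string,
): ConversationItemCreate {
  const content: ContentPart[] = [
    { type: "input_image", image_url: `data:image/png;base64,${base64Png}` },
  ];
  // Only attach a text part when a prompt was actually supplied.
  if (prompt) {
    content.push({ type: "input_text", text: prompt });
  }
  return {
    type: "conversation.item.create",
    item: { type: "message", role: "user", content },
  };
}
```

The returned object would be serialized with `JSON.stringify` and sent over the open socket; asking the model to respond is a separate step not shown here.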
Voice agents
One of the most common use cases for the Realtime API is building voice agents for speech-to-speech model interactions in the browser. Our recommended starting point for these types of applications is the Agents SDK for TypeScript, which uses a WebRTC connection to the Realtime model in the browser, and WebSocket when used on the server.
import { RealtimeAgent, RealtimeSession } from "@openai/agents/realtime";

const agent = new RealtimeAgent({
  name: "Assistant",
  instructions: "You are a helpful assistant.",
});

const session = new RealtimeSession(agent);

// Automatically connects your microphone and audio output
await session.connect({
  apiKey: "",
});
Voice Agent Quickstart
Follow the voice agent quickstart to build Realtime agents in the browser.
To use the Realtime API directly outside the context of voice agents, check out the other connection options below.
Connection methods
While building voice agents with the Agents SDK is the fastest path to one specific type of application, the Realtime API provides an entire suite of flexible tools for a variety of use cases.
There are three primary supported interfaces for the Realtime API:
- WebRTC connection: ideal for browser and client-side interactions with a Realtime model.
- WebSocket connection: ideal for middle-tier server-side applications with consistent low-latency network connections.
- SIP connection: ideal for VoIP telephony connections.
Depending on how you’d like to connect to a Realtime model, check out one of the connection guides above to get started. You’ll learn how to initialize a Realtime session, and how to interact with a Realtime model using client and server events.
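To make the client and server event flow concrete, here is a minimal sketch of a WebSocket handshake modeled as a pure function, so it can be reasoned about without a live connection. It mirrors the sequence this PR adopts: send `session.update` only after `session.created` arrives, treat `session.updated` as the point where the connection is usable, and surface an error if the socket closes while still connecting. The names `handleServerEvent`, `handleClose`, `Step`, and the state labels are hypothetical; treating `session.updated` as the "connected" signal is an assumption, not something the docs above mandate.

```typescript
type ConnState = "idle" | "connecting" | "connected";

interface ServerEvent {
  type: string;
  error?: { message?: string };
}

interface Step {
  state: ConnState;
  outbound: Array<Record<string, unknown>>; // client events to send next
  uiError?: string; // message to surface or log, if any
}

function handleServerEvent(
  state: ConnState,
  event: ServerEvent,
  sessionConfig: Record<string, unknown>,
): Step {
  switch (event.type) {
    case "session.created":
      // Configure the session only once the server has announced it.
      return {
        state,
        outbound: [{ type: "session.update", session: sessionConfig }],
      };
    case "session.updated":
      // Assumption: the first acknowledged update marks the connection ready.
      return { state: state === "connecting" ? "connected" : state, outbound: [] };
    case "error":
      return {
        state,
        outbound: [],
        uiError: event.error?.message ?? "Unknown Realtime API error",
      };
    default:
      return { state, outbound: [] };
  }
}

function handleClose(state: ConnState, code: number): Step {
  const step: Step = { state: "idle", outbound: [] };
  // A close before the handshake finished is reported rather than swallowed.
  if (state === "connecting") {
    step.uiError = `Socket closed during connect (code ${code})`;
  }
  return step;
}
```

A real client would call these from `ws.onmessage` and `ws.onclose`, send each `outbound` event as JSON, and reject its connect promise whenever `uiError` is set during the `connecting` state.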
API Usage
Once connected to a realtime model using one of the methods above, learn how to interact with the model in these usage guides.
Beta to GA migration
There are a few key differences between the interfaces in the Realtime beta API and the recently released GA API. Expand the topics below for more information about migrating from the beta interface to GA.
Beta header
For REST API requests, WebSocket connections, and other interfaces with the Realtime API, beta users had to include the following header with each request:
OpenAI-Beta: realtime=v1
This header should be removed for requests to the GA interface. To retain the behavior of the beta API, you should continue to include this header.
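As a small illustration of the rule above, the sketch below builds request headers for either interface. `buildRealtimeHeaders` and its `useBetaInterface` flag are hypothetical names introduced for this example; only the `OpenAI-Beta: realtime=v1` header comes from the text, and bearer-token authorization is assumed.

```typescript
type RealtimeHeaders = Record<string, string>;

// Hypothetical helper: returns headers for a Realtime API request.
// The beta header is attached only when the caller opts in to the
// beta interface; GA requests omit it.
function buildRealtimeHeaders(
  apiKey: string,
  useBetaInterface = false,
): RealtimeHeaders {
  const headers: RealtimeHeaders = {
    Authorization: `Bearer ${apiKey}`, // assumed auth scheme
  };
  if (useBetaInterface) {
    headers["OpenAI-Beta"] = "realtime=v1";
  }
  return headers;
}
```

Keeping the choice behind a single flag makes it harder for a stale beta header to leak into GA requests by accident.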
Generating ephemeral API keys
In the beta interface, there were multiple endpoints for generating ephemeral keys for either Realtime sessions or transcription sessions. In the GA interface, there is only one REST API endpoint used to generate keys - [POST /v1/realtime/client_secrets](http...