[BOT ISSUE] OpenAI: add missing gpt-realtime-translate and gpt-realtime-whisper models #564

@github-actions

Gap

Two new OpenAI realtime models are missing from packages/proxy/schema/model_list.json:

  1. gpt-realtime-translate — a live translation model that translates speech from 70+ input languages into 13 output languages
  2. gpt-realtime-whisper — a streaming speech-to-text model for low-latency realtime transcription

Both were announced on May 7, 2026, alongside gpt-realtime-2. The catalog already includes other realtime models (gpt-realtime-mini, gpt-realtime-1.5), and gpt-realtime-2 is tracked separately in #543.

Verified fields

gpt-realtime-translate

| Field | Value | Source |
| --- | --- | --- |
| Model ID | gpt-realtime-translate | Model page |
| Format | openai | Catalog convention for OpenAI models |
| Flavor | chat | Catalog convention |
| Max input tokens | 16,000 | Model page |
| Max output tokens | 2,000 | Model page |
| Available providers | ["openai"] | Model page |

gpt-realtime-whisper

| Field | Value | Source |
| --- | --- | --- |
| Model ID | gpt-realtime-whisper | Model page |
| Format | openai | Catalog convention for OpenAI models |
| Flavor | chat | Catalog convention |
| Max input tokens | 16,000 | Model page |
| Max output tokens | 2,000 | Model page |
| Available providers | ["openai"] | Model page |

Verification note

  • Model existence: Both models confirmed on the OpenAI models listing, their dedicated model pages (translate, whisper), and the launch announcement (three independent official signals).
  • Pricing: Both models use per-minute audio pricing ($0.034/min for translate, $0.017/min for whisper), which does not map cleanly to the per-token input_cost_per_mil_tokens/output_cost_per_mil_tokens schema fields. Per-token pricing is not published, so the pricing fields are omitted rather than guessed.
  • Token limits: 16,000 context / 2,000 max output confirmed on each model's detail page.
  • displayName: Not published by OpenAI; omitted rather than guessed.
  • parent: Not applicable — both are standalone models with no snapshots.
  • multimodal: Not set. These are audio-focused models; the multimodal field is typically used for vision (image input) support in the catalog.
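
To illustrate why the per-minute rates in the pricing note do not fit per-token fields, here is a minimal sketch of how a session cost could be computed from audio duration alone. The `audio_cost` helper and the `RATE_PER_MINUTE` table are hypothetical names introduced for illustration, and the sketch assumes billing is linear in audio minutes; neither is part of the catalog schema.

```python
from decimal import Decimal

# Per-minute audio rates quoted in the pricing note (USD per minute).
# The table name and shape are hypothetical, not a catalog structure.
RATE_PER_MINUTE = {
    "gpt-realtime-translate": Decimal("0.034"),
    "gpt-realtime-whisper": Decimal("0.017"),
}


def audio_cost(model: str, minutes: Decimal) -> Decimal:
    """Return the USD cost of `minutes` of audio, assuming linear per-minute billing."""
    return RATE_PER_MINUTE[model] * minutes


# A 10-minute session: the cost depends on duration only, not on token counts.
print(audio_cost("gpt-realtime-translate", Decimal("10")))  # 0.340
print(audio_cost("gpt-realtime-whisper", Decimal("10")))    # 0.170
```

Because no token count appears anywhere in this calculation, any value written into input_cost_per_mil_tokens or output_cost_per_mil_tokens would have to be derived from an assumed tokens-per-minute ratio, which is not published.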

Local files inspected

  • packages/proxy/schema/model_list.json — grep confirms neither gpt-realtime-translate nor gpt-realtime-whisper exists
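
The grep check above could also be expressed as a small script. This is a sketch only: it assumes model_list.json is a top-level JSON object keyed by model ID, which this issue does not confirm, and `missing_models` is a hypothetical helper name. The inline catalog stands in for the real file.

```python
import json


def missing_models(catalog: dict, model_ids: list[str]) -> list[str]:
    """Return the IDs from `model_ids` that are not keys of `catalog`, in input order."""
    return [model_id for model_id in model_ids if model_id not in catalog]


# Hypothetical stand-in for the parsed contents of model_list.json.
# Assumption: the real file is an object keyed by model ID.
catalog = json.loads('{"gpt-realtime-mini": {}, "gpt-realtime-1.5": {}}')

wanted = ["gpt-realtime-translate", "gpt-realtime-whisper", "gpt-realtime-mini"]
print(missing_models(catalog, wanted))
# ['gpt-realtime-translate', 'gpt-realtime-whisper']
```

To run it against the repository, the inline string would be replaced by `json.load` on the opened catalog file.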

Source URLs

{
  "kind": "missing_model",
  "provider": "openai",
  "models": ["gpt-realtime-translate", "gpt-realtime-whisper"],
  "status": "active",
  "model_specs": {
    "gpt-realtime-translate": {
      "format": "openai",
      "flavor": "chat",
      "max_input_tokens": 16000,
      "max_output_tokens": 2000,
      "available_providers": ["openai"]
    },
    "gpt-realtime-whisper": {
      "format": "openai",
      "flavor": "chat",
      "max_input_tokens": 16000,
      "max_output_tokens": 2000,
      "available_providers": ["openai"]
    }
  },
  "source_urls": [
    "https://developers.openai.com/api/docs/models/gpt-realtime-translate",
    "https://developers.openai.com/api/docs/models/gpt-realtime-whisper",
    "https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/"
  ]
}
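
As a rough sketch of how a payload like the one above might be applied, the following merges its model_specs into a catalog without overwriting existing entries. The `merge_model_specs` helper, the required-field set, and the assumption that the catalog is a dict keyed by model ID are all hypothetical; the field names are taken from the payload itself.

```python
import copy

# Fields every spec in the payload above carries; treating them as required
# is an assumption made for this sketch, not a documented schema rule.
REQUIRED_FIELDS = {
    "format",
    "flavor",
    "max_input_tokens",
    "max_output_tokens",
    "available_providers",
}


def merge_model_specs(catalog: dict, payload: dict) -> dict:
    """Return a copy of `catalog` with the payload's models added.

    Entries already present in the catalog are left untouched.
    """
    merged = copy.deepcopy(catalog)
    for model_id in payload["models"]:
        spec = payload["model_specs"][model_id]
        missing = REQUIRED_FIELDS - spec.keys()
        if missing:
            raise ValueError(f"{model_id} is missing fields: {sorted(missing)}")
        merged.setdefault(model_id, dict(spec))
    return merged


payload = {
    "models": ["gpt-realtime-translate"],
    "model_specs": {
        "gpt-realtime-translate": {
            "format": "openai",
            "flavor": "chat",
            "max_input_tokens": 16000,
            "max_output_tokens": 2000,
            "available_providers": ["openai"],
        }
    },
}

# Hypothetical existing catalog with one unrelated entry.
updated = merge_model_specs({"gpt-realtime-mini": {"format": "openai"}}, payload)
print(sorted(updated))  # ['gpt-realtime-mini', 'gpt-realtime-translate']
```

Using `setdefault` keeps the merge idempotent: running it twice, or against a catalog that already contains one of the models, changes nothing for that model.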
