From 3730ea7be3730a3390b155001d664e217a80dba4 Mon Sep 17 00:00:00 2001
From: Federico Kamelhar
Date: Fri, 22 May 2026 01:54:36 -0400
Subject: [PATCH 1/2] docs(oci): expand provider page with proxy, tools, vision, reasoning, env vars

Bring the OCI provider docs up to parity with the Bedrock page:

- Environment variables for all credentials (OCI_REGION, OCI_USER, OCI_FINGERPRINT, OCI_TENANCY, OCI_COMPARTMENT_ID, OCI_KEY, OCI_KEY_FILE)
- LiteLLM Proxy Usage section: config.yaml example with both Grok and Cohere entries, start command, Curl + OpenAI client smoke tests
- Function Calling / Tool Calling: OpenAI-compatible tools example for both SDK and proxy modes, with a note that Cohere and Generic vendors are adapted internally
- Vision / Multimodal: image_url example plus the full list of vision-capable models
- Reasoning / Thinking: reasoning_effort (low/medium/high/disable) and reasoning_tokens surfaced on usage; documents that the param is silently ignored for Cohere models
- Optional Parameters table extended with an Environment Variable column and a reasoning_effort row

Reconciled the Supported Models list against OCI's on-demand retirement page:

- Removed retired meta.llama-3.1-405b-instruct and meta.llama-3.1-70b-instruct
- Added xai.grok-4.3, openai.gpt-oss-120b/20b, and cohere.embed-multilingual-image-v3.0
- Flagged retirement dates on Llama 3.2-90b-vision, all Grok 3/4/4.x, Cohere R+/R 08-2024, and all embed v3.0 models
- Switched the Vision example from Llama 3.2-90b (retires 2026-09-30) to Llama 4 Maverick
- Added an info callout linking to OCI's retirement page so readers can verify dates themselves
---
 docs/providers/oci.md | 374 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 335 insertions(+), 39 deletions(-)

diff --git a/docs/providers/oci.md b/docs/providers/oci.md
index 182bb4407..1b5d8504d 100644
--- a/docs/providers/oci.md
+++ b/docs/providers/oci.md
@@ -8,54 +8,67 @@ Check the [OCI Models List](https://docs.oracle.com/en-us/iaas/Content/generativ
 
 ## Supported Models
 
+The list below tracks OCI's on-demand model catalog. For authoritative retirement dates and recommended replacements, see [OCI's on-demand model retirement page](https://docs.oracle.com/en-us/iaas/Content/generative-ai/deprecating-on-demand.htm).
+
+:::info
+OCI rotates models in and out of `ON_DEMAND` serving regularly. Models flagged below with a retirement date will continue to work in LiteLLM until OCI stops serving them — at which point requests will return a 404 from OCI. Plan migrations using the replacements OCI recommends on the retirement page.
+:::
+
 ### Chat / Text Generation
 
 #### Meta Llama Models
-- `meta.llama-4-maverick-17b-128e-instruct-fp8`
-- `meta.llama-4-scout-17b-16e-instruct`
+- `meta.llama-4-maverick-17b-128e-instruct-fp8` (multimodal)
+- `meta.llama-4-scout-17b-16e-instruct` (multimodal)
 - `meta.llama-3.3-70b-instruct`
 - `meta.llama-3.3-70b-instruct-fp8-dynamic`
-- `meta.llama-3.2-90b-vision-instruct`
+- `meta.llama-3.2-90b-vision-instruct` *(retires 2026-09-30 — replace with Llama 4)*
 - `meta.llama-3.2-11b-vision-instruct`
-- `meta.llama-3.1-405b-instruct`
-- `meta.llama-3.1-70b-instruct`
 
 #### xAI Grok Models
+- `xai.grok-4.3` *(latest)*
 - `xai.grok-4.20`
 - `xai.grok-4.20-multi-agent`
-- `xai.grok-4`
-- `xai.grok-4-fast`
-- `xai.grok-4.1-fast`
-- `xai.grok-3`
-- `xai.grok-3-fast`
-- `xai.grok-3-mini`
-- `xai.grok-3-mini-fast`
-- `xai.grok-code-fast-1`
+- `xai.grok-4` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-4-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-4.1-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-3` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-3-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-3-mini` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-3-mini-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-code-fast-1` *(retires 2026-08-15 — replace with Grok 4.3)*
 
 #### Cohere Models
 - `cohere.command-latest`
 - `cohere.command-a-03-2025`
 - `cohere.command-a-reasoning-08-2025`
-- `cohere.command-a-vision-07-2025`
+- `cohere.command-a-vision-07-2025` (multimodal)
 - `cohere.command-a-translate-08-2025`
 - `cohere.command-plus-latest`
-- `cohere.command-r-08-2024`
-- `cohere.command-r-plus-08-2024`
+- `cohere.command-r-plus-08-2024` *(retires 2026-09-30 — replace with `cohere.command-a-03-2025`)*
+- `cohere.command-r-08-2024` *(retires 2026-09-30 — replace with `cohere.command-a-03-2025`)*
 
 #### Google Gemini Models (via OCI)
-- `google.gemini-2.5-pro`
-- `google.gemini-2.5-flash`
-- `google.gemini-2.5-flash-lite`
+- `google.gemini-2.5-pro` (multimodal)
+- `google.gemini-2.5-flash` (multimodal)
+- `google.gemini-2.5-flash-lite` (multimodal)
+
+#### OpenAI Open-Source Models (via OCI)
+- `openai.gpt-oss-120b`
+- `openai.gpt-oss-20b`
 
 ### Embedding Models
-- `cohere.embed-english-v3.0` (1024 dimensions)
-- `cohere.embed-english-light-v3.0` (384 dimensions)
-- `cohere.embed-multilingual-v3.0` (1024 dimensions)
-- `cohere.embed-multilingual-light-v3.0` (384 dimensions)
-- `cohere.embed-english-image-v3.0` (1024 dimensions, multimodal)
-- `cohere.embed-english-light-image-v3.0` (384 dimensions, multimodal)
-- `cohere.embed-multilingual-light-image-v3.0` (384 dimensions, multimodal)
-- `cohere.embed-v4.0` (1536 dimensions, multimodal)
+
+All `v3.0` embedding models retire **2026-09-30** — Oracle recommends migrating to `cohere.embed-v4.0`.
+
+- `cohere.embed-v4.0` (1536 dimensions, multimodal) — recommended
+- `cohere.embed-english-v3.0` (1024 dimensions) *(retires 2026-09-30)*
+- `cohere.embed-english-light-v3.0` (384 dimensions) *(retires 2026-09-30)*
+- `cohere.embed-multilingual-v3.0` (1024 dimensions) *(retires 2026-09-30)*
+- `cohere.embed-multilingual-light-v3.0` (384 dimensions) *(retires 2026-09-30)*
+- `cohere.embed-english-image-v3.0` (1024 dimensions, multimodal) *(retires 2026-09-30)*
+- `cohere.embed-english-light-image-v3.0` (384 dimensions, multimodal) *(retires 2026-09-30)*
+- `cohere.embed-multilingual-image-v3.0` (1024 dimensions, multimodal) *(retires 2026-09-30)*
+- `cohere.embed-multilingual-light-image-v3.0` (384 dimensions, multimodal) *(retires 2026-09-30)*
 
 ## Authentication
 
@@ -73,6 +86,21 @@ Provide individual OCI credentials directly to LiteLLM. Follow the [official Ora
 
 This is the default method for LiteLLM AI Gateway (LLM Proxy) access to OCI GenAI models.
 
+**Environment Variables**
+
+Instead of passing credentials in code, you can set the following environment variables — LiteLLM will read them automatically:
+
+```bash
+export OCI_REGION="us-chicago-1"
+export OCI_USER="ocid1.user.oc1.."
+export OCI_FINGERPRINT="xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx"
+export OCI_TENANCY="ocid1.tenancy.oc1.."
+export OCI_COMPARTMENT_ID="ocid1.compartment.oc1.."
+# Provide either the private key content OR the path to the key file:
+export OCI_KEY_FILE="/path/to/oci_api_key.pem"
+# export OCI_KEY="-----BEGIN PRIVATE KEY-----\n..."
+```
+
 ### Method 2: OCI SDK Signer
 Use an OCI SDK `Signer` object for authentication. This method:
 - Leverages the official [OCI SDK for signing](https://docs.oracle.com/en-us/iaas/tools/python/latest/api/signing.html)
@@ -220,6 +248,92 @@ print(response)
 
 
 
+## LiteLLM Proxy Usage
+
+Here's how to call OCI GenAI through the LiteLLM Proxy Server.
+
+### 1. Setup config.yaml
+
+```yaml
+model_list:
+  - model_name: oci-grok-4
+    litellm_params:
+      model: oci/xai.grok-4
+      oci_region: os.environ/OCI_REGION
+      oci_user: os.environ/OCI_USER
+      oci_fingerprint: os.environ/OCI_FINGERPRINT
+      oci_tenancy: os.environ/OCI_TENANCY
+      oci_key_file: os.environ/OCI_KEY_FILE
+      oci_compartment_id: os.environ/OCI_COMPARTMENT_ID
+
+  - model_name: oci-cohere-command
+    litellm_params:
+      model: oci/cohere.command-latest
+      oci_region: os.environ/OCI_REGION
+      oci_user: os.environ/OCI_USER
+      oci_fingerprint: os.environ/OCI_FINGERPRINT
+      oci_tenancy: os.environ/OCI_TENANCY
+      oci_key_file: os.environ/OCI_KEY_FILE
+      oci_compartment_id: os.environ/OCI_COMPARTMENT_ID
+```
+
+All possible auth params:
+
+```
+oci_region: Optional[str],
+oci_user: Optional[str],
+oci_fingerprint: Optional[str],
+oci_tenancy: Optional[str],
+oci_key: Optional[str],  # private key content as string
+oci_key_file: Optional[str],  # path to .pem file
+oci_compartment_id: Optional[str],
+oci_serving_mode: Optional[str],  # "ON_DEMAND" (default) or "DEDICATED"
+oci_endpoint_id: Optional[str],  # only used with DEDICATED
+```
+
+### 2. Start the proxy
+
+```bash
+litellm --config /path/to/config.yaml
+```
+
+### 3. Test it
+
+
+
+
+```shell
+curl --location 'http://0.0.0.0:4000/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "oci-grok-4",
+    "messages": [
+        {"role": "user", "content": "what llm are you"}
+    ]
+}'
+```
+
+
+
+
+```python
+import openai
+
+client = openai.OpenAI(
+    api_key="anything",
+    base_url="http://0.0.0.0:4000"
+)
+
+response = client.chat.completions.create(
+    model="oci-grok-4",
+    messages=[{"role": "user", "content": "write a short poem"}],
+)
+print(response)
+```
+
+
+
+
 ## Usage - Streaming
 
 Just set `stream=True` when calling completion.
@@ -411,20 +525,202 @@ response = completion(
 )
 ```
 
+## Usage - Function Calling / Tool Calling
+
+OCI GenAI supports OpenAI-compatible function calling. LiteLLM normalizes the request and response shape so the same code that targets OpenAI works with OCI Cohere and Generic (xAI Grok, Meta Llama, Google Gemini) models.
+
+
+
+
+```python
+from litellm import completion
+
+tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "get_current_weather",
+            "description": "Get the current weather in a given location",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "location": {
+                        "type": "string",
+                        "description": "The city and state, e.g. San Francisco, CA",
+                    },
+                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+                },
+                "required": ["location"],
+            },
+        },
+    }
+]
+
+response = completion(
+    model="oci/xai.grok-4",
+    messages=[{"role": "user", "content": "What's the weather in Boston today?"}],
+    tools=tools,
+    tool_choice="auto",
+    oci_region="us-chicago-1",
+    oci_user="",
+    oci_fingerprint="",
+    oci_tenancy="",
+    oci_key_file="",
+    oci_compartment_id="",
+)
+
+# Inspect the tool call
+print(response.choices[0].message.tool_calls)
+```
+
+
+
+
+```python
+import openai
+
+client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")
+
+response = client.chat.completions.create(
+    model="oci-grok-4",
+    messages=[{"role": "user", "content": "What's the weather in Boston today?"}],
+    tools=[
+        {
+            "type": "function",
+            "function": {
+                "name": "get_current_weather",
+                "description": "Get the current weather in a given location",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "location": {"type": "string"},
+                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+                    },
+                    "required": ["location"],
+                },
+            },
+        }
+    ],
+    tool_choice="auto",
+)
+print(response.choices[0].message.tool_calls)
+```
+
+
+
+
+Tool calling works with both Cohere (`cohere.command-*`) and Generic (`xai.grok-*`, `meta.llama-*`, `google.gemini-*`) model families — LiteLLM adapts the OpenAI tool schema to each vendor's native format internally.
+
+## Usage - Vision / Multimodal
+
+OCI GenAI exposes vision-capable models that accept images alongside text. Pass images using the standard OpenAI `image_url` content block.
+
+```python
+from litellm import completion
+
+response = completion(
+    model="oci/meta.llama-4-maverick-17b-128e-instruct-fp8",
+    messages=[
+        {
+            "role": "user",
+            "content": [
+                {"type": "text", "text": "What is in this image?"},
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
+                    },
+                },
+            ],
+        }
+    ],
+    oci_region="us-chicago-1",
+    oci_user="",
+    oci_fingerprint="",
+    oci_tenancy="",
+    oci_key_file="",
+    oci_compartment_id="",
+)
+print(response.choices[0].message.content)
+```
+
+Vision-capable models on OCI include:
+
+- `meta.llama-4-maverick-17b-128e-instruct-fp8`
+- `meta.llama-4-scout-17b-16e-instruct`
+- `meta.llama-3.2-11b-vision-instruct`
+- `meta.llama-3.2-90b-vision-instruct` *(retires 2026-09-30)*
+- `cohere.command-a-vision-07-2025`
+- `google.gemini-2.5-pro`, `google.gemini-2.5-flash`, `google.gemini-2.5-flash-lite`
+
+Both URL and base64-encoded data URIs are supported.
+
+## Usage - Reasoning / Thinking
+
+OCI Generic-vendor models (xAI Grok reasoning variants, Google Gemini, etc.) support a reasoning step. LiteLLM exposes this via the OpenAI-compatible `reasoning_effort` parameter — accepted values are `"low"`, `"medium"`, `"high"`, and `"disable"` (mapped to OCI's `NONE`).
+
+Returned reasoning tokens are surfaced on `usage.completion_tokens_details.reasoning_tokens`, matching the OpenAI shape.
+
+
+
+
+```python
+from litellm import completion
+
+response = completion(
+    model="oci/xai.grok-3-mini",
+    messages=[{"role": "user", "content": "If 3x + 7 = 22, what is x? Show your reasoning."}],
+    reasoning_effort="high",  # "low" | "medium" | "high" | "disable"
+    oci_region="us-chicago-1",
+    oci_user="",
+    oci_fingerprint="",
+    oci_tenancy="",
+    oci_key_file="",
+    oci_compartment_id="",
+)
+
+print(response.choices[0].message.content)
+print("Reasoning tokens:", response.usage.completion_tokens_details.reasoning_tokens)
+```
+
+
+
+
+```python
+import openai
+
+client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")
+
+response = client.chat.completions.create(
+    model="oci-grok-mini",
+    messages=[{"role": "user", "content": "If 3x + 7 = 22, what is x?"}],
+    reasoning_effort="high",
+)
+print(response.choices[0].message.content)
+```
+
+
+
+
+:::note
+`reasoning_effort` is only honored on Generic-vendor reasoning models (e.g., `xai.grok-3-mini`, `xai.grok-4`, `google.gemini-2.5-pro`). It is silently ignored for OCI Cohere models, which are not reasoning models.
+:::
+
 ## Optional Parameters
 
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `oci_region` | string | `us-ashburn-1` | OCI region where the GenAI service is deployed |
-| `oci_serving_mode` | string | `ON_DEMAND` | Service mode: `ON_DEMAND` for managed models or `DEDICATED` for dedicated endpoints |
-| `oci_endpoint_id` | string | Same as `model` | (For DEDICATED mode) The OCID of your dedicated endpoint |
-| `oci_compartment_id` | string | **Required** | The OCID of the OCI compartment containing your resources |
-| `oci_user` | string | - | (Manual auth) The OCID of the OCI user |
-| `oci_fingerprint` | string | - | (Manual auth) The fingerprint of the API signing key |
-| `oci_tenancy` | string | - | (Manual auth) The OCID of your OCI tenancy |
-| `oci_key` | string | - | (Manual auth) The private key content as a string |
-| `oci_key_file` | string | - | (Manual auth) Path to the private key file |
-| `oci_signer` | object | - | (SDK auth) OCI SDK Signer object for authentication |
+| Parameter | Type | Default | Environment Variable | Description |
+|-----------|------|---------|----------------------|-------------|
+| `oci_region` | string | `us-ashburn-1` | `OCI_REGION` | OCI region where the GenAI service is deployed |
+| `oci_serving_mode` | string | `ON_DEMAND` | – | Service mode: `ON_DEMAND` for managed models or `DEDICATED` for dedicated endpoints |
+| `oci_endpoint_id` | string | Same as `model` | – | (For DEDICATED mode) The OCID of your dedicated endpoint |
+| `oci_compartment_id` | string | **Required** | `OCI_COMPARTMENT_ID` | The OCID of the OCI compartment containing your resources |
+| `oci_user` | string | – | `OCI_USER` | (Manual auth) The OCID of the OCI user |
+| `oci_fingerprint` | string | – | `OCI_FINGERPRINT` | (Manual auth) The fingerprint of the API signing key |
+| `oci_tenancy` | string | – | `OCI_TENANCY` | (Manual auth) The OCID of your OCI tenancy |
+| `oci_key` | string | – | `OCI_KEY` | (Manual auth) The private key content as a string |
+| `oci_key_file` | string | – | `OCI_KEY_FILE` | (Manual auth) Path to the private key file |
+| `oci_signer` | object | – | – | (SDK auth) OCI SDK Signer object for authentication |
+| `reasoning_effort` | string | – | – | Reasoning level for Generic-vendor reasoning models: `low`, `medium`, `high`, `disable` |
 
 ## Embeddings
 

From 127c7b543c18052aa207c5921936624c2b29e2d5 Mon Sep 17 00:00:00 2001
From: Federico Kamelhar
Date: Fri, 22 May 2026 03:20:11 -0400
Subject: [PATCH 2/2] docs(oci): align deprecation handling with bedrock/openai style
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Other LiteLLM provider pages (bedrock, openai, anthropic) don't track retirement dates inline — they list active models and let the provider's own lifecycle page own the dates. Drop the per-model *(retires ...)* annotations to match. The single info-callout link to OCI's retirement page stays so readers can verify the schedule.
---
 docs/providers/oci.md | 55 +++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 31 deletions(-)

diff --git a/docs/providers/oci.md b/docs/providers/oci.md
index 1b5d8504d..5e0277a08 100644
--- a/docs/providers/oci.md
+++ b/docs/providers/oci.md
@@ -8,11 +8,7 @@ Check the [OCI Models List](https://docs.oracle.com/en-us/iaas/Content/generativ
 
 ## Supported Models
 
-The list below tracks OCI's on-demand model catalog. For authoritative retirement dates and recommended replacements, see [OCI's on-demand model retirement page](https://docs.oracle.com/en-us/iaas/Content/generative-ai/deprecating-on-demand.htm).
-
-:::info
-OCI rotates models in and out of `ON_DEMAND` serving regularly. Models flagged below with a retirement date will continue to work in LiteLLM until OCI stops serving them — at which point requests will return a 404 from OCI. Plan migrations using the replacements OCI recommends on the retirement page.
-:::
+For model lifecycle, retirement dates, and recommended replacements, see [OCI's on-demand model retirement page](https://docs.oracle.com/en-us/iaas/Content/generative-ai/deprecating-on-demand.htm) — Oracle is the authoritative source.
 
 ### Chat / Text Generation
 
@@ -21,21 +17,21 @@ OCI rotates models in and out of `ON_DEMAND` serving regularly. Models flagged b
 - `meta.llama-4-scout-17b-16e-instruct` (multimodal)
 - `meta.llama-3.3-70b-instruct`
 - `meta.llama-3.3-70b-instruct-fp8-dynamic`
-- `meta.llama-3.2-90b-vision-instruct` *(retires 2026-09-30 — replace with Llama 4)*
-- `meta.llama-3.2-11b-vision-instruct`
+- `meta.llama-3.2-90b-vision-instruct` (multimodal)
+- `meta.llama-3.2-11b-vision-instruct` (multimodal)
 
 #### xAI Grok Models
-- `xai.grok-4.3` *(latest)*
+- `xai.grok-4.3`
 - `xai.grok-4.20`
 - `xai.grok-4.20-multi-agent`
-- `xai.grok-4` *(retires 2026-08-15 — replace with Grok 4.3)*
-- `xai.grok-4-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
-- `xai.grok-4.1-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
-- `xai.grok-3` *(retires 2026-08-15 — replace with Grok 4.3)*
-- `xai.grok-3-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
-- `xai.grok-3-mini` *(retires 2026-08-15 — replace with Grok 4.3)*
-- `xai.grok-3-mini-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
-- `xai.grok-code-fast-1` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-4`
+- `xai.grok-4-fast`
+- `xai.grok-4.1-fast`
+- `xai.grok-3`
+- `xai.grok-3-fast`
+- `xai.grok-3-mini`
+- `xai.grok-3-mini-fast`
+- `xai.grok-code-fast-1`
 
 #### Cohere Models
 - `cohere.command-latest`
@@ -44,8 +40,8 @@ OCI rotates models in and out of `ON_DEMAND` serving regularly. Models flagged b
 - `cohere.command-a-vision-07-2025` (multimodal)
 - `cohere.command-a-translate-08-2025`
 - `cohere.command-plus-latest`
-- `cohere.command-r-plus-08-2024` *(retires 2026-09-30 — replace with `cohere.command-a-03-2025`)*
-- `cohere.command-r-08-2024` *(retires 2026-09-30 — replace with `cohere.command-a-03-2025`)*
+- `cohere.command-r-plus-08-2024`
+- `cohere.command-r-08-2024`
 
 #### Google Gemini Models (via OCI)
 - `google.gemini-2.5-pro` (multimodal)
@@ -57,18 +53,15 @@ OCI rotates models in and out of `ON_DEMAND` serving regularly. Models flagged b
 - `openai.gpt-oss-20b`
 
 ### Embedding Models
-
-All `v3.0` embedding models retire **2026-09-30** — Oracle recommends migrating to `cohere.embed-v4.0`.
-
-- `cohere.embed-v4.0` (1536 dimensions, multimodal) — recommended
-- `cohere.embed-english-v3.0` (1024 dimensions) *(retires 2026-09-30)*
-- `cohere.embed-english-light-v3.0` (384 dimensions) *(retires 2026-09-30)*
-- `cohere.embed-multilingual-v3.0` (1024 dimensions) *(retires 2026-09-30)*
-- `cohere.embed-multilingual-light-v3.0` (384 dimensions) *(retires 2026-09-30)*
-- `cohere.embed-english-image-v3.0` (1024 dimensions, multimodal) *(retires 2026-09-30)*
-- `cohere.embed-english-light-image-v3.0` (384 dimensions, multimodal) *(retires 2026-09-30)*
-- `cohere.embed-multilingual-image-v3.0` (1024 dimensions, multimodal) *(retires 2026-09-30)*
-- `cohere.embed-multilingual-light-image-v3.0` (384 dimensions, multimodal) *(retires 2026-09-30)*
+- `cohere.embed-v4.0` (1536 dimensions, multimodal)
+- `cohere.embed-english-v3.0` (1024 dimensions)
+- `cohere.embed-english-light-v3.0` (384 dimensions)
+- `cohere.embed-multilingual-v3.0` (1024 dimensions)
+- `cohere.embed-multilingual-light-v3.0` (384 dimensions)
+- `cohere.embed-english-image-v3.0` (1024 dimensions, multimodal)
+- `cohere.embed-english-light-image-v3.0` (384 dimensions, multimodal)
+- `cohere.embed-multilingual-image-v3.0` (1024 dimensions, multimodal)
+- `cohere.embed-multilingual-light-image-v3.0` (384 dimensions, multimodal)
 
 ## Authentication
 
@@ -649,7 +642,7 @@ Vision-capable models on OCI include:
 - `meta.llama-4-maverick-17b-128e-instruct-fp8`
 - `meta.llama-4-scout-17b-16e-instruct`
 - `meta.llama-3.2-11b-vision-instruct`
-- `meta.llama-3.2-90b-vision-instruct` *(retires 2026-09-30)*
+- `meta.llama-3.2-90b-vision-instruct`
 - `cohere.command-a-vision-07-2025`
 - `google.gemini-2.5-pro`, `google.gemini-2.5-flash`, `google.gemini-2.5-flash-lite`
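
For reviewers trying the new environment-variable path locally, here is a minimal pre-flight sketch. It assumes the manual-auth method documented in this patch (user, fingerprint, tenancy, and compartment OCIDs, plus either `OCI_KEY` or `OCI_KEY_FILE`) and treats `OCI_REGION` as optional because the parameter table gives it a `us-ashburn-1` default. `missing_oci_env` is a hypothetical helper, not part of LiteLLM; it only inspects a mapping of variable names and makes no OCI or LiteLLM calls.

```python
import os
from typing import List, Mapping, Optional

# Variables the manual-auth method needs (per the env var list in this patch).
# OCI_REGION is deliberately omitted: the parameter table defaults it to us-ashburn-1.
REQUIRED_OCI_VARS = (
    "OCI_USER",
    "OCI_FINGERPRINT",
    "OCI_TENANCY",
    "OCI_COMPARTMENT_ID",
)


def missing_oci_env(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper (not a LiteLLM API): list OCI settings that are unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_OCI_VARS if not env.get(name)]
    # The private key may be supplied inline (OCI_KEY) or as a file path (OCI_KEY_FILE).
    if not (env.get("OCI_KEY") or env.get("OCI_KEY_FILE")):
        missing.append("OCI_KEY or OCI_KEY_FILE")
    return missing
```

Calling `missing_oci_env()` with no argument checks `os.environ`; an empty list means every variable the manual-auth method needs is present. The sketch checks presence only, so it cannot tell whether an OCID or fingerprint is valid; only a real OCI request confirms that.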