From 3730ea7be3730a3390b155001d664e217a80dba4 Mon Sep 17 00:00:00 2001
From: Federico Kamelhar <federico.kamelhar@oracle.com>
Date: Fri, 22 May 2026 01:54:36 -0400
Subject: [PATCH 1/2] docs(oci): expand provider page with proxy, tools,
 vision, reasoning, env vars

Bring the OCI provider docs up to parity with the Bedrock page:

- Environment variables for all credentials (OCI_REGION, OCI_USER,
  OCI_FINGERPRINT, OCI_TENANCY, OCI_COMPARTMENT_ID, OCI_KEY,
  OCI_KEY_FILE)
- LiteLLM Proxy Usage section: config.yaml example with both Grok and
  Cohere entries, start command, Curl + OpenAI client smoke tests
- Function Calling / Tool Calling: OpenAI-compatible tools example for
  both SDK and proxy modes, with a note that Cohere and Generic vendors
  are adapted internally
- Vision / Multimodal: image_url example plus the full list of
  vision-capable models
- Reasoning / Thinking: reasoning_effort (low/medium/high/disable) and
  reasoning_tokens surfaced on usage; documents that the param is
  silently ignored for Cohere models
- Optional Parameters table extended with an Environment Variable
  column and a reasoning_effort row

Reconciled the Supported Models list against OCI's on-demand
retirement page:

- Removed retired meta.llama-3.1-405b-instruct and
  meta.llama-3.1-70b-instruct
- Added xai.grok-4.3, openai.gpt-oss-120b/20b, and
  cohere.embed-multilingual-image-v3.0
- Flagged retirement dates on Llama 3.2-90b-vision, Grok 3/4/4.1 and
  grok-code-fast-1, Cohere R+/R 08-2024, and all embed v3.0 models
- Switched the Vision example from Llama 3.2-90b (retires 2026-09-30)
  to Llama 4 Maverick
- Added an info callout linking to OCI's retirement page so readers
  can verify dates themselves
---
 docs/providers/oci.md | 374 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 335 insertions(+), 39 deletions(-)

diff --git a/docs/providers/oci.md b/docs/providers/oci.md
index 182bb4407..1b5d8504d 100644
--- a/docs/providers/oci.md
+++ b/docs/providers/oci.md
@@ -8,54 +8,67 @@ Check the [OCI Models List](https://docs.oracle.com/en-us/iaas/Content/generativ
 
 ## Supported Models
 
+The list below tracks OCI's on-demand model catalog. For authoritative retirement dates and recommended replacements, see [OCI's on-demand model retirement page](https://docs.oracle.com/en-us/iaas/Content/generative-ai/deprecating-on-demand.htm).
+
+:::info
+OCI rotates models in and out of `ON_DEMAND` serving regularly. Models flagged below with a retirement date will continue to work in LiteLLM until OCI stops serving them — at which point requests will return a 404 from OCI. Plan migrations using the replacements OCI recommends on the retirement page.
+:::
+
 ### Chat / Text Generation
 
 #### Meta Llama Models
-- `meta.llama-4-maverick-17b-128e-instruct-fp8`
-- `meta.llama-4-scout-17b-16e-instruct`
+- `meta.llama-4-maverick-17b-128e-instruct-fp8` (multimodal)
+- `meta.llama-4-scout-17b-16e-instruct` (multimodal)
 - `meta.llama-3.3-70b-instruct`
 - `meta.llama-3.3-70b-instruct-fp8-dynamic`
-- `meta.llama-3.2-90b-vision-instruct`
+- `meta.llama-3.2-90b-vision-instruct` *(retires 2026-09-30 — replace with Llama 4)*
 - `meta.llama-3.2-11b-vision-instruct`
-- `meta.llama-3.1-405b-instruct`
-- `meta.llama-3.1-70b-instruct`
 
 #### xAI Grok Models
+- `xai.grok-4.3` *(latest)*
 - `xai.grok-4.20`
 - `xai.grok-4.20-multi-agent`
-- `xai.grok-4`
-- `xai.grok-4-fast`
-- `xai.grok-4.1-fast`
-- `xai.grok-3`
-- `xai.grok-3-fast`
-- `xai.grok-3-mini`
-- `xai.grok-3-mini-fast`
-- `xai.grok-code-fast-1`
+- `xai.grok-4` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-4-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-4.1-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-3` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-3-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-3-mini` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-3-mini-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-code-fast-1` *(retires 2026-08-15 — replace with Grok 4.3)*
 
 #### Cohere Models
 - `cohere.command-latest`
 - `cohere.command-a-03-2025`
 - `cohere.command-a-reasoning-08-2025`
-- `cohere.command-a-vision-07-2025`
+- `cohere.command-a-vision-07-2025` (multimodal)
 - `cohere.command-a-translate-08-2025`
 - `cohere.command-plus-latest`
-- `cohere.command-r-08-2024`
-- `cohere.command-r-plus-08-2024`
+- `cohere.command-r-plus-08-2024` *(retires 2026-09-30 — replace with `cohere.command-a-03-2025`)*
+- `cohere.command-r-08-2024` *(retires 2026-09-30 — replace with `cohere.command-a-03-2025`)*
 
 #### Google Gemini Models (via OCI)
-- `google.gemini-2.5-pro`
-- `google.gemini-2.5-flash`
-- `google.gemini-2.5-flash-lite`
+- `google.gemini-2.5-pro` (multimodal)
+- `google.gemini-2.5-flash` (multimodal)
+- `google.gemini-2.5-flash-lite` (multimodal)
+
+#### OpenAI Open-Source Models (via OCI)
+- `openai.gpt-oss-120b`
+- `openai.gpt-oss-20b`
 
 ### Embedding Models
-- `cohere.embed-english-v3.0` (1024 dimensions)
-- `cohere.embed-english-light-v3.0` (384 dimensions)
-- `cohere.embed-multilingual-v3.0` (1024 dimensions)
-- `cohere.embed-multilingual-light-v3.0` (384 dimensions)
-- `cohere.embed-english-image-v3.0` (1024 dimensions, multimodal)
-- `cohere.embed-english-light-image-v3.0` (384 dimensions, multimodal)
-- `cohere.embed-multilingual-light-image-v3.0` (384 dimensions, multimodal)
-- `cohere.embed-v4.0` (1536 dimensions, multimodal)
+
+All `v3.0` embedding models retire **2026-09-30** — Oracle recommends migrating to `cohere.embed-v4.0`.
+
+- `cohere.embed-v4.0` (1536 dimensions, multimodal) — recommended
+- `cohere.embed-english-v3.0` (1024 dimensions) *(retires 2026-09-30)*
+- `cohere.embed-english-light-v3.0` (384 dimensions) *(retires 2026-09-30)*
+- `cohere.embed-multilingual-v3.0` (1024 dimensions) *(retires 2026-09-30)*
+- `cohere.embed-multilingual-light-v3.0` (384 dimensions) *(retires 2026-09-30)*
+- `cohere.embed-english-image-v3.0` (1024 dimensions, multimodal) *(retires 2026-09-30)*
+- `cohere.embed-english-light-image-v3.0` (384 dimensions, multimodal) *(retires 2026-09-30)*
+- `cohere.embed-multilingual-image-v3.0` (1024 dimensions, multimodal) *(retires 2026-09-30)*
+- `cohere.embed-multilingual-light-image-v3.0` (384 dimensions, multimodal) *(retires 2026-09-30)*
 
 ## Authentication
 
@@ -73,6 +86,21 @@ Provide individual OCI credentials directly to LiteLLM. Follow the [official Ora
 
 This is the default method for LiteLLM AI Gateway (LLM Proxy) access to OCI GenAI models.
 
+**Environment Variables**
+
+Instead of passing credentials in code, you can set the following environment variables — LiteLLM will read them automatically:
+
+```bash
+export OCI_REGION="us-chicago-1"
+export OCI_USER="ocid1.user.oc1.."
+export OCI_FINGERPRINT="xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx"
+export OCI_TENANCY="ocid1.tenancy.oc1.."
+export OCI_COMPARTMENT_ID="ocid1.compartment.oc1.."
+# Provide either the private key content OR the path to the key file:
+export OCI_KEY_FILE="/path/to/oci_api_key.pem"
+# export OCI_KEY="-----BEGIN PRIVATE KEY-----\n..."
+```
+
 ### Method 2: OCI SDK Signer
 Use an OCI SDK `Signer` object for authentication. This method:
 - Leverages the official [OCI SDK for signing](https://docs.oracle.com/en-us/iaas/tools/python/latest/api/signing.html)
@@ -220,6 +248,92 @@ print(response)
 </TabItem>
 </Tabs>
 
+## LiteLLM Proxy Usage
+
+Here's how to call OCI GenAI through the LiteLLM Proxy Server.
+
+### 1. Setup config.yaml
+
+```yaml
+model_list:
+  - model_name: oci-grok-4
+    litellm_params:
+      model: oci/xai.grok-4
+      oci_region: os.environ/OCI_REGION
+      oci_user: os.environ/OCI_USER
+      oci_fingerprint: os.environ/OCI_FINGERPRINT
+      oci_tenancy: os.environ/OCI_TENANCY
+      oci_key_file: os.environ/OCI_KEY_FILE
+      oci_compartment_id: os.environ/OCI_COMPARTMENT_ID
+
+  - model_name: oci-cohere-command
+    litellm_params:
+      model: oci/cohere.command-latest
+      oci_region: os.environ/OCI_REGION
+      oci_user: os.environ/OCI_USER
+      oci_fingerprint: os.environ/OCI_FINGERPRINT
+      oci_tenancy: os.environ/OCI_TENANCY
+      oci_key_file: os.environ/OCI_KEY_FILE
+      oci_compartment_id: os.environ/OCI_COMPARTMENT_ID
+```
+
+All possible auth params:
+
+```
+oci_region: Optional[str],
+oci_user: Optional[str],
+oci_fingerprint: Optional[str],
+oci_tenancy: Optional[str],
+oci_key: Optional[str],          # private key content as string
+oci_key_file: Optional[str],     # path to .pem file
+oci_compartment_id: Optional[str],
+oci_serving_mode: Optional[str], # "ON_DEMAND" (default) or "DEDICATED"
+oci_endpoint_id: Optional[str],  # only used with DEDICATED
+```
+
+### 2. Start the proxy
+
+```bash
+litellm --config /path/to/config.yaml
+```
+
+### 3. Test it
+
+<Tabs>
+<TabItem value="Curl" label="Curl Request">
+
+```shell
+curl --location 'http://0.0.0.0:4000/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+  "model": "oci-grok-4",
+  "messages": [
+    {"role": "user", "content": "what llm are you"}
+  ]
+}'
+```
+
+</TabItem>
+<TabItem value="openai" label="OpenAI v1.0.0+">
+
+```python
+import openai
+
+client = openai.OpenAI(
+    api_key="anything",
+    base_url="http://0.0.0.0:4000"
+)
+
+response = client.chat.completions.create(
+    model="oci-grok-4",
+    messages=[{"role": "user", "content": "write a short poem"}],
+)
+print(response)
+```
+
+</TabItem>
+</Tabs>
+
 ## Usage - Streaming
 Just set `stream=True` when calling completion.
 
@@ -411,20 +525,202 @@ response = completion(
 )
 ```
 
+## Usage - Function Calling / Tool Calling
+
+OCI GenAI supports OpenAI-compatible function calling. LiteLLM normalizes the request and response shape so the same code that targets OpenAI works with OCI Cohere and Generic (xAI Grok, Meta Llama, Google Gemini) models.
+
+<Tabs>
+<TabItem value="tool-sdk" label="SDK">
+
+```python
+from litellm import completion
+
+tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "get_current_weather",
+            "description": "Get the current weather in a given location",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "location": {
+                        "type": "string",
+                        "description": "The city and state, e.g. San Francisco, CA",
+                    },
+                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+                },
+                "required": ["location"],
+            },
+        },
+    }
+]
+
+response = completion(
+    model="oci/xai.grok-4",
+    messages=[{"role": "user", "content": "What's the weather in Boston today?"}],
+    tools=tools,
+    tool_choice="auto",
+    oci_region="us-chicago-1",
+    oci_user="<your_oci_user>",
+    oci_fingerprint="<your_oci_fingerprint>",
+    oci_tenancy="<your_oci_tenancy>",
+    oci_key_file="<path/to/oci_key.pem>",
+    oci_compartment_id="<oci_compartment_id>",
+)
+
+# Inspect the tool call
+print(response.choices[0].message.tool_calls)
+```
+
+</TabItem>
+<TabItem value="tool-proxy" label="PROXY">
+
+```python
+import openai
+
+client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")
+
+response = client.chat.completions.create(
+    model="oci-grok-4",
+    messages=[{"role": "user", "content": "What's the weather in Boston today?"}],
+    tools=[
+        {
+            "type": "function",
+            "function": {
+                "name": "get_current_weather",
+                "description": "Get the current weather in a given location",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "location": {"type": "string"},
+                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+                    },
+                    "required": ["location"],
+                },
+            },
+        }
+    ],
+    tool_choice="auto",
+)
+print(response.choices[0].message.tool_calls)
+```
+
+</TabItem>
+</Tabs>
+
+Tool calling works with both Cohere (`cohere.command-*`) and Generic (`xai.grok-*`, `meta.llama-*`, `google.gemini-*`) model families — LiteLLM adapts the OpenAI tool schema to each vendor's native format internally.
+
+## Usage - Vision / Multimodal
+
+OCI GenAI exposes vision-capable models that accept images alongside text. Pass images using the standard OpenAI `image_url` content block.
+
+```python
+from litellm import completion
+
+response = completion(
+    model="oci/meta.llama-4-maverick-17b-128e-instruct-fp8",
+    messages=[
+        {
+            "role": "user",
+            "content": [
+                {"type": "text", "text": "What is in this image?"},
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
+                    },
+                },
+            ],
+        }
+    ],
+    oci_region="us-chicago-1",
+    oci_user="<your_oci_user>",
+    oci_fingerprint="<your_oci_fingerprint>",
+    oci_tenancy="<your_oci_tenancy>",
+    oci_key_file="<path/to/oci_key.pem>",
+    oci_compartment_id="<oci_compartment_id>",
+)
+print(response.choices[0].message.content)
+```
+
+Vision-capable models on OCI include:
+
+- `meta.llama-4-maverick-17b-128e-instruct-fp8`
+- `meta.llama-4-scout-17b-16e-instruct`
+- `meta.llama-3.2-11b-vision-instruct`
+- `meta.llama-3.2-90b-vision-instruct` *(retires 2026-09-30)*
+- `cohere.command-a-vision-07-2025`
+- `google.gemini-2.5-pro`, `google.gemini-2.5-flash`, `google.gemini-2.5-flash-lite`
+
+Both image URLs and base64-encoded data URIs are supported.
+
+## Usage - Reasoning / Thinking
+
+OCI Generic-vendor models (xAI Grok reasoning variants, Google Gemini, etc.) support a reasoning step. LiteLLM exposes this via the OpenAI-compatible `reasoning_effort` parameter — accepted values are `"low"`, `"medium"`, `"high"`, and `"disable"` (mapped to OCI's `NONE`).
+
+Returned reasoning tokens are surfaced on `usage.completion_tokens_details.reasoning_tokens`, matching the OpenAI shape.
+
+<Tabs>
+<TabItem value="reasoning-sdk" label="SDK">
+
+```python
+from litellm import completion
+
+response = completion(
+    model="oci/xai.grok-3-mini",
+    messages=[{"role": "user", "content": "If 3x + 7 = 22, what is x? Show your reasoning."}],
+    reasoning_effort="high",  # "low" | "medium" | "high" | "disable"
+    oci_region="us-chicago-1",
+    oci_user="<your_oci_user>",
+    oci_fingerprint="<your_oci_fingerprint>",
+    oci_tenancy="<your_oci_tenancy>",
+    oci_key_file="<path/to/oci_key.pem>",
+    oci_compartment_id="<oci_compartment_id>",
+)
+
+print(response.choices[0].message.content)
+print("Reasoning tokens:", response.usage.completion_tokens_details.reasoning_tokens)
+```
+
+</TabItem>
+<TabItem value="reasoning-proxy" label="PROXY">
+
+```python
+import openai
+
+client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")
+
+response = client.chat.completions.create(
+    model="oci-grok-4",
+    messages=[{"role": "user", "content": "If 3x + 7 = 22, what is x?"}],
+    reasoning_effort="high",
+)
+print(response.choices[0].message.content)
+```
+
+</TabItem>
+</Tabs>
+
+:::note
+`reasoning_effort` is only honored on Generic-vendor reasoning models (e.g., `xai.grok-3-mini`, `xai.grok-4`, `google.gemini-2.5-pro`). It is silently ignored for OCI Cohere models.
+:::
+
 ## Optional Parameters
 
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `oci_region` | string | `us-ashburn-1` | OCI region where the GenAI service is deployed |
-| `oci_serving_mode` | string | `ON_DEMAND` | Service mode: `ON_DEMAND` for managed models or `DEDICATED` for dedicated endpoints |
-| `oci_endpoint_id` | string | Same as `model` | (For DEDICATED mode) The OCID of your dedicated endpoint |
-| `oci_compartment_id` | string | **Required** | The OCID of the OCI compartment containing your resources |
-| `oci_user` | string | - | (Manual auth) The OCID of the OCI user |
-| `oci_fingerprint` | string | - | (Manual auth) The fingerprint of the API signing key |
-| `oci_tenancy` | string | - | (Manual auth) The OCID of your OCI tenancy |
-| `oci_key` | string | - | (Manual auth) The private key content as a string |
-| `oci_key_file` | string | - | (Manual auth) Path to the private key file |
-| `oci_signer` | object | - | (SDK auth) OCI SDK Signer object for authentication |
+| Parameter | Type | Default | Environment Variable | Description |
+|-----------|------|---------|----------------------|-------------|
+| `oci_region` | string | `us-ashburn-1` | `OCI_REGION` | OCI region where the GenAI service is deployed |
+| `oci_serving_mode` | string | `ON_DEMAND` | – | Service mode: `ON_DEMAND` for managed models or `DEDICATED` for dedicated endpoints |
+| `oci_endpoint_id` | string | Same as `model` | – | (For DEDICATED mode) The OCID of your dedicated endpoint |
+| `oci_compartment_id` | string | **Required** | `OCI_COMPARTMENT_ID` | The OCID of the OCI compartment containing your resources |
+| `oci_user` | string | – | `OCI_USER` | (Manual auth) The OCID of the OCI user |
+| `oci_fingerprint` | string | – | `OCI_FINGERPRINT` | (Manual auth) The fingerprint of the API signing key |
+| `oci_tenancy` | string | – | `OCI_TENANCY` | (Manual auth) The OCID of your OCI tenancy |
+| `oci_key` | string | – | `OCI_KEY` | (Manual auth) The private key content as a string |
+| `oci_key_file` | string | – | `OCI_KEY_FILE` | (Manual auth) Path to the private key file |
+| `oci_signer` | object | – | – | (SDK auth) OCI SDK Signer object for authentication |
+| `reasoning_effort` | string | – | – | Reasoning level for Generic-vendor reasoning models: `low`, `medium`, `high`, `disable` |
 
 ## Embeddings
 

From 127c7b543c18052aa207c5921936624c2b29e2d5 Mon Sep 17 00:00:00 2001
From: Federico Kamelhar <federico.kamelhar@oracle.com>
Date: Fri, 22 May 2026 03:20:11 -0400
Subject: [PATCH 2/2] docs(oci): align deprecation handling with bedrock/openai
 style
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Other LiteLLM provider pages (bedrock, openai, anthropic) don't track
retirement dates inline; they list active models and let the
provider's own lifecycle page own the dates. Drop the per-model
*(retires ...)* annotations to match, and replace the info callout
with a single link to OCI's retirement page so readers can verify the
schedule.
---
 docs/providers/oci.md | 55 +++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 31 deletions(-)

diff --git a/docs/providers/oci.md b/docs/providers/oci.md
index 1b5d8504d..5e0277a08 100644
--- a/docs/providers/oci.md
+++ b/docs/providers/oci.md
@@ -8,11 +8,7 @@ Check the [OCI Models List](https://docs.oracle.com/en-us/iaas/Content/generativ
 
 ## Supported Models
 
-The list below tracks OCI's on-demand model catalog. For authoritative retirement dates and recommended replacements, see [OCI's on-demand model retirement page](https://docs.oracle.com/en-us/iaas/Content/generative-ai/deprecating-on-demand.htm).
-
-:::info
-OCI rotates models in and out of `ON_DEMAND` serving regularly. Models flagged below with a retirement date will continue to work in LiteLLM until OCI stops serving them — at which point requests will return a 404 from OCI. Plan migrations using the replacements OCI recommends on the retirement page.
-:::
+For model lifecycle, retirement dates, and recommended replacements, see [OCI's on-demand model retirement page](https://docs.oracle.com/en-us/iaas/Content/generative-ai/deprecating-on-demand.htm) — Oracle is the authoritative source.
 
 ### Chat / Text Generation
 
@@ -21,21 +17,21 @@ OCI rotates models in and out of `ON_DEMAND` serving regularly. Models flagged b
 - `meta.llama-4-scout-17b-16e-instruct` (multimodal)
 - `meta.llama-3.3-70b-instruct`
 - `meta.llama-3.3-70b-instruct-fp8-dynamic`
-- `meta.llama-3.2-90b-vision-instruct` *(retires 2026-09-30 — replace with Llama 4)*
-- `meta.llama-3.2-11b-vision-instruct`
+- `meta.llama-3.2-90b-vision-instruct` (multimodal)
+- `meta.llama-3.2-11b-vision-instruct` (multimodal)
 
 #### xAI Grok Models
-- `xai.grok-4.3` *(latest)*
+- `xai.grok-4.3`
 - `xai.grok-4.20`
 - `xai.grok-4.20-multi-agent`
-- `xai.grok-4` *(retires 2026-08-15 — replace with Grok 4.3)*
-- `xai.grok-4-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
-- `xai.grok-4.1-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
-- `xai.grok-3` *(retires 2026-08-15 — replace with Grok 4.3)*
-- `xai.grok-3-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
-- `xai.grok-3-mini` *(retires 2026-08-15 — replace with Grok 4.3)*
-- `xai.grok-3-mini-fast` *(retires 2026-08-15 — replace with Grok 4.3)*
-- `xai.grok-code-fast-1` *(retires 2026-08-15 — replace with Grok 4.3)*
+- `xai.grok-4`
+- `xai.grok-4-fast`
+- `xai.grok-4.1-fast`
+- `xai.grok-3`
+- `xai.grok-3-fast`
+- `xai.grok-3-mini`
+- `xai.grok-3-mini-fast`
+- `xai.grok-code-fast-1`
 
 #### Cohere Models
 - `cohere.command-latest`
@@ -44,8 +40,8 @@ OCI rotates models in and out of `ON_DEMAND` serving regularly. Models flagged b
 - `cohere.command-a-vision-07-2025` (multimodal)
 - `cohere.command-a-translate-08-2025`
 - `cohere.command-plus-latest`
-- `cohere.command-r-plus-08-2024` *(retires 2026-09-30 — replace with `cohere.command-a-03-2025`)*
-- `cohere.command-r-08-2024` *(retires 2026-09-30 — replace with `cohere.command-a-03-2025`)*
+- `cohere.command-r-plus-08-2024`
+- `cohere.command-r-08-2024`
 
 #### Google Gemini Models (via OCI)
 - `google.gemini-2.5-pro` (multimodal)
@@ -57,18 +53,15 @@ OCI rotates models in and out of `ON_DEMAND` serving regularly. Models flagged b
 - `openai.gpt-oss-20b`
 
 ### Embedding Models
-
-All `v3.0` embedding models retire **2026-09-30** — Oracle recommends migrating to `cohere.embed-v4.0`.
-
-- `cohere.embed-v4.0` (1536 dimensions, multimodal) — recommended
-- `cohere.embed-english-v3.0` (1024 dimensions) *(retires 2026-09-30)*
-- `cohere.embed-english-light-v3.0` (384 dimensions) *(retires 2026-09-30)*
-- `cohere.embed-multilingual-v3.0` (1024 dimensions) *(retires 2026-09-30)*
-- `cohere.embed-multilingual-light-v3.0` (384 dimensions) *(retires 2026-09-30)*
-- `cohere.embed-english-image-v3.0` (1024 dimensions, multimodal) *(retires 2026-09-30)*
-- `cohere.embed-english-light-image-v3.0` (384 dimensions, multimodal) *(retires 2026-09-30)*
-- `cohere.embed-multilingual-image-v3.0` (1024 dimensions, multimodal) *(retires 2026-09-30)*
-- `cohere.embed-multilingual-light-image-v3.0` (384 dimensions, multimodal) *(retires 2026-09-30)*
+- `cohere.embed-v4.0` (1536 dimensions, multimodal)
+- `cohere.embed-english-v3.0` (1024 dimensions)
+- `cohere.embed-english-light-v3.0` (384 dimensions)
+- `cohere.embed-multilingual-v3.0` (1024 dimensions)
+- `cohere.embed-multilingual-light-v3.0` (384 dimensions)
+- `cohere.embed-english-image-v3.0` (1024 dimensions, multimodal)
+- `cohere.embed-english-light-image-v3.0` (384 dimensions, multimodal)
+- `cohere.embed-multilingual-image-v3.0` (1024 dimensions, multimodal)
+- `cohere.embed-multilingual-light-image-v3.0` (384 dimensions, multimodal)
 
 ## Authentication
 
@@ -649,7 +642,7 @@ Vision-capable models on OCI include:
 - `meta.llama-4-maverick-17b-128e-instruct-fp8`
 - `meta.llama-4-scout-17b-16e-instruct`
 - `meta.llama-3.2-11b-vision-instruct`
-- `meta.llama-3.2-90b-vision-instruct` *(retires 2026-09-30)*
+- `meta.llama-3.2-90b-vision-instruct`
 - `cohere.command-a-vision-07-2025`
 - `google.gemini-2.5-pro`, `google.gemini-2.5-flash`, `google.gemini-2.5-flash-lite`