BerriAI · mateo-berri · May 23, 2026 · May 22, 2026 · May 22, 2026
diff --git a/docs/providers/oci.md b/docs/providers/oci.md
@@ -8,19 +8,20 @@ Check the [OCI Models List](https://docs.oracle.com/en-us/iaas/Content/generativ
 
 ## Supported Models
 
+For model lifecycle, retirement dates, and recommended replacements, see [OCI's on-demand model retirement page](https://docs.oracle.com/en-us/iaas/Content/generative-ai/deprecating-on-demand.htm). Oracle is the authoritative source.
+
 ### Chat / Text Generation
 
 #### Meta Llama Models
-- `meta.llama-4-maverick-17b-128e-instruct-fp8`
-- `meta.llama-4-scout-17b-16e-instruct`
+- `meta.llama-4-maverick-17b-128e-instruct-fp8` (multimodal)
+- `meta.llama-4-scout-17b-16e-instruct` (multimodal)
 - `meta.llama-3.3-70b-instruct`
 - `meta.llama-3.3-70b-instruct-fp8-dynamic`
-- `meta.llama-3.2-90b-vision-instruct`
-- `meta.llama-3.2-11b-vision-instruct`
-- `meta.llama-3.1-405b-instruct`
-- `meta.llama-3.1-70b-instruct`
+- `meta.llama-3.2-90b-vision-instruct` (multimodal)
+- `meta.llama-3.2-11b-vision-instruct` (multimodal)
 
 #### xAI Grok Models
+- `xai.grok-4.3`
 - `xai.grok-4.20`
 - `xai.grok-4.20-multi-agent`
 - `xai.grok-4`
@@ -36,26 +37,31 @@ Check the [OCI Models List](https://docs.oracle.com/en-us/iaas/Content/generativ
 - `cohere.command-latest`
 - `cohere.command-a-03-2025`
 - `cohere.command-a-reasoning-08-2025`
-- `cohere.command-a-vision-07-2025`
+- `cohere.command-a-vision-07-2025` (multimodal)
 - `cohere.command-a-translate-08-2025`
 - `cohere.command-plus-latest`
-- `cohere.command-r-08-2024`
 - `cohere.command-r-plus-08-2024`
+- `cohere.command-r-08-2024`
 
 #### Google Gemini Models (via OCI)
-- `google.gemini-2.5-pro`
-- `google.gemini-2.5-flash`
-- `google.gemini-2.5-flash-lite`
+- `google.gemini-2.5-pro` (multimodal)
+- `google.gemini-2.5-flash` (multimodal)
+- `google.gemini-2.5-flash-lite` (multimodal)
+
+#### OpenAI Open-Source Models (via OCI)
+- `openai.gpt-oss-120b`
+- `openai.gpt-oss-20b`
 
 ### Embedding Models
+- `cohere.embed-v4.0` (1536 dimensions, multimodal)
 - `cohere.embed-english-v3.0` (1024 dimensions)
 - `cohere.embed-english-light-v3.0` (384 dimensions)
 - `cohere.embed-multilingual-v3.0` (1024 dimensions)
 - `cohere.embed-multilingual-light-v3.0` (384 dimensions)
 - `cohere.embed-english-image-v3.0` (1024 dimensions, multimodal)
 - `cohere.embed-english-light-image-v3.0` (384 dimensions, multimodal)
+- `cohere.embed-multilingual-image-v3.0` (1024 dimensions, multimodal)
 - `cohere.embed-multilingual-light-image-v3.0` (384 dimensions, multimodal)
-- `cohere.embed-v4.0` (1536 dimensions, multimodal)
 
 ## Authentication
 
@@ -73,6 +79,21 @@ Provide individual OCI credentials directly to LiteLLM. Follow the [official Ora
 
 This is the default method for LiteLLM AI Gateway (LLM Proxy) access to OCI GenAI models.
 
+**Environment Variables**
+
+Instead of passing credentials in code, you can set the following environment variables, which LiteLLM reads automatically:
+
+```bash
+export OCI_REGION="us-chicago-1"
+export OCI_USER="ocid1.user.oc1.."
+export OCI_FINGERPRINT="xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx"
+export OCI_TENANCY="ocid1.tenancy.oc1.."
+export OCI_COMPARTMENT_ID="ocid1.compartment.oc1.."
+# Provide either the private key content OR the path to the key file:
+export OCI_KEY_FILE="/path/to/oci_api_key.pem"
+# export OCI_KEY="-----BEGIN PRIVATE KEY-----\n..."
+```
+
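Before starting a process that relies on these variables, it can help to check that they are all set. The following is a minimal sketch, not part of LiteLLM: `missing_oci_env` is a hypothetical helper, and the choice of which variables count as required is an assumption (the region is left out because the parameter table lists `us-ashburn-1` as its default).

```python
import os
from typing import List, Mapping

# Variable names come from the export block above. Treating these four as
# mandatory for manual auth is an assumption of this sketch.
REQUIRED = ("OCI_USER", "OCI_FINGERPRINT", "OCI_TENANCY", "OCI_COMPARTMENT_ID")
# Exactly one source for the private key is enough: inline content or a file path.
KEY_SOURCES = ("OCI_KEY", "OCI_KEY_FILE")


def missing_oci_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of unset or empty variables (hypothetical helper)."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if not any(env.get(name) for name in KEY_SOURCES):
        missing.append("OCI_KEY or OCI_KEY_FILE")
    return missing
```

Calling `missing_oci_env()` with no arguments inspects the current process environment; an empty list means every variable the sketch looks for is present.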
 ### Method 2: OCI SDK Signer
 Use an OCI SDK `Signer` object for authentication. This method:
 - Leverages the official [OCI SDK for signing](https://docs.oracle.com/en-us/iaas/tools/python/latest/api/signing.html)
@@ -220,6 +241,92 @@ print(response)
 </TabItem>
 </Tabs>
 
+## LiteLLM Proxy Usage
+
+Here's how to call OCI GenAI through the LiteLLM Proxy Server.
+
+### 1. Setup config.yaml
+
+```yaml
+model_list:
+  - model_name: oci-grok-4
+    litellm_params:
+      model: oci/xai.grok-4
+      oci_region: os.environ/OCI_REGION
+      oci_user: os.environ/OCI_USER
+      oci_fingerprint: os.environ/OCI_FINGERPRINT
+      oci_tenancy: os.environ/OCI_TENANCY
+      oci_key_file: os.environ/OCI_KEY_FILE
+      oci_compartment_id: os.environ/OCI_COMPARTMENT_ID
+
+  - model_name: oci-cohere-command
+    litellm_params:
+      model: oci/cohere.command-latest
+      oci_region: os.environ/OCI_REGION
+      oci_user: os.environ/OCI_USER
+      oci_fingerprint: os.environ/OCI_FINGERPRINT
+      oci_tenancy: os.environ/OCI_TENANCY
+      oci_key_file: os.environ/OCI_KEY_FILE
+      oci_compartment_id: os.environ/OCI_COMPARTMENT_ID
+```
+
+All possible auth params:
+
+```
+oci_region: Optional[str],
+oci_user: Optional[str],
+oci_fingerprint: Optional[str],
+oci_tenancy: Optional[str],
+oci_key: Optional[str],          # private key content as string
+oci_key_file: Optional[str],     # path to .pem file
+oci_compartment_id: Optional[str],
+oci_serving_mode: Optional[str], # "ON_DEMAND" (default) or "DEDICATED"
+oci_endpoint_id: Optional[str],  # only used with DEDICATED
+```
+
+### 2. Start the proxy
+
+```bash
+litellm --config /path/to/config.yaml
+```
+
+### 3. Test it
+
+<Tabs>
+<TabItem value="Curl" label="Curl Request">
+
+```shell
+curl --location 'http://0.0.0.0:4000/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+  "model": "oci-grok-4",
+  "messages": [
+    {"role": "user", "content": "what llm are you"}
+  ]
+}'
+```
+
+</TabItem>
+<TabItem value="openai" label="OpenAI v1.0.0+">
+
+```python
+import openai
+
+client = openai.OpenAI(
+    api_key="anything",
+    base_url="http://0.0.0.0:4000"
+)
+
+response = client.chat.completions.create(
+    model="oci-grok-4",
+    messages=[{"role": "user", "content": "write a short poem"}],
+)
+print(response)
+```
+
+</TabItem>
+</Tabs>
+
 ## Usage - Streaming
 Just set `stream=True` when calling completion.
 
@@ -411,20 +518,202 @@ response = completion(
 )
 ```
 
+## Usage - Function Calling / Tool Calling
+
+OCI GenAI supports OpenAI-compatible function calling. LiteLLM normalizes the request and response shape so the same code that targets OpenAI works with OCI Cohere and Generic (xAI Grok, Meta Llama, Google Gemini) models.
+
+<Tabs>
+<TabItem value="tool-sdk" label="SDK">
+
+```python
+from litellm import completion
+
+tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "get_current_weather",
+            "description": "Get the current weather in a given location",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "location": {
+                        "type": "string",
+                        "description": "The city and state, e.g. San Francisco, CA",
+                    },
+                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+                },
+                "required": ["location"],
+            },
+        },
+    }
+]
+
+response = completion(
+    model="oci/xai.grok-4",
+    messages=[{"role": "user", "content": "What's the weather in Boston today?"}],
+    tools=tools,
+    tool_choice="auto",
+    oci_region="us-chicago-1",
+    oci_user="<your_oci_user>",
+    oci_fingerprint="<your_oci_fingerprint>",
+    oci_tenancy="<your_oci_tenancy>",
+    oci_key_file="<path/to/oci_key.pem>",
+    oci_compartment_id="<oci_compartment_id>",
+)
+
+# Inspect the tool call
+print(response.choices[0].message.tool_calls)
+```
+
+</TabItem>
+<TabItem value="tool-proxy" label="PROXY">
+
+```python
+import openai
+
+client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")
+
+response = client.chat.completions.create(
+    model="oci-grok-4",
+    messages=[{"role": "user", "content": "What's the weather in Boston today?"}],
+    tools=[
+        {
+            "type": "function",
+            "function": {
+                "name": "get_current_weather",
+                "description": "Get the current weather in a given location",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "location": {"type": "string"},
+                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+                    },
+                    "required": ["location"],
+                },
+            },
+        }
+    ],
+    tool_choice="auto",
+)
+print(response.choices[0].message.tool_calls)
+```
+
+</TabItem>
+</Tabs>
+
+Tool calling works with both Cohere (`cohere.command-*`) and Generic (`xai.grok-*`, `meta.llama-*`, `google.gemini-*`) model families. LiteLLM adapts the OpenAI tool schema to each vendor's native format internally.
+
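The examples in this section stop at inspecting `tool_calls`. A minimal sketch of the next step, running the requested function locally and building the `tool` message in the OpenAI shape, might look like the following. The weather function is a stub, and `LOCAL_TOOLS` and `run_tool_call` are hypothetical names introduced for illustration.

```python
import json


def get_current_weather(location: str, unit: str = "fahrenheit") -> dict:
    # Stub standing in for a real weather lookup.
    return {"location": location, "temperature": "72", "unit": unit}


# Maps tool names from the `tools` schema to local Python callables.
LOCAL_TOOLS = {"get_current_weather": get_current_weather}


def run_tool_call(name: str, arguments_json: str, tool_call_id: str) -> dict:
    """Execute one tool call and build the `tool` message to send back."""
    args = json.loads(arguments_json)  # arguments arrive as a JSON string
    result = LOCAL_TOOLS[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps(result),
    }


# Hand-written call in the OpenAI shape, so the sketch runs without credentials.
msg = run_tool_call("get_current_weather", '{"location": "Boston, MA"}', "call_1")
print(msg)
```

With a real response, each entry in `response.choices[0].message.tool_calls` supplies `function.name`, `function.arguments`, and `id`. Append the assistant message and the resulting `tool` message to `messages`, then call `completion` again to get the final answer. Whether every OCI model family accepts the follow-up turn is not stated here, so treat that as something to verify per model.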
+## Usage - Vision / Multimodal
+
+OCI GenAI exposes vision-capable models that accept images alongside text. Pass images using the standard OpenAI `image_url` content block.
+
+```python
+from litellm import completion
+
+response = completion(
+    model="oci/meta.llama-4-maverick-17b-128e-instruct-fp8",
+    messages=[
+        {
+            "role": "user",
+            "content": [
+                {"type": "text", "text": "What is in this image?"},
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
+                    },
+                },
+            ],
+        }
+    ],
+    oci_region="us-chicago-1",
+    oci_user="<your_oci_user>",
+    oci_fingerprint="<your_oci_fingerprint>",
+    oci_tenancy="<your_oci_tenancy>",
+    oci_key_file="<path/to/oci_key.pem>",
+    oci_compartment_id="<oci_compartment_id>",
+)
+print(response.choices[0].message.content)
+```
+
+Vision-capable models on OCI include:
+
+- `meta.llama-4-maverick-17b-128e-instruct-fp8`
+- `meta.llama-4-scout-17b-16e-instruct`
+- `meta.llama-3.2-11b-vision-instruct`
+- `meta.llama-3.2-90b-vision-instruct`
+- `cohere.command-a-vision-07-2025`
+- `google.gemini-2.5-pro`, `google.gemini-2.5-flash`, `google.gemini-2.5-flash-lite`
+
+Both image URLs and base64-encoded data URIs are supported in the `image_url` field.
+
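For local images, the base64 route means wrapping the file bytes in a data URI. The following is a minimal sketch using only the standard library; `image_block_from_bytes` is a hypothetical helper, and the default MIME type is an assumption you should match to your actual file.

```python
import base64


def image_block_from_bytes(data: bytes, mime_type: str = "image/png") -> dict:
    """Wrap raw image bytes as an OpenAI-style `image_url` content block."""
    encoded = base64.b64encode(data).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
    }


# The 8-byte PNG signature stands in for real file contents here.
block = image_block_from_bytes(b"\x89PNG\r\n\x1a\n")
```

The returned dictionary can take the place of the URL-based `image_url` entry in the `content` list of the request shown earlier in this section. In practice, `data` would come from something like `open(path, "rb").read()`.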
+## Usage - Reasoning / Thinking
+
+OCI Generic-vendor models (xAI Grok reasoning variants, Google Gemini, etc.) support a reasoning step. LiteLLM exposes this via the OpenAI-compatible `reasoning_effort` parameter. Accepted values are `"low"`, `"medium"`, `"high"`, and `"disable"` (mapped to OCI's `NONE`).
+
+Returned reasoning tokens are surfaced on `usage.completion_tokens_details.reasoning_tokens`, matching the OpenAI shape.
+
+<Tabs>
+<TabItem value="reasoning-sdk" label="SDK">
+
+```python
+from litellm import completion
+
+response = completion(
+    model="oci/xai.grok-3-mini",
+    messages=[{"role": "user", "content": "If 3x + 7 = 22, what is x? Show your reasoning."}],
+    reasoning_effort="high",  # "low" | "medium" | "high" | "disable"
+    oci_region="us-chicago-1",
+    oci_user="<your_oci_user>",
+    oci_fingerprint="<your_oci_fingerprint>",
+    oci_tenancy="<your_oci_tenancy>",
+    oci_key_file="<path/to/oci_key.pem>",
+    oci_compartment_id="<oci_compartment_id>",
+)
+
+print(response.choices[0].message.content)
+print("Reasoning tokens:", response.usage.completion_tokens_details.reasoning_tokens)
+```
+
+</TabItem>
+<TabItem value="reasoning-proxy" label="PROXY">
+
+```python
+import openai
+
+client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")
+
+response = client.chat.completions.create(
+    model="oci-grok-mini",  # assumes a config.yaml entry named "oci-grok-mini" that points at oci/xai.grok-3-mini
+    messages=[{"role": "user", "content": "If 3x + 7 = 22, what is x?"}],
+    reasoning_effort="high",
+)
+print(response.choices[0].message.content)
+```
+
+</TabItem>
+</Tabs>
+
+:::note
+`reasoning_effort` is only honored on Generic-vendor reasoning models (e.g., `xai.grok-3-mini`, `xai.grok-4`, `google.gemini-2.5-pro`). It is silently ignored for OCI Cohere models.
+:::
+
 ## Optional Parameters
 
-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `oci_region` | string | `us-ashburn-1` | OCI region where the GenAI service is deployed |
-| `oci_serving_mode` | string | `ON_DEMAND` | Service mode: `ON_DEMAND` for managed models or `DEDICATED` for dedicated endpoints |
-| `oci_endpoint_id` | string | Same as `model` | (For DEDICATED mode) The OCID of your dedicated endpoint |
-| `oci_compartment_id` | string | **Required** | The OCID of the OCI compartment containing your resources |
-| `oci_user` | string | - | (Manual auth) The OCID of the OCI user |
-| `oci_fingerprint` | string | - | (Manual auth) The fingerprint of the API signing key |
-| `oci_tenancy` | string | - | (Manual auth) The OCID of your OCI tenancy |
-| `oci_key` | string | - | (Manual auth) The private key content as a string |
-| `oci_key_file` | string | - | (Manual auth) Path to the private key file |
-| `oci_signer` | object | - | (SDK auth) OCI SDK Signer object for authentication |
+| Parameter | Type | Default | Environment Variable | Description |
+|-----------|------|---------|----------------------|-------------|
+| `oci_region` | string | `us-ashburn-1` | `OCI_REGION` | OCI region where the GenAI service is deployed |
+| `oci_serving_mode` | string | `ON_DEMAND` | – | Service mode: `ON_DEMAND` for managed models or `DEDICATED` for dedicated endpoints |
+| `oci_endpoint_id` | string | Same as `model` | – | (For DEDICATED mode) The OCID of your dedicated endpoint |
+| `oci_compartment_id` | string | **Required** | `OCI_COMPARTMENT_ID` | The OCID of the OCI compartment containing your resources |
+| `oci_user` | string | – | `OCI_USER` | (Manual auth) The OCID of the OCI user |
+| `oci_fingerprint` | string | – | `OCI_FINGERPRINT` | (Manual auth) The fingerprint of the API signing key |
+| `oci_tenancy` | string | – | `OCI_TENANCY` | (Manual auth) The OCID of your OCI tenancy |
+| `oci_key` | string | – | `OCI_KEY` | (Manual auth) The private key content as a string |
+| `oci_key_file` | string | – | `OCI_KEY_FILE` | (Manual auth) Path to the private key file |
+| `oci_signer` | object | – | – | (SDK auth) OCI SDK Signer object for authentication |
+| `reasoning_effort` | string | – | – | Reasoning level for Generic-vendor reasoning models: `low`, `medium`, `high`, `disable` |
 
 ## Embeddings