
## Supported Models

For model lifecycle, retirement dates, and recommended replacements, see [OCI's on-demand model retirement page](https://docs.oracle.com/en-us/iaas/Content/generative-ai/deprecating-on-demand.htm) — Oracle is the authoritative source.

### Chat / Text Generation

#### Meta Llama Models
- `meta.llama-4-maverick-17b-128e-instruct-fp8` (multimodal)
- `meta.llama-4-scout-17b-16e-instruct` (multimodal)
- `meta.llama-3.3-70b-instruct`
- `meta.llama-3.3-70b-instruct-fp8-dynamic`
- `meta.llama-3.2-90b-vision-instruct` (multimodal)
- `meta.llama-3.2-11b-vision-instruct` (multimodal)

#### xAI Grok Models
- `xai.grok-4.3`
- `xai.grok-4.20`
- `xai.grok-4.20-multi-agent`
- `xai.grok-4`
- `cohere.command-latest`
- `cohere.command-a-03-2025`
- `cohere.command-a-reasoning-08-2025`
- `cohere.command-a-vision-07-2025` (multimodal)
- `cohere.command-a-translate-08-2025`
- `cohere.command-plus-latest`
- `cohere.command-r-08-2024`
- `cohere.command-r-plus-08-2024`

#### Google Gemini Models (via OCI)
- `google.gemini-2.5-pro` (multimodal)
- `google.gemini-2.5-flash` (multimodal)
- `google.gemini-2.5-flash-lite` (multimodal)

#### OpenAI Open-Source Models (via OCI)
- `openai.gpt-oss-120b`
- `openai.gpt-oss-20b`

### Embedding Models
- `cohere.embed-v4.0` (1536 dimensions, multimodal)
- `cohere.embed-english-v3.0` (1024 dimensions)
- `cohere.embed-english-light-v3.0` (384 dimensions)
- `cohere.embed-multilingual-v3.0` (1024 dimensions)
- `cohere.embed-multilingual-light-v3.0` (384 dimensions)
- `cohere.embed-english-image-v3.0` (1024 dimensions, multimodal)
- `cohere.embed-english-light-image-v3.0` (384 dimensions, multimodal)
- `cohere.embed-multilingual-image-v3.0` (1024 dimensions, multimodal)
- `cohere.embed-multilingual-light-image-v3.0` (384 dimensions, multimodal)
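
If you size a vector-store column from the dimensions listed above, a small lookup table helps keep the two in sync. The sketch below is a hypothetical helper, not a LiteLLM API; the numbers are copied from the list above, and stripping an `oci/` prefix assumes you pass the same model string you give to LiteLLM.

```python
from typing import Dict

# Output dimensions copied from the embedding model list above.
# Hypothetical helper table, not part of LiteLLM.
OCI_EMBEDDING_DIMENSIONS: Dict[str, int] = {
    "cohere.embed-v4.0": 1536,
    "cohere.embed-english-v3.0": 1024,
    "cohere.embed-english-light-v3.0": 384,
    "cohere.embed-multilingual-v3.0": 1024,
    "cohere.embed-multilingual-light-v3.0": 384,
    "cohere.embed-english-image-v3.0": 1024,
    "cohere.embed-english-light-image-v3.0": 384,
    "cohere.embed-multilingual-image-v3.0": 1024,
    "cohere.embed-multilingual-light-image-v3.0": 384,
}


def embedding_dimension(model: str) -> int:
    """Return the vector size for an OCI embedding model, with or without the `oci/` prefix."""
    name = model[len("oci/"):] if model.startswith("oci/") else model
    try:
        return OCI_EMBEDDING_DIMENSIONS[name]
    except KeyError:
        raise ValueError(f"Unknown OCI embedding model: {model!r}") from None


print(embedding_dimension("oci/cohere.embed-english-light-v3.0"))
```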

## Authentication


This is the default method for LiteLLM AI Gateway (LLM Proxy) access to OCI GenAI models.

**Environment Variables**

Instead of passing credentials in code, you can set the following environment variables — LiteLLM will read them automatically:

```bash
export OCI_REGION="us-chicago-1"
export OCI_USER="ocid1.user.oc1.."
export OCI_FINGERPRINT="xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx"
export OCI_TENANCY="ocid1.tenancy.oc1.."
export OCI_COMPARTMENT_ID="ocid1.compartment.oc1.."
# Provide either the private key content OR the path to the key file:
export OCI_KEY_FILE="/path/to/oci_api_key.pem"
# export OCI_KEY="-----BEGIN PRIVATE KEY-----\n..."
```
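
A missing variable is easier to diagnose before the first request than from an error response, so it can help to check the environment up front. The helper below is a hypothetical sketch, not a LiteLLM API. It assumes the variable names above, treats `OCI_REGION` as optional because the parameter table gives it a default of `us-ashburn-1`, and accepts either `OCI_KEY` or `OCI_KEY_FILE`.

```python
import os
from typing import List, Mapping, Optional

# Variables needed for manual (API key) auth. OCI_REGION is left out on the
# assumption that the documented default region is acceptable.
REQUIRED_OCI_VARS = ("OCI_USER", "OCI_FINGERPRINT", "OCI_TENANCY", "OCI_COMPARTMENT_ID")


def missing_oci_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical pre-flight check: list the OCI settings that are still unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_OCI_VARS if not env.get(name)]
    # One key source is enough: inline key content or a path to the .pem file.
    if not (env.get("OCI_KEY") or env.get("OCI_KEY_FILE")):
        missing.append("OCI_KEY or OCI_KEY_FILE")
    return missing


missing = missing_oci_env_vars()
if missing:
    print("Missing OCI settings:", ", ".join(missing))
```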

### Method 2: OCI SDK Signer
Use an OCI SDK `Signer` object for authentication. This method:
- Leverages the official [OCI SDK for signing](https://docs.oracle.com/en-us/iaas/tools/python/latest/api/signing.html)
</TabItem>
</Tabs>

## LiteLLM Proxy Usage

Here's how to call OCI GenAI through the LiteLLM Proxy Server.

### 1. Setup config.yaml

```yaml
model_list:
  - model_name: oci-grok-4
    litellm_params:
      model: oci/xai.grok-4
      oci_region: os.environ/OCI_REGION
      oci_user: os.environ/OCI_USER
      oci_fingerprint: os.environ/OCI_FINGERPRINT
      oci_tenancy: os.environ/OCI_TENANCY
      oci_key_file: os.environ/OCI_KEY_FILE
      oci_compartment_id: os.environ/OCI_COMPARTMENT_ID

  - model_name: oci-cohere-command
    litellm_params:
      model: oci/cohere.command-latest
      oci_region: os.environ/OCI_REGION
      oci_user: os.environ/OCI_USER
      oci_fingerprint: os.environ/OCI_FINGERPRINT
      oci_tenancy: os.environ/OCI_TENANCY
      oci_key_file: os.environ/OCI_KEY_FILE
      oci_compartment_id: os.environ/OCI_COMPARTMENT_ID
```

All possible auth params:

```
oci_region: Optional[str],
oci_user: Optional[str],
oci_fingerprint: Optional[str],
oci_tenancy: Optional[str],
oci_key: Optional[str], # private key content as string
oci_key_file: Optional[str], # path to .pem file
oci_compartment_id: Optional[str],
oci_serving_mode: Optional[str], # "ON_DEMAND" (default) or "DEDICATED"
oci_endpoint_id: Optional[str], # only used with DEDICATED
```
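
If you generate `config.yaml` programmatically (for example, one entry per OCI model), a small builder keeps the auth fields consistent across entries. This is a hypothetical sketch, not a LiteLLM API. It reuses the `os.environ/<VAR>` references from the config example above and only sets `oci_serving_mode` and `oci_endpoint_id` for `DEDICATED`, relying on `ON_DEMAND` being the default as noted in the parameter list.

```python
from typing import Any, Dict, Optional

# Auth fields resolved from environment variables, mirroring config.yaml above.
_OCI_AUTH_ENV = {
    "oci_region": "OCI_REGION",
    "oci_user": "OCI_USER",
    "oci_fingerprint": "OCI_FINGERPRINT",
    "oci_tenancy": "OCI_TENANCY",
    "oci_key_file": "OCI_KEY_FILE",
    "oci_compartment_id": "OCI_COMPARTMENT_ID",
}


def oci_model_entry(
    model_name: str,
    oci_model: str,
    serving_mode: str = "ON_DEMAND",
    endpoint_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one model_list entry (hypothetical helper)."""
    if serving_mode not in ("ON_DEMAND", "DEDICATED"):
        raise ValueError(f"serving_mode must be ON_DEMAND or DEDICATED, got {serving_mode!r}")
    params: Dict[str, Any] = {"model": f"oci/{oci_model}"}
    params.update({field: f"os.environ/{var}" for field, var in _OCI_AUTH_ENV.items()})
    if serving_mode == "DEDICATED":
        params["oci_serving_mode"] = "DEDICATED"
        # oci_endpoint_id only applies in DEDICATED mode; when omitted,
        # the documented default is the model value.
        if endpoint_id is not None:
            params["oci_endpoint_id"] = endpoint_id
    return {"model_name": model_name, "litellm_params": params}


model_list = [
    oci_model_entry("oci-grok-4", "xai.grok-4"),
    oci_model_entry("oci-cohere-command", "cohere.command-latest"),
]
```

Serializing `{"model_list": model_list}` with a YAML library such as PyYAML (`yaml.safe_dump`) should produce the same structure as the hand-written config.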

### 2. Start the proxy

```bash
litellm --config /path/to/config.yaml
```

### 3. Test it

<Tabs>
<TabItem value="Curl" label="Curl Request">

```shell
curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "oci-grok-4",
    "messages": [
      {"role": "user", "content": "what llm are you"}
    ]
  }'
```

</TabItem>
<TabItem value="openai" label="OpenAI v1.0.0+">

```python
import openai

client = openai.OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="oci-grok-4",
    messages=[{"role": "user", "content": "write a short poem"}],
)
print(response)
```

</TabItem>
</Tabs>

## Usage - Streaming
Just set `stream=True` when calling completion.

)
```

## Usage - Function Calling / Tool Calling

OCI GenAI supports OpenAI-compatible function calling. LiteLLM normalizes the request and response shape so the same code that targets OpenAI works with OCI Cohere and Generic (xAI Grok, Meta Llama, Google Gemini) models.

<Tabs>
<TabItem value="tool-sdk" label="SDK">

```python
from litellm import completion

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    }
]

response = completion(
    model="oci/xai.grok-4",
    messages=[{"role": "user", "content": "What's the weather in Boston today?"}],
    tools=tools,
    tool_choice="auto",
    oci_region="us-chicago-1",
    oci_user="<your_oci_user>",
    oci_fingerprint="<your_oci_fingerprint>",
    oci_tenancy="<your_oci_tenancy>",
    oci_key_file="<path/to/oci_key.pem>",
    oci_compartment_id="<oci_compartment_id>",
)

# Inspect the tool call
print(response.choices[0].message.tool_calls)
```

</TabItem>
<TabItem value="tool-proxy" label="PROXY">

```python
import openai

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="oci-grok-4",
    messages=[{"role": "user", "content": "What's the weather in Boston today?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ],
    tool_choice="auto",
)
print(response.choices[0].message.tool_calls)
```

</TabItem>
</Tabs>

Tool calling works with both Cohere (`cohere.command-*`) and Generic (`xai.grok-*`, `meta.llama-*`, `google.gemini-*`) model families — LiteLLM adapts the OpenAI tool schema to each vendor's native format internally.
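
The examples above stop at printing `tool_calls`. To finish the loop, you execute each call locally and send the results back as `tool` messages. The sketch below assumes LiteLLM returns OpenAI-shaped message objects (an `id` plus `function.name` and a JSON-string `function.arguments` on each call), as described above. `get_current_weather` and `TOOL_REGISTRY` are hypothetical, and a `SimpleNamespace` stands in for `response.choices[0].message` so the snippet runs offline.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


def get_current_weather(location: str, unit: str = "fahrenheit") -> Dict[str, Any]:
    # Hypothetical stub: replace with a real weather lookup.
    return {"location": location, "temperature": 72, "unit": unit}


# Maps the tool names declared in `tools` to local Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_current_weather": get_current_weather}


def run_tool_calls(assistant_message: Any) -> List[Dict[str, Any]]:
    """Execute each requested tool and return OpenAI-style `tool` messages."""
    tool_messages: List[Dict[str, Any]] = []
    for call in assistant_message.tool_calls or []:
        func = TOOL_REGISTRY[call.function.name]
        # In the OpenAI shape, arguments arrive as a JSON-encoded string.
        args = json.loads(call.function.arguments or "{}")
        result = func(**args)
        tool_messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
        )
    return tool_messages


# Offline stand-in for response.choices[0].message.
fake_message = SimpleNamespace(
    tool_calls=[
        SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(
                name="get_current_weather", arguments='{"location": "Boston, MA"}'
            ),
        )
    ]
)
print(run_tool_calls(fake_message))
```

With a live response, append `response.choices[0].message` and the returned tool messages to `messages`, then call `completion` again with the same `tools` and OCI credentials. This assumes the OCI route accepts `tool`-role follow-up messages the same way OpenAI does.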

## Usage - Vision / Multimodal

OCI GenAI exposes vision-capable models that accept images alongside text. Pass images using the standard OpenAI `image_url` content block.

```python
from litellm import completion

response = completion(
    model="oci/meta.llama-4-maverick-17b-128e-instruct-fp8",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    },
                },
            ],
        }
    ],
    oci_region="us-chicago-1",
    oci_user="<your_oci_user>",
    oci_fingerprint="<your_oci_fingerprint>",
    oci_tenancy="<your_oci_tenancy>",
    oci_key_file="<path/to/oci_key.pem>",
    oci_compartment_id="<oci_compartment_id>",
)
print(response.choices[0].message.content)
```

Vision-capable models on OCI include:

- `meta.llama-4-maverick-17b-128e-instruct-fp8`
- `meta.llama-4-scout-17b-16e-instruct`
- `meta.llama-3.2-11b-vision-instruct`
- `meta.llama-3.2-90b-vision-instruct`
- `cohere.command-a-vision-07-2025`
- `google.gemini-2.5-pro`, `google.gemini-2.5-flash`, `google.gemini-2.5-flash-lite`

Both URL and base64-encoded data URIs are supported.
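
For a local file, the image can be inlined as a base64 data URI instead of a public URL. The helper name below is hypothetical, and the default MIME type is an assumption you should match to your file; the resulting block drops into the same `content` list as in the example above.

```python
import base64
from typing import Any, Dict


def image_content_block(image_bytes: bytes, mime_type: str = "image/jpeg") -> Dict[str, Any]:
    """Wrap raw image bytes in an OpenAI-style image_url block using a data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


# In practice, read the bytes from disk, e.g. open("photo.jpg", "rb").read()
# ("photo.jpg" is a placeholder path). A three-byte JPEG header stands in here.
content = [
    {"type": "text", "text": "What is in this image?"},
    image_content_block(b"\xff\xd8\xff"),
]
messages = [{"role": "user", "content": content}]
```

Pass `messages` to `completion(...)` with a vision-capable model from the list above and the same OCI credentials.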

## Usage - Reasoning / Thinking

OCI Generic-vendor models (xAI Grok reasoning variants, Google Gemini, etc.) support a reasoning step. LiteLLM exposes this via the OpenAI-compatible `reasoning_effort` parameter — accepted values are `"low"`, `"medium"`, `"high"`, and `"disable"` (mapped to OCI's `NONE`).

Returned reasoning tokens are surfaced on `usage.completion_tokens_details.reasoning_tokens`, matching the OpenAI shape.

<Tabs>
<TabItem value="reasoning-sdk" label="SDK">

```python
from litellm import completion

response = completion(
    model="oci/xai.grok-3-mini",
    messages=[{"role": "user", "content": "If 3x + 7 = 22, what is x? Show your reasoning."}],
    reasoning_effort="high",  # "low" | "medium" | "high" | "disable"
    oci_region="us-chicago-1",
    oci_user="<your_oci_user>",
    oci_fingerprint="<your_oci_fingerprint>",
    oci_tenancy="<your_oci_tenancy>",
    oci_key_file="<path/to/oci_key.pem>",
    oci_compartment_id="<oci_compartment_id>",
)

print(response.choices[0].message.content)
print("Reasoning tokens:", response.usage.completion_tokens_details.reasoning_tokens)
```

</TabItem>
<TabItem value="reasoning-proxy" label="PROXY">

```python
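import openai

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    # Assumes your config.yaml has a model_list entry named "oci-grok-mini"
    # pointing at oci/xai.grok-3-mini (it is not in the example config above).
    model="oci-grok-mini",
    messages=[{"role": "user", "content": "If 3x + 7 = 22, what is x?"}],
    reasoning_effort="high",
)
print(response.choices[0].message.content)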
```

</TabItem>
</Tabs>

:::note
`reasoning_effort` is only honored on Generic-vendor reasoning models (e.g., `xai.grok-3-mini`, `xai.grok-4`, `google.gemini-2.5-pro`). It is silently ignored for OCI Cohere models, which are not reasoning models.
:::
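
Since `reasoning_effort` is ignored for Cohere models, a response from one may not carry a reasoning-token count. The hypothetical helper below reads `usage.completion_tokens_details.reasoning_tokens` and falls back to 0 when any level is missing or `None`. That fallback is an assumption about how absent data is represented; adjust it if you need to distinguish "not reported" from zero.

```python
from types import SimpleNamespace
from typing import Any


def reasoning_token_count(response: Any) -> int:
    """Return usage.completion_tokens_details.reasoning_tokens, or 0 if absent (hypothetical helper)."""
    usage = getattr(response, "usage", None)
    details = getattr(usage, "completion_tokens_details", None)
    return getattr(details, "reasoning_tokens", None) or 0


# Offline stand-ins: one response with a reasoning count, one with no details object.
with_reasoning = SimpleNamespace(
    usage=SimpleNamespace(completion_tokens_details=SimpleNamespace(reasoning_tokens=128))
)
without_details = SimpleNamespace(usage=SimpleNamespace(completion_tokens_details=None))
print(reasoning_token_count(with_reasoning), reasoning_token_count(without_details))
```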

## Optional Parameters

| Parameter | Type | Default | Environment Variable | Description |
|-----------|------|---------|----------------------|-------------|
| `oci_region` | string | `us-ashburn-1` | `OCI_REGION` | OCI region where the GenAI service is deployed |
| `oci_serving_mode` | string | `ON_DEMAND` | – | Service mode: `ON_DEMAND` for managed models or `DEDICATED` for dedicated endpoints |
| `oci_endpoint_id` | string | Same as `model` | – | (For DEDICATED mode) The OCID of your dedicated endpoint |
| `oci_compartment_id` | string | **Required** | `OCI_COMPARTMENT_ID` | The OCID of the OCI compartment containing your resources |
| `oci_user` | string | – | `OCI_USER` | (Manual auth) The OCID of the OCI user |
| `oci_fingerprint` | string | – | `OCI_FINGERPRINT` | (Manual auth) The fingerprint of the API signing key |
| `oci_tenancy` | string | – | `OCI_TENANCY` | (Manual auth) The OCID of your OCI tenancy |
| `oci_key` | string | – | `OCI_KEY` | (Manual auth) The private key content as a string |
| `oci_key_file` | string | – | `OCI_KEY_FILE` | (Manual auth) Path to the private key file |
| `oci_signer` | object | – | – | (SDK auth) OCI SDK Signer object for authentication |
| `reasoning_effort` | string | – | – | Reasoning level for Generic-vendor reasoning models: `low`, `medium`, `high`, `disable` |

## Embeddings
