diff --git a/docs/providers/oci.md b/docs/providers/oci.md
index 182bb440..5e0277a0 100644
--- a/docs/providers/oci.md
+++ b/docs/providers/oci.md
@@ -8,19 +8,20 @@ Check the [OCI Models List](https://docs.oracle.com/en-us/iaas/Content/generativ

 ## Supported Models

+For model lifecycle, retirement dates, and recommended replacements, see [OCI's on-demand model retirement page](https://docs.oracle.com/en-us/iaas/Content/generative-ai/deprecating-on-demand.htm); Oracle is the authoritative source.
+
 ### Chat / Text Generation

 #### Meta Llama Models
-- `meta.llama-4-maverick-17b-128e-instruct-fp8`
-- `meta.llama-4-scout-17b-16e-instruct`
+- `meta.llama-4-maverick-17b-128e-instruct-fp8` (multimodal)
+- `meta.llama-4-scout-17b-16e-instruct` (multimodal)
 - `meta.llama-3.3-70b-instruct`
 - `meta.llama-3.3-70b-instruct-fp8-dynamic`
-- `meta.llama-3.2-90b-vision-instruct`
-- `meta.llama-3.2-11b-vision-instruct`
-- `meta.llama-3.1-405b-instruct`
-- `meta.llama-3.1-70b-instruct`
+- `meta.llama-3.2-90b-vision-instruct` (multimodal)
+- `meta.llama-3.2-11b-vision-instruct` (multimodal)

 #### xAI Grok Models
+- `xai.grok-4.3`
 - `xai.grok-4.20`
 - `xai.grok-4.20-multi-agent`
 - `xai.grok-4`
@@ -36,26 +37,31 @@ Check the [OCI Models List](https://docs.oracle.com/en-us/iaas/Content/generativ
 - `cohere.command-latest`
 - `cohere.command-a-03-2025`
 - `cohere.command-a-reasoning-08-2025`
-- `cohere.command-a-vision-07-2025`
+- `cohere.command-a-vision-07-2025` (multimodal)
 - `cohere.command-a-translate-08-2025`
 - `cohere.command-plus-latest`
-- `cohere.command-r-08-2024`
 - `cohere.command-r-plus-08-2024`
+- `cohere.command-r-08-2024`

 #### Google Gemini Models (via OCI)
-- `google.gemini-2.5-pro`
-- `google.gemini-2.5-flash`
-- `google.gemini-2.5-flash-lite`
+- `google.gemini-2.5-pro` (multimodal)
+- `google.gemini-2.5-flash` (multimodal)
+- `google.gemini-2.5-flash-lite` (multimodal)
+
+#### OpenAI Open-Source Models (via OCI)
+- `openai.gpt-oss-120b`
+- `openai.gpt-oss-20b`

 ### Embedding Models
+- `cohere.embed-v4.0` (1536 dimensions, multimodal)
 - `cohere.embed-english-v3.0` (1024 dimensions)
 - `cohere.embed-english-light-v3.0` (384 dimensions)
 - `cohere.embed-multilingual-v3.0` (1024 dimensions)
 - `cohere.embed-multilingual-light-v3.0` (384 dimensions)
 - `cohere.embed-english-image-v3.0` (1024 dimensions, multimodal)
 - `cohere.embed-english-light-image-v3.0` (384 dimensions, multimodal)
+- `cohere.embed-multilingual-image-v3.0` (1024 dimensions, multimodal)
 - `cohere.embed-multilingual-light-image-v3.0` (384 dimensions, multimodal)
-- `cohere.embed-v4.0` (1536 dimensions, multimodal)

 ## Authentication

@@ -73,6 +79,21 @@ Provide individual OCI credentials directly to LiteLLM. Follow the [official Ora

 This is the default method for LiteLLM AI Gateway (LLM Proxy) access to OCI GenAI models.

+**Environment Variables**
+
+Instead of passing credentials in code, you can set the following environment variables; LiteLLM reads them automatically:
+
+```bash
+export OCI_REGION="us-chicago-1"
+export OCI_USER="ocid1.user.oc1.."
+export OCI_FINGERPRINT="xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx"
+export OCI_TENANCY="ocid1.tenancy.oc1.."
+export OCI_COMPARTMENT_ID="ocid1.compartment.oc1.."
+# Provide either the private key content OR the path to the key file:
+export OCI_KEY_FILE="/path/to/oci_api_key.pem"
+# export OCI_KEY="-----BEGIN PRIVATE KEY-----\n..."
+```
+
 ### Method 2: OCI SDK Signer
 Use an OCI SDK `Signer` object for authentication. This method:
 - Leverages the official [OCI SDK for signing](https://docs.oracle.com/en-us/iaas/tools/python/latest/api/signing.html)
@@ -220,6 +241,92 @@ print(response)
+## LiteLLM Proxy Usage
+
+Here's how to call OCI GenAI through the LiteLLM Proxy Server.
+
+### 1. Setup config.yaml
+
+```yaml
+model_list:
+  - model_name: oci-grok-4
+    litellm_params:
+      model: oci/xai.grok-4
+      oci_region: os.environ/OCI_REGION
+      oci_user: os.environ/OCI_USER
+      oci_fingerprint: os.environ/OCI_FINGERPRINT
+      oci_tenancy: os.environ/OCI_TENANCY
+      oci_key_file: os.environ/OCI_KEY_FILE
+      oci_compartment_id: os.environ/OCI_COMPARTMENT_ID
+
+  - model_name: oci-cohere-command
+    litellm_params:
+      model: oci/cohere.command-latest
+      oci_region: os.environ/OCI_REGION
+      oci_user: os.environ/OCI_USER
+      oci_fingerprint: os.environ/OCI_FINGERPRINT
+      oci_tenancy: os.environ/OCI_TENANCY
+      oci_key_file: os.environ/OCI_KEY_FILE
+      oci_compartment_id: os.environ/OCI_COMPARTMENT_ID
+```
+
+All possible auth params:
+
+```
+oci_region: Optional[str],
+oci_user: Optional[str],
+oci_fingerprint: Optional[str],
+oci_tenancy: Optional[str],
+oci_key: Optional[str],  # private key content as string
+oci_key_file: Optional[str],  # path to .pem file
+oci_compartment_id: Optional[str],
+oci_serving_mode: Optional[str],  # "ON_DEMAND" (default) or "DEDICATED"
+oci_endpoint_id: Optional[str],  # only used with DEDICATED
+```
+
+### 2. Start the proxy
+
+```bash
+litellm --config /path/to/config.yaml
+```
+
+### 3. Test it
+
+
+
+
+```shell
+curl --location 'http://0.0.0.0:4000/chat/completions' \
+--header 'Content-Type: application/json' \
+--data '{
+    "model": "oci-grok-4",
+    "messages": [
+        {"role": "user", "content": "what llm are you"}
+    ]
+}'
+```
+
+
+
+
+```python
+import openai
+
+client = openai.OpenAI(
+    api_key="anything",
+    base_url="http://0.0.0.0:4000"
+)
+
+response = client.chat.completions.create(
+    model="oci-grok-4",
+    messages=[{"role": "user", "content": "write a short poem"}],
+)
+print(response)
+```
+
+
+
+
 ## Usage - Streaming

 Just set `stream=True` when calling completion.
@@ -411,20 +518,202 @@ response = completion(
 )
 ```

+## Usage - Function Calling / Tool Calling
+
+OCI GenAI supports OpenAI-compatible function calling. LiteLLM normalizes the request and response shape so the same code that targets OpenAI works with OCI Cohere and Generic (xAI Grok, Meta Llama, Google Gemini) models.
+
+
+
+
+```python
+from litellm import completion
+
+tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "get_current_weather",
+            "description": "Get the current weather in a given location",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "location": {
+                        "type": "string",
+                        "description": "The city and state, e.g. San Francisco, CA",
+                    },
+                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+                },
+                "required": ["location"],
+            },
+        },
+    }
+]
+
+response = completion(
+    model="oci/xai.grok-4",
+    messages=[{"role": "user", "content": "What's the weather in Boston today?"}],
+    tools=tools,
+    tool_choice="auto",
+    oci_region="us-chicago-1",
+    oci_user="",
+    oci_fingerprint="",
+    oci_tenancy="",
+    oci_key_file="",
+    oci_compartment_id="",
+)
+
+# Inspect the tool call
+print(response.choices[0].message.tool_calls)
+```
+
+
+
+
+```python
+import openai
+
+client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")
+
+response = client.chat.completions.create(
+    model="oci-grok-4",
+    messages=[{"role": "user", "content": "What's the weather in Boston today?"}],
+    tools=[
+        {
+            "type": "function",
+            "function": {
+                "name": "get_current_weather",
+                "description": "Get the current weather in a given location",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "location": {"type": "string"},
+                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
+                    },
+                    "required": ["location"],
+                },
+            },
+        }
+    ],
+    tool_choice="auto",
+)
+print(response.choices[0].message.tool_calls)
+```
+
+
+
+
+Tool calling works with both Cohere (`cohere.command-*`) and Generic (`xai.grok-*`, `meta.llama-*`, `google.gemini-*`) model families; LiteLLM adapts the OpenAI tool schema to each vendor's native format internally.
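The examples in this hunk stop at printing `tool_calls`. As a rough sketch of the usual next step in the OpenAI tool-calling flow, the snippet below parses one tool call, runs a local function, and builds the `role: "tool"` message that would be appended to `messages` for a follow-up request. The `get_current_weather` implementation, the `AVAILABLE_TOOLS` registry, the `run_tool_call` helper, and the hand-written `sample_tool_call` dict are all illustrative assumptions, not part of LiteLLM or OCI; real response objects expose the same fields as attributes rather than dict keys.

```python
import json


def get_current_weather(location, unit="fahrenheit"):
    """Hypothetical local implementation; a real one would query a weather service."""
    return {"location": location, "temperature": 72, "unit": unit}


# Hypothetical registry mapping tool names (as declared in `tools`) to callables.
AVAILABLE_TOOLS = {"get_current_weather": get_current_weather}


def run_tool_call(tool_call):
    """Execute one OpenAI-style tool call and return the follow-up `tool` message."""
    name = tool_call["function"]["name"]
    # `arguments` is a JSON-encoded string, so it has to be decoded first.
    arguments = json.loads(tool_call["function"]["arguments"])
    result = AVAILABLE_TOOLS[name](**arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }


# Hand-written stand-in for one entry of response.choices[0].message.tool_calls.
sample_tool_call = {
    "id": "call_123",
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "arguments": '{"location": "Boston, MA", "unit": "celsius"}',
    },
}

tool_message = run_tool_call(sample_tool_call)
print(tool_message)
```

In a real exchange, the assistant message containing the tool call and this `tool` message would both be appended to `messages` before calling `completion` again with the same `tools` list.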
+
+## Usage - Vision / Multimodal
+
+OCI GenAI exposes vision-capable models that accept images alongside text. Pass images using the standard OpenAI `image_url` content block.
+
+```python
+from litellm import completion
+
+response = completion(
+    model="oci/meta.llama-4-maverick-17b-128e-instruct-fp8",
+    messages=[
+        {
+            "role": "user",
+            "content": [
+                {"type": "text", "text": "What is in this image?"},
+                {
+                    "type": "image_url",
+                    "image_url": {
+                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
+                    },
+                },
+            ],
+        }
+    ],
+    oci_region="us-chicago-1",
+    oci_user="",
+    oci_fingerprint="",
+    oci_tenancy="",
+    oci_key_file="",
+    oci_compartment_id="",
+)
+print(response.choices[0].message.content)
+```
+
+Vision-capable models on OCI include:
+
+- `meta.llama-4-maverick-17b-128e-instruct-fp8`
+- `meta.llama-4-scout-17b-16e-instruct`
+- `meta.llama-3.2-11b-vision-instruct`
+- `meta.llama-3.2-90b-vision-instruct`
+- `cohere.command-a-vision-07-2025`
+- `google.gemini-2.5-pro`, `google.gemini-2.5-flash`, `google.gemini-2.5-flash-lite`
+
+Both URL and base64-encoded data URIs are supported.
+
+## Usage - Reasoning / Thinking
+
+OCI Generic-vendor models (xAI Grok reasoning variants, Google Gemini, etc.) support a reasoning step. LiteLLM exposes this via the OpenAI-compatible `reasoning_effort` parameter. Accepted values are `"low"`, `"medium"`, `"high"`, and `"disable"` (mapped to OCI's `NONE`).
+
+Returned reasoning tokens are surfaced on `usage.completion_tokens_details.reasoning_tokens`, matching the OpenAI shape.
+
+
+
+
+```python
+from litellm import completion
+
+response = completion(
+    model="oci/xai.grok-3-mini",
+    messages=[{"role": "user", "content": "If 3x + 7 = 22, what is x? Show your reasoning."}],
+    reasoning_effort="high",  # "low" | "medium" | "high" | "disable"
+    oci_region="us-chicago-1",
+    oci_user="",
+    oci_fingerprint="",
+    oci_tenancy="",
+    oci_key_file="",
+    oci_compartment_id="",
+)
+
+print(response.choices[0].message.content)
+print("Reasoning tokens:", response.usage.completion_tokens_details.reasoning_tokens)
+```
+
+
+
+
+```python
+import openai
+
+client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")
+
+response = client.chat.completions.create(
+    model="oci-grok-mini",  # this alias must be defined in your config.yaml model_list
+    messages=[{"role": "user", "content": "If 3x + 7 = 22, what is x?"}],
+    reasoning_effort="high",
+)
+print(response.choices[0].message.content)
+```
+
+
+
+
+:::note
+`reasoning_effort` is only honored on Generic-vendor reasoning models (e.g., `xai.grok-3-mini`, `xai.grok-4`, `google.gemini-2.5-pro`). It is silently ignored for OCI Cohere models, which are not reasoning models.
+:::
+
 ## Optional Parameters

-| Parameter | Type | Default | Description |
-|-----------|------|---------|-------------|
-| `oci_region` | string | `us-ashburn-1` | OCI region where the GenAI service is deployed |
-| `oci_serving_mode` | string | `ON_DEMAND` | Service mode: `ON_DEMAND` for managed models or `DEDICATED` for dedicated endpoints |
-| `oci_endpoint_id` | string | Same as `model` | (For DEDICATED mode) The OCID of your dedicated endpoint |
-| `oci_compartment_id` | string | **Required** | The OCID of the OCI compartment containing your resources |
-| `oci_user` | string | - | (Manual auth) The OCID of the OCI user |
-| `oci_fingerprint` | string | - | (Manual auth) The fingerprint of the API signing key |
-| `oci_tenancy` | string | - | (Manual auth) The OCID of your OCI tenancy |
-| `oci_key` | string | - | (Manual auth) The private key content as a string |
-| `oci_key_file` | string | - | (Manual auth) Path to the private key file |
-| `oci_signer` | object | - | (SDK auth) OCI SDK Signer object for authentication |
+| Parameter | Type | Default | Environment Variable | Description |
+|-----------|------|---------|----------------------|-------------|
+| `oci_region` | string | `us-ashburn-1` | `OCI_REGION` | OCI region where the GenAI service is deployed |
+| `oci_serving_mode` | string | `ON_DEMAND` | – | Service mode: `ON_DEMAND` for managed models or `DEDICATED` for dedicated endpoints |
+| `oci_endpoint_id` | string | Same as `model` | – | (For DEDICATED mode) The OCID of your dedicated endpoint |
+| `oci_compartment_id` | string | **Required** | `OCI_COMPARTMENT_ID` | The OCID of the OCI compartment containing your resources |
+| `oci_user` | string | – | `OCI_USER` | (Manual auth) The OCID of the OCI user |
+| `oci_fingerprint` | string | – | `OCI_FINGERPRINT` | (Manual auth) The fingerprint of the API signing key |
+| `oci_tenancy` | string | – | `OCI_TENANCY` | (Manual auth) The OCID of your OCI tenancy |
+| `oci_key` | string | – | `OCI_KEY` | (Manual auth) The private key content as a string |
+| `oci_key_file` | string | – | `OCI_KEY_FILE` | (Manual auth) Path to the private key file |
+| `oci_signer` | object | – | – | (SDK auth) OCI SDK Signer object for authentication |
+| `reasoning_effort` | string | – | – | Reasoning level for Generic-vendor reasoning models: `low`, `medium`, `high`, `disable` |

 ## Embeddings