The high-performance, headless MCP Gateway powered by Gemma 4 31B — bridging AI agents to the world's most powerful LLMs.
OmniLLM is a specialized Model Context Protocol (MCP) server designed for autonomous AI agents. It provides a unified, high-speed interface for Gemma 4, Claude 3.5 Sonnet, and GPT-4o.
Because OmniLLM is headless, it carries no UI-related overhead that would add latency. It operates exclusively as a tool provider, so system resources are dedicated to token throughput and intelligent routing.
OmniLLM is optimized for the Gemma 4 family. It currently targets gemma-4-31b-it as its primary engine for the Google Gemini provider, offering state-of-the-art reasoning and instruction-following capabilities directly via MCP tools.
OmniLLM does not require any third-party package managers (like Homebrew or Chocolatey) or container tools (like Docker). You can install everything natively on your system.
- Node.js (v18+):
- Go to the official Node.js website: nodejs.org
- Download the macOS Installer (.pkg) for the LTS (Long Term Support) version.
- Run the installer and follow the standard installation prompts.
- Git:
- Open your `Terminal` app.
- Type `git --version` and press Enter. If Git is not installed, your Mac will automatically prompt you to install the Xcode Command Line Tools (which include Git). Click "Install".
- Node.js (v18+):
- Go to the official Node.js website: nodejs.org
- Download the Windows Installer (.msi) for the LTS version.
- Run the installer. Ensure that the option to add Node.js to your system `PATH` is checked (it usually is by default).
- Git:
- Go to the official Git website: git-scm.com/download/win
- Download the Standalone Installer (64-bit).
- Run the installer and click "Next" through the standard default settings.
Once Git and Node.js are installed, open your terminal (Terminal on Mac, Command Prompt or PowerShell on Windows) and run the following commands exactly as shown:
# 1. Clone the repository directly to your machine
git clone https://github.com/ManiDeep1822/OmniLLM.git
# 2. Navigate into the project folder
cd OmniLLM
# 3. Install the project dependencies natively via npm
npm install

You need to provide your API keys to the server. We use a .env file to store these securely on your local machine.
On macOS:
cp .env.example .env

On Windows:
copy .env.example .env

Open the newly created .env file in any standard text editor (like Notepad on Windows or TextEdit on Mac) and add your API keys:
- `GEMINI_API_KEY=your_key_here` (Required for Gemma 4)
- `CLAUDE_API_KEY=your_key_here`
- `OPENAI_API_KEY=your_key_here`
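Before starting the server, it can help to confirm which keys are actually set. The following is a minimal sketch, not part of OmniLLM itself: the function name `configuredProviders` and the provider labels are assumptions made for illustration, while the variable names come from the .env entries above.

```typescript
// Variable names match the .env entries; the provider labels are illustrative.
const KEY_TO_PROVIDER = {
  GEMINI_API_KEY: "gemini",
  CLAUDE_API_KEY: "claude",
  OPENAI_API_KEY: "openai",
} as const;

type EnvVars = Record<string, string | undefined>;

// Return the providers whose key is present and not left as the placeholder.
function configuredProviders(env: EnvVars): string[] {
  const found: string[] = [];
  for (const [key, provider] of Object.entries(KEY_TO_PROVIDER)) {
    const value = (env[key] ?? "").trim();
    if (value !== "" && value !== "your_key_here") {
      found.push(provider);
    }
  }
  return found;
}
```

In a Node.js script you could pass `process.env` to this function after loading the .env file.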
OmniLLM uses a lightweight local SQLite database to persist context and logs. Initialize it and start the server:
# Generate the database client
npx prisma generate
# Create the initial database tables
npx prisma migrate dev --name init
# Start the server in development/watch mode
npm run dev

(Note: The server will listen for MCP JSON-RPC commands over standard input/output. It is completely normal if it looks like it is "hanging" in your terminal; it is waiting for an AI agent to communicate with it!)
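To make the stdio exchange concrete, here is a rough sketch of how a client might frame one JSON-RPC 2.0 request as a single newline-terminated line. The helper name `frameRequest` is hypothetical; `tools/list` is the standard MCP method for discovering tools. A real client performs an `initialize` handshake before sending other requests, which this sketch leaves out.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Serialize one request as a single line; messages on stdio are newline-delimited.
function frameRequest(
  id: number,
  method: string,
  params?: Record<string, unknown>
): string {
  const request: JsonRpcRequest = { jsonrpc: "2.0", id, method };
  if (params !== undefined) {
    request.params = params;
  }
  return JSON.stringify(request) + "\n";
}

// Ask the server which tools it exposes.
const line = frameRequest(1, "tools/list");
```

An agent host normally does this for you; the sketch only shows why the terminal stays silent until something writes to the server's standard input.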
If you are running the gateway in a production environment, you should compile the TypeScript code to plain JavaScript for maximum performance:
npm run build
npm start

To use OmniLLM in your favorite agentic environment (like Claude Desktop or Antigravity), you simply need to point it to the built server file.
Add the following to your MCP configuration file (mcp_config.json):
Mac Path: /Users/YOUR_USER_NAME/path/to/OmniLLM/dist/server.js
Windows Path: C:/Users/YOUR_USER_NAME/path/to/OmniLLM/dist/server.js
{
  "mcpServers": {
    "llm-gateway": {
      "command": "node",
      "args": ["/ABSOLUTE/PATH/TO/OmniLLM/dist/server.js"],
      "env": {
        "GEMINI_API_KEY": "YOUR_GEMINI_KEY",
        "CLAUDE_API_KEY": "YOUR_CLAUDE_KEY",
        "OPENAI_API_KEY": "YOUR_OPENAI_KEY",
        "DATABASE_URL": "file:./dev.db"
      }
    }
  }
}
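If you prefer to generate this entry rather than write it by hand, the sketch below builds the same structure and leaves out any provider key that is not supplied. The helper name `buildGatewayConfig` is an assumption for illustration; the field names and the `DATABASE_URL` value are copied from the configuration above.

```typescript
type KeyMap = Record<string, string | undefined>;

// Build the "llm-gateway" entry, including only the provider keys that are set.
function buildGatewayConfig(serverPath: string, keys: KeyMap) {
  const env: Record<string, string> = { DATABASE_URL: "file:./dev.db" };
  for (const [name, value] of Object.entries(keys)) {
    if (value) {
      env[name] = value; // providers without a key are skipped
    }
  }
  return {
    mcpServers: {
      "llm-gateway": { command: "node", args: [serverPath], env },
    },
  };
}

const config = buildGatewayConfig("/ABSOLUTE/PATH/TO/OmniLLM/dist/server.js", {
  GEMINI_API_KEY: "YOUR_GEMINI_KEY",
  CLAUDE_API_KEY: undefined,
});

// Pretty-printed JSON ready to paste into mcp_config.json.
const json = JSON.stringify(config, null, 2);
```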
> [!TIP]
> **API Keys**: You only need to include the API key(s) for the specific model(s) you plan to use! For example, if you are only using Gemma 4, you can safely remove the `CLAUDE_API_KEY` and `OPENAI_API_KEY` fields from your config.

Once connected, your AI agents will automatically have access to the following OmniLLM capabilities:
| Tool | Capability |
|---|---|
| `stream-generate` | Real-time streaming output from Gemma 4, Claude, or OpenAI directly to the user. |
| `auto-router` | Automatically routes tasks to the most efficient model based on task complexity. |
| `multi-step-chain` | Executes complex, sequential reasoning prompts where each output informs the next step. |
| `model-comparison` | Runs a single prompt across all configured providers simultaneously to compare answer quality. |
| `context-chain` | Persistent conversation memory system backed by the local SQLite database. |
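The idea behind a multi-step chain, where each output informs the next step, can be sketched in a few lines. This is a generic illustration of the pattern and not OmniLLM's implementation: `runChain`, the `{previous}` placeholder, and the `echoModel` stand-in are all hypothetical.

```typescript
type ModelCall = (prompt: string) => Promise<string>;

// Run prompt templates in order; "{previous}" is replaced by the prior step's output.
async function runChain(steps: string[], callModel: ModelCall): Promise<string> {
  let previous = "";
  for (const template of steps) {
    const prompt = template.split("{previous}").join(previous);
    previous = await callModel(prompt);
  }
  return previous;
}

// Stand-in model for demonstration: wraps the prompt in brackets.
const echoModel: ModelCall = async (prompt) => `[${prompt}]`;
```

With a real provider, `callModel` would wrap whichever model call you use; the chain logic stays the same.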
OmniLLM uses a dual-interface approach:
- Stdio (Primary): High-speed binary/text channel for MCP tool communication.
- HTTP/JSON (Secondary): Lightweight health and configuration API operating on port 4324.
Health Check endpoint: http://localhost:4324/api/health
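A script can poll this endpoint to confirm the gateway is up. The sketch below assumes only that a healthy server answers with a successful HTTP status; the response body format is not documented here, so it is not inspected. The names `isGatewayHealthy` and `FetchLike` are illustrative.

```typescript
interface HealthResponse {
  ok: boolean;
  status: number;
}

// Minimal shape of a fetch function, so a stub can be injected for testing.
type FetchLike = (url: string) => Promise<HealthResponse>;

const HEALTH_URL = "http://localhost:4324/api/health";

// Resolve to true when the endpoint answers with a 2xx status, false otherwise.
async function isGatewayHealthy(fetchFn: FetchLike): Promise<boolean> {
  try {
    const res = await fetchFn(HEALTH_URL);
    return res.ok;
  } catch {
    return false; // e.g. connection refused because the server is not running
  }
}
```

On Node.js 18 or later, you can pass the built-in `fetch` as `fetchFn`.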
Database Logs: As a headless gateway, OmniLLM logs all complex activity to the SQLite database. To view it visually, run:
npx prisma studio

This project is licensed under the MIT License.