On-premise OpenCode plugin with local inference support for air-gapped environments.
oh-my-onpremcode is a fork of oh-my-opencode designed for enterprise on-premise deployments. It enables AI-assisted development in air-gapped environments without internet access.
- Local Inference: Connect to vLLM instances running in your Kubernetes cluster
- Ollama Support: Use local models via Ollama as a fallback
- Air-Gapped Ready: All internet-dependent features auto-disable in on-prem mode
- Model Mapping: Map cloud model names to your local model endpoints
- Custom Branding: Display your company logo and welcome message
- K8s Service Discovery: Auto-discover vLLM services in your cluster
- Quick Start
- Configuration
- On-Prem Mode Behavior
- Agent Configuration
- Full Configuration Reference
- Deployment Guide
- Troubleshooting
- Migration from oh-my-opencode
Add the plugin to your OpenCode config (`~/.config/opencode/opencode.json`):

```json
{
  "plugin": ["oh-my-onpremcode"]
}
```

Create `~/.config/opencode/oh-my-onpremcode.json`:

```json
{
  "$schema": "https://raw.githubusercontent.com/sionic-ai/oh-my-onpremcode/master/assets/oh-my-onpremcode.schema.json",
  "onprem": {
    "mode": "onprem",
    "vllm": {
      "endpoints": [
        {
          "url": "http://vllm-service.default.svc.cluster.local:8000/v1",
          "model": "meta-llama/Llama-3.1-70B-Instruct"
        }
      ]
    },
    "model_mapping": {
      "anthropic/claude-opus-4-5": "meta-llama/Llama-3.1-70B-Instruct",
      "anthropic/claude-sonnet-4-5": "meta-llama/Llama-3.1-70B-Instruct",
      "openai/gpt-5.2": "meta-llama/Llama-3.1-70B-Instruct"
    }
  }
}
```

The plugin runs in one of two modes, controlled by `onprem.mode`:

| Mode | Description |
|---|---|
| `cloud` | Default mode. Uses cloud APIs (Anthropic, OpenAI, Google) |
| `onprem` | On-premise mode. Uses local inference, disables internet features |
Connect to vLLM instances serving an OpenAI-compatible API:
```json
{
  "onprem": {
    "mode": "onprem",
    "vllm": {
      "endpoints": [
        {
          "url": "http://vllm-service:8000/v1",
          "model": "meta-llama/Llama-3.1-70B-Instruct",
          "api_key": "optional-api-key",
          "max_tokens": 4096,
          "temperature": 0.7
        }
      ],
      "k8s_namespace": "ml-inference",
      "k8s_service_name": "vllm-service",
      "k8s_port": 8000,
      "auto_discover": true
    }
  }
}
```

| Option | Type | Description |
|---|---|---|
| `url` | string | Full URL to the vLLM OpenAI-compatible endpoint |
| `model` | string | Model name to use (or `"auto"` to use the requested model) |
| `api_key` | string | Optional API key for authentication |
| `max_tokens` | number | Maximum tokens for completion |
| `temperature` | number | Sampling temperature (0-2) |
When `auto_discover` is enabled, the plugin automatically constructs the vLLM endpoint URL using the K8s service DNS name:

```
http://{k8s_service_name}.{k8s_namespace}.svc.cluster.local:{k8s_port}/v1
```
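For example, with the `k8s_service_name`, `k8s_namespace`, and `k8s_port` values shown above, the discovered endpoint would be `http://vllm-service.ml-inference.svc.cluster.local:8000/v1`. A quick way to sanity-check it from inside the cluster is a throwaway curl pod (the pod name and image below are arbitrary examples):

```bash
# Launch a temporary pod and query the auto-discovered endpoint.
# "vllm-probe" and curlimages/curl are placeholders; any image with curl works.
kubectl run vllm-probe --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s http://vllm-service.ml-inference.svc.cluster.local:8000/v1/models
```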
Use Ollama as a secondary local inference option:
```json
{
  "onprem": {
    "mode": "onprem",
    "ollama": {
      "enabled": true,
      "host": "localhost",
      "port": 11434,
      "models": ["llama3.1:70b", "codellama:34b"]
    }
  }
}
```

| Option | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Enable Ollama integration |
| `host` | string | `"localhost"` | Ollama server hostname |
| `port` | number | `11434` | Ollama server port |
| `models` | string[] | `[]` | List of available models |
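Ollama can only serve models that are already present on the host, so each entry in `models` should be pulled (or imported) before the environment is cut off from the internet, for example:

```bash
# Pull the models referenced in the config while registry access is still
# available, then confirm they are installed locally.
ollama pull llama3.1:70b
ollama pull codellama:34b
ollama list
```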
Map cloud model identifiers to your local models:
```json
{
  "onprem": {
    "model_mapping": {
      "anthropic/claude-opus-4-5": "meta-llama/Llama-3.1-70B-Instruct",
      "anthropic/claude-sonnet-4-5": "meta-llama/Llama-3.1-8B-Instruct",
      "openai/gpt-5.2": "Qwen/Qwen2.5-72B-Instruct",
      "google/gemini-3-pro-high": "meta-llama/Llama-3.1-70B-Instruct"
    }
  }
}
```

This allows existing agent configurations to work seamlessly with your local models.
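As a rough illustration (the exact request the plugin builds may differ), a completion requested for `anthropic/claude-opus-4-5` would reach the vLLM endpoint from the earlier example carrying the mapped local model name:

```bash
# Approximate shape of the remapped request that lands on vLLM.
curl http://vllm-service:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-70B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```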
Display your company branding in the terminal:
```json
{
  "onprem": {
    "branding": {
      "company_name": "Sionic AI",
      "welcome_message": "Welcome to Sionic AI Development Environment",
      "primary_color": "#0066CC",
      "logo_ascii": " _____ _ _ \n / ____(_) (_) \n| (___ _ ___ _ __ _ ___ \n \\___ \\| |/ _ \\| '_ \\| |/ __| \n ____) | | (_) | | | | | (__ \n|_____/|_|\\___/|_| |_|_|\\___| "
    }
  }
}
```

| Option | Description |
|---|---|
| `company_name` | Displayed in startup toast |
| `welcome_message` | Custom welcome message |
| `primary_color` | Hex color for UI elements |
| `logo_ascii` | ASCII art logo for terminal |
| `logo_path` | Path to logo file |
When mode is set to "onprem", the following features are automatically disabled:
| Feature | Status | Reason |
|---|---|---|
| context7 MCP | Disabled | Requires internet |
| websearch_exa MCP | Disabled | Requires internet |
| grep_app MCP | Disabled | Requires internet |
| Google OAuth | Disabled | Requires internet |
| Auto-update checker | Limited | Requires npm registry |
| Claude Code MCP loader | Disabled | External MCPs may require internet |
All local features continue to work normally:
- LSP tools (hover, goto definition, references, rename, etc.)
- AST-grep search and replace
- File operations (grep, glob)
- Background agents
- Todo management
- Session recovery
Override agent models to use your local endpoints:
```json
{
  "agents": {
    "Sisyphus": {
      "model": "vllm/meta-llama/Llama-3.1-70B-Instruct"
    },
    "oracle": {
      "model": "vllm/Qwen/Qwen2.5-72B-Instruct"
    },
    "librarian": {
      "model": "vllm/meta-llama/Llama-3.1-70B-Instruct"
    },
    "explore": {
      "model": "ollama/llama3.1:8b"
    },
    "frontend-ui-ux-engineer": {
      "model": "vllm/meta-llama/Llama-3.1-70B-Instruct"
    },
    "document-writer": {
      "model": "vllm/meta-llama/Llama-3.1-8B-Instruct"
    },
    "multimodal-looker": {
      "disable": true
    }
  }
}
```

| Agent | Purpose | Recommended Model Size |
|---|---|---|
| Sisyphus | Primary orchestrator | 70B+ |
| oracle | Architecture & debugging | 70B+ |
| librarian | Documentation & research | 70B+ |
| explore | Fast codebase exploration | 8B-34B |
| frontend-ui-ux-engineer | UI generation | 70B+ |
| document-writer | Technical writing | 8B-34B |
| multimodal-looker | Image/PDF analysis | Disable if no multimodal model |
Full configuration reference:

```json
{
  "$schema": "https://raw.githubusercontent.com/sionic-ai/oh-my-onpremcode/master/assets/oh-my-onpremcode.schema.json",
  "onprem": {
    "mode": "onprem",
    "vllm": {
      "endpoints": [
        {
          "url": "http://vllm-primary:8000/v1",
          "model": "meta-llama/Llama-3.1-70B-Instruct"
        },
        {
          "url": "http://vllm-secondary:8000/v1",
          "model": "Qwen/Qwen2.5-72B-Instruct"
        }
      ],
      "k8s_namespace": "ml-inference",
      "k8s_service_name": "vllm-gateway",
      "k8s_port": 8000,
      "auto_discover": true,
      "default_endpoint": "http://vllm-primary:8000/v1"
    },
    "ollama": {
      "enabled": true,
      "host": "ollama-service",
      "port": 11434,
      "models": ["llama3.1:70b", "codellama:34b", "qwen2.5:32b"]
    },
    "model_mapping": {
      "anthropic/claude-opus-4-5": "meta-llama/Llama-3.1-70B-Instruct",
      "anthropic/claude-sonnet-4-5": "meta-llama/Llama-3.1-8B-Instruct",
      "openai/gpt-5.2": "Qwen/Qwen2.5-72B-Instruct",
      "google/gemini-3-pro-high": "meta-llama/Llama-3.1-70B-Instruct",
      "google/gemini-3-flash": "meta-llama/Llama-3.1-8B-Instruct"
    },
    "branding": {
      "company_name": "Your Company",
      "welcome_message": "AI-Powered Development Environment",
      "primary_color": "#0066CC"
    },
    "disable_internet_features": true
  },
  "agents": {
    "Sisyphus": { "model": "vllm/meta-llama/Llama-3.1-70B-Instruct" },
    "oracle": { "model": "vllm/Qwen/Qwen2.5-72B-Instruct" },
    "librarian": { "model": "vllm/meta-llama/Llama-3.1-70B-Instruct" },
    "explore": { "model": "ollama/llama3.1:8b" },
    "frontend-ui-ux-engineer": { "model": "vllm/meta-llama/Llama-3.1-70B-Instruct" },
    "document-writer": { "model": "vllm/meta-llama/Llama-3.1-8B-Instruct" },
    "multimodal-looker": { "disable": true }
  },
  "disabled_hooks": [],
  "disabled_agents": [],
  "sisyphus_agent": { "disabled": false }
}
```

Prerequisites:

- OpenCode >= 1.0.150 installed
- vLLM or Ollama running in your environment
- Network access from OpenCode to inference endpoints
Example vLLM deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-llama
  namespace: ml-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-llama
  template:
    metadata:
      labels:
        app: vllm-llama
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args:
            - --model=meta-llama/Llama-3.1-70B-Instruct
            - --tensor-parallel-size=4
            - --max-model-len=32768
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 4
          volumeMounts:
            - name: model-cache
              mountPath: /root/.cache/huggingface
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: model-cache-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-service
  namespace: ml-inference
spec:
  selector:
    app: vllm-llama
  ports:
    - port: 8000
      targetPort: 8000
```

Example Docker Compose setup:

```yaml
version: '3.8'
services:
  vllm:
    image: vllm/vllm-openai:latest
    command: --model meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192
    ports:
      - "8000:8000"
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama
volumes:
  ollama-data:
```

To troubleshoot connection issues, test the inference endpoints directly:

```bash
# Test vLLM endpoint
curl http://vllm-service:8000/v1/models
# Test chat completion
curl http://vllm-service:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3.1-70B-Instruct",
"messages": [{"role": "user", "content": "Hello"}]
}'
# Check K8s service
kubectl get svc -n ml-inference
kubectl describe svc vllm-service -n ml-inference
# Check pod logs
kubectl logs -n ml-inference deployment/vllm-llama
```

```bash
# Test Ollama
curl http://localhost:11434/api/tags
# List available models
ollama list
# Pull a model
ollama pull llama3.1:70b
# Test chat
curl http://localhost:11434/api/chat -d '{
"model": "llama3.1:70b",
"messages": [{"role": "user", "content": "Hello"}],
"stream": false
}'
```

```bash
# Enable debug mode
export COMMENT_CHECKER_DEBUG=1
# Check logs
tail -f /tmp/oh-my-onpremcode.log
```

| Issue | Solution |
|---|---|
| "No vLLM endpoint configured" | Add at least one endpoint to vllm.endpoints |
| Connection timeout | Check network access to inference service |
| Model not found | Verify model name matches vLLM/Ollama model |
| Out of memory | Reduce max_tokens or use smaller model |
If migrating from the original oh-my-opencode:
- Rename the config file:

  ```bash
  mv ~/.config/opencode/oh-my-opencode.json ~/.config/opencode/oh-my-onpremcode.json
  ```

- Update the plugin reference in `opencode.json`:

  ```json
  { "plugin": ["oh-my-onpremcode"] }
  ```

- Add the onprem configuration:

  ```json
  { "onprem": { "mode": "onprem", "vllm": { ... } } }
  ```

- Update agent model references if using cloud models.
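If you want the rename step scripted with a backup copy, a minimal sketch using the default config paths above is:

```bash
# Keep a backup of the old config, then move it to the new plugin's filename.
CONFIG_DIR=~/.config/opencode
cp "$CONFIG_DIR/oh-my-opencode.json" "$CONFIG_DIR/oh-my-opencode.json.bak"
mv "$CONFIG_DIR/oh-my-opencode.json" "$CONFIG_DIR/oh-my-onpremcode.json"
```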
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
MIT License - see LICENSE for details.
Based on oh-my-opencode by @code-yeongyu.
Maintained by Sionic AI
