Oh My OnPremCode

Steroids for your onprem. Ship faster.

On-premise OpenCode plugin with local inference support for air-gapped environments.



Overview

oh-my-onpremcode is a fork of oh-my-opencode designed for enterprise on-premise deployments. It enables AI-assisted development in air-gapped environments without internet access.

Key Features

  • Local Inference: Connect to vLLM instances running in your Kubernetes cluster
  • Ollama Support: Use local models via Ollama as a fallback
  • Air-Gapped Ready: All internet-dependent features auto-disable in on-prem mode
  • Model Mapping: Map cloud model names to your local model endpoints
  • Custom Branding: Display your company logo and welcome message
  • K8s Service Discovery: Auto-discover vLLM services in your cluster

Quick Start

Installation

Add the plugin to your OpenCode config (~/.config/opencode/opencode.json):

{
  "plugin": ["oh-my-onpremcode"]
}

Basic On-Prem Configuration

Create ~/.config/opencode/oh-my-onpremcode.json:

{
  "$schema": "https://raw.githubusercontent.com/sionic-ai/oh-my-onpremcode/master/assets/oh-my-onpremcode.schema.json",
  "onprem": {
    "mode": "onprem",
    "vllm": {
      "endpoints": [
        {
          "url": "http://vllm-service.default.svc.cluster.local:8000/v1",
          "model": "meta-llama/Llama-3.1-70B-Instruct"
        }
      ]
    },
    "model_mapping": {
      "anthropic/claude-opus-4-5": "meta-llama/Llama-3.1-70B-Instruct",
      "anthropic/claude-sonnet-4-5": "meta-llama/Llama-3.1-70B-Instruct",
      "openai/gpt-5.2": "meta-llama/Llama-3.1-70B-Instruct"
    }
  }
}

Configuration

Deployment Modes

Mode Description
cloud Default mode. Uses cloud APIs (Anthropic, OpenAI, Google)
onprem On-premise mode. Uses local inference, disables internet features

vLLM Configuration

Connect to vLLM instances serving an OpenAI-compatible API:

{
  "onprem": {
    "mode": "onprem",
    "vllm": {
      "endpoints": [
        {
          "url": "http://vllm-service:8000/v1",
          "model": "meta-llama/Llama-3.1-70B-Instruct",
          "api_key": "optional-api-key",
          "max_tokens": 4096,
          "temperature": 0.7
        }
      ],
      "k8s_namespace": "ml-inference",
      "k8s_service_name": "vllm-service",
      "k8s_port": 8000,
      "auto_discover": true
    }
  }
}

vLLM Endpoint Options

Option Type Description
url string Full URL to vLLM OpenAI-compatible endpoint
model string Model name to use (or "auto" to pass through the model name from the request)
api_key string Optional API key for authentication
max_tokens number Maximum tokens for completion
temperature number Sampling temperature (0-2)

Kubernetes Auto-Discovery

When auto_discover is enabled, the plugin automatically constructs the vLLM endpoint URL using K8s service DNS:

http://{k8s_service_name}.{k8s_namespace}.svc.cluster.local:{k8s_port}/v1
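
With the sample values from the configuration above (service vllm-service, namespace ml-inference, port 8000), the discovered endpoint would be http://vllm-service.ml-inference.svc.cluster.local:8000/v1. A quick sanity check from a pod inside the cluster:

# Resolves through cluster DNS, so run this from inside the cluster
curl http://vllm-service.ml-inference.svc.cluster.local:8000/v1/models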

Ollama Configuration

Use Ollama as a secondary local inference option:

{
  "onprem": {
    "mode": "onprem",
    "ollama": {
      "enabled": true,
      "host": "localhost",
      "port": 11434,
      "models": ["llama3.1:70b", "codellama:34b"]
    }
  }
}
Option Type Default Description
enabled boolean false Enable Ollama integration
host string "localhost" Ollama server hostname
port number 11434 Ollama server port
models string[] [] List of available models
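
The configuration assumes the listed models are already available on the Ollama server. A quick way to pull and verify them, using the model names from the example above:

# Pull the configured models and confirm they show up
ollama pull llama3.1:70b
ollama pull codellama:34b
ollama list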

Model Mapping

Map cloud model identifiers to your local models:

{
  "onprem": {
    "model_mapping": {
      "anthropic/claude-opus-4-5": "meta-llama/Llama-3.1-70B-Instruct",
      "anthropic/claude-sonnet-4-5": "meta-llama/Llama-3.1-8B-Instruct",
      "openai/gpt-5.2": "Qwen/Qwen2.5-72B-Instruct",
      "google/gemini-3-pro-high": "meta-llama/Llama-3.1-70B-Instruct"
    }
  }
}

This allows existing agent configurations to work seamlessly with your local models.

Custom Branding

Display your company branding in the terminal:

{
  "onprem": {
    "branding": {
      "company_name": "Sionic AI",
      "welcome_message": "Welcome to Sionic AI Development Environment",
      "primary_color": "#0066CC",
      "logo_ascii": "  _____ _             _        \n / ____(_)           (_)       \n| (___  _  ___  _ __  _  ___   \n \\___ \\| |/ _ \\| '_ \\| |/ __|  \n ____) | | (_) | | | | | (__   \n|_____/|_|\\___/|_| |_|_|\\___|  "
    }
  }
}
Option Description
company_name Displayed in startup toast
welcome_message Custom welcome message
primary_color Hex color for UI elements
logo_ascii ASCII art logo for terminal
logo_path Path to logo file

On-Prem Mode Behavior

When mode is set to "onprem", the following features are automatically disabled:

Feature Status Reason
context7 MCP Disabled Requires internet
websearch_exa MCP Disabled Requires internet
grep_app MCP Disabled Requires internet
Google OAuth Disabled Requires internet
Auto-update checker Limited Requires npm registry
Claude Code MCP loader Disabled External MCPs may require internet

All local features continue to work normally:

  • LSP tools (hover, goto definition, references, rename, etc.)
  • AST-grep search and replace
  • File operations (grep, glob)
  • Background agents
  • Todo management
  • Session recovery

Agent Configuration

Override agent models to use your local endpoints:

{
  "agents": {
    "Sisyphus": {
      "model": "vllm/meta-llama/Llama-3.1-70B-Instruct"
    },
    "oracle": {
      "model": "vllm/Qwen/Qwen2.5-72B-Instruct"
    },
    "librarian": {
      "model": "vllm/meta-llama/Llama-3.1-70B-Instruct"
    },
    "explore": {
      "model": "ollama/llama3.1:8b"
    },
    "frontend-ui-ux-engineer": {
      "model": "vllm/meta-llama/Llama-3.1-70B-Instruct"
    },
    "document-writer": {
      "model": "vllm/meta-llama/Llama-3.1-8B-Instruct"
    },
    "multimodal-looker": {
      "disable": true
    }
  }
}

Available Agents

Agent Purpose Recommended Model Size
Sisyphus Primary orchestrator 70B+
oracle Architecture & debugging 70B+
librarian Documentation & research 70B+
explore Fast codebase exploration 8B-34B
frontend-ui-ux-engineer UI generation 70B+
document-writer Technical writing 8B-34B
multimodal-looker Image/PDF analysis Disable if no multimodal model

Full Configuration Reference

{
  "$schema": "https://raw.githubusercontent.com/sionic-ai/oh-my-onpremcode/master/assets/oh-my-onpremcode.schema.json",
  
  "onprem": {
    "mode": "onprem",
    
    "vllm": {
      "endpoints": [
        {
          "url": "http://vllm-primary:8000/v1",
          "model": "meta-llama/Llama-3.1-70B-Instruct"
        },
        {
          "url": "http://vllm-secondary:8000/v1",
          "model": "Qwen/Qwen2.5-72B-Instruct"
        }
      ],
      "k8s_namespace": "ml-inference",
      "k8s_service_name": "vllm-gateway",
      "k8s_port": 8000,
      "auto_discover": true,
      "default_endpoint": "http://vllm-primary:8000/v1"
    },
    
    "ollama": {
      "enabled": true,
      "host": "ollama-service",
      "port": 11434,
      "models": ["llama3.1:70b", "codellama:34b", "qwen2.5:32b"]
    },
    
    "model_mapping": {
      "anthropic/claude-opus-4-5": "meta-llama/Llama-3.1-70B-Instruct",
      "anthropic/claude-sonnet-4-5": "meta-llama/Llama-3.1-8B-Instruct",
      "openai/gpt-5.2": "Qwen/Qwen2.5-72B-Instruct",
      "google/gemini-3-pro-high": "meta-llama/Llama-3.1-70B-Instruct",
      "google/gemini-3-flash": "meta-llama/Llama-3.1-8B-Instruct"
    },
    
    "branding": {
      "company_name": "Your Company",
      "welcome_message": "AI-Powered Development Environment",
      "primary_color": "#0066CC"
    },
    
    "disable_internet_features": true
  },
  
  "agents": {
    "Sisyphus": { "model": "vllm/meta-llama/Llama-3.1-70B-Instruct" },
    "oracle": { "model": "vllm/Qwen/Qwen2.5-72B-Instruct" },
    "librarian": { "model": "vllm/meta-llama/Llama-3.1-70B-Instruct" },
    "explore": { "model": "ollama/llama3.1:8b" },
    "frontend-ui-ux-engineer": { "model": "vllm/meta-llama/Llama-3.1-70B-Instruct" },
    "document-writer": { "model": "vllm/meta-llama/Llama-3.1-8B-Instruct" },
    "multimodal-looker": { "disable": true }
  },
  
  "disabled_hooks": [],
  "disabled_agents": [],
  "sisyphus_agent": { "disabled": false }
}

Deployment Guide

Prerequisites

  1. OpenCode >= 1.0.150 installed
  2. vLLM or Ollama running in your environment
  3. Network access from OpenCode to the inference endpoints (see the quick check below)

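A quick pre-flight check from the machine that will run OpenCode (hostnames are taken from the examples in this README and will differ in your environment):

# Confirm the inference endpoints are reachable
curl -s http://vllm-service:8000/v1/models   # vLLM, OpenAI-compatible API
curl -s http://localhost:11434/api/tags      # Ollama, if used as a fallback
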
Kubernetes Deployment

Example vLLM deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-llama
  namespace: ml-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-llama
  template:
    metadata:
      labels:
        app: vllm-llama
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        args:
        - --model=meta-llama/Llama-3.1-70B-Instruct
        - --tensor-parallel-size=4
        - --max-model-len=32768
        ports:
        - containerPort: 8000
        resources:
          limits:
            nvidia.com/gpu: 4
        volumeMounts:
        - name: model-cache
          mountPath: /root/.cache/huggingface
      volumes:
      - name: model-cache
        persistentVolumeClaim:
          claimName: model-cache-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-service
  namespace: ml-inference
spec:
  selector:
    app: vllm-llama
  ports:
  - port: 8000
    targetPort: 8000
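
Assuming the manifests above are saved as vllm-deployment.yaml (the filename is arbitrary), apply and verify them with:

# The ml-inference namespace must exist before applying
kubectl apply -f vllm-deployment.yaml
kubectl -n ml-inference rollout status deployment/vllm-llama
kubectl -n ml-inference get svc vllm-service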

Docker Compose (Development)

version: '3.8'
services:
  vllm:
    image: vllm/vllm-openai:latest
    command: --model meta-llama/Llama-3.1-8B-Instruct --max-model-len 8192
    ports:
      - "8000:8000"
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama-data:/root/.ollama

volumes:
  ollama-data:
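
To bring the stack up for development and load a model into Ollama (model names here are examples, not requirements):

# Start vLLM and Ollama in the background
docker compose up -d

# Pull a model into the ollama-data volume
docker compose exec ollama ollama pull llama3.1:8b

# Confirm the vLLM API is responding
curl http://localhost:8000/v1/models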

Troubleshooting

vLLM Connection Issues

# Test vLLM endpoint
curl http://vllm-service:8000/v1/models

# Test chat completion
curl http://vllm-service:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-70B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

# Check K8s service
kubectl get svc -n ml-inference
kubectl describe svc vllm-service -n ml-inference

# Check pod logs
kubectl logs -n ml-inference deployment/vllm-llama

Ollama Connection Issues

# Test Ollama
curl http://localhost:11434/api/tags

# List available models
ollama list

# Pull a model
ollama pull llama3.1:70b

# Test chat
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1:70b",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}'

Enable Debug Logging

# Enable debug mode
export COMMENT_CHECKER_DEBUG=1

# Check logs
tail -f /tmp/oh-my-onpremcode.log

Common Issues

Issue Solution
"No vLLM endpoint configured" Add at least one endpoint to vllm.endpoints
Connection timeout Check network access to inference service
Model not found Verify model name matches vLLM/Ollama model
Out of memory Reduce max_tokens or use smaller model

Migration from oh-my-opencode

If migrating from the original oh-my-opencode:

  1. Rename config file:

    mv ~/.config/opencode/oh-my-opencode.json ~/.config/opencode/oh-my-onpremcode.json
  2. Update plugin reference in opencode.json:

    {
      "plugin": ["oh-my-onpremcode"]
    }
  3. Add onprem configuration:

    {
      "onprem": {
        "mode": "onprem",
        "vllm": { ... }
      }
    }
  4. Update any agent model references that still point to cloud models


Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

Acknowledgments

Based on oh-my-opencode by @code-yeongyu.


Maintained by Sionic AI
