Honcho Embed Reverse Proxy is a lightweight HTTP reverse proxy that intercepts OpenAI-compatible /v1/embeddings API requests. It automatically rewrites the model name and adds a dimensions parameter (default: 1536) before forwarding requests to the backend embedding server.
This proxy implements the workaround described in plastic-labs/honcho#404 for running fully local Honcho deployments with custom embedding models.
Honcho's embedding configuration has several hardcoded constraints:

- Only supports the `openai`, `gemini`, or `openrouter` providers for custom base URLs
- When using the `openrouter` provider, the model name is hardcoded to `openai/text-embedding-3-large`
- The database schema and code expect exactly 1536-dimensional embeddings
This reverse proxy sits between Honcho and your embedding server (e.g., vLLM), performing these transformations:

- **Model name rewriting**: Honcho requests `openai/text-embedding-3-large`; the proxy rewrites it to your actual model (e.g., `Qwen/Qwen3-Embedding-4B`)
- **Dimensions injection**: Adds `dimensions: 1536` to all requests (configurable; default 1536 for Honcho compatibility)
- **Response model fixing**: Rewrites the model name back in responses so Honcho sees the model it expects

This allows you to use any embedding model with Honcho without modifying vLLM's `--served-model-name` or Honcho's source code.
This proxy's primary purpose is to:

- Intercept `/v1/embeddings` requests from clients
- Rewrite the model name from the client-facing model name to the actual backend model name
- Add a `dimensions` parameter set to 1536 to all embedding requests
- Restore the original model name in the response before sending it back to the client
- Pass through all other requests unchanged to the backend
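The request-side rewrite described above can be sketched in a few lines of Go. This is an illustrative sketch, not the proxy's actual source: `rewriteEmbeddingRequest` is a hypothetical helper name, and the field names follow the OpenAI embeddings API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rewriteEmbeddingRequest swaps the client-facing model name for the backend
// model and injects the dimensions parameter. It returns the rewritten body
// plus the original model name, so the response can be restored later.
func rewriteEmbeddingRequest(body []byte, servedModel string, dims int) ([]byte, string, error) {
	var req map[string]any
	if err := json.Unmarshal(body, &req); err != nil {
		return nil, "", err
	}
	origModel, _ := req["model"].(string)
	req["model"] = servedModel
	req["dimensions"] = dims
	out, err := json.Marshal(req)
	return out, origModel, err
}

func main() {
	in := []byte(`{"model":"openai/text-embedding-3-large","input":"Hello, world!"}`)
	out, orig, err := rewriteEmbeddingRequest(in, "Qwen/Qwen3-Embedding-4B", 1536)
	if err != nil {
		panic(err)
	}
	fmt.Println(orig)        // original model name, kept for the response
	fmt.Println(string(out)) // rewritten body with dimensions injected
}
```

Keeping the original model name alongside the rewritten body is the key design point: it is the only state needed to make the response look untouched to Honcho.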
Requirements: Go 1.24.2 or later

```sh
go build -o honcho-embed-rp ../honcho-embed-rp

./honcho-embed-rp \
  -target "http://127.0.0.1:8000" \
  -served-model "Qwen/Qwen3-Embedding-4B"
```

Or using environment variables:

```sh
export HONCHOEMBEDRP_TARGET="http://127.0.0.1:8000"
export HONCHOEMBEDRP_SERVED_MODEL_NAME="Qwen/Qwen3-Embedding-4B"
./honcho-embed-rp
```

Configure the proxy using command-line flags or environment variables:
| Flag | Environment Variable | Default | Description |
|---|---|---|---|
| `-listen` | `HONCHOEMBEDRP_LISTEN` | `0.0.0.0` | IP address to listen on |
| `-port` | `HONCHOEMBEDRP_PORT` | `9000` | Port to listen on |
| `-target` | `HONCHOEMBEDRP_TARGET` | `http://127.0.0.1:8000` | Backend target URL |
| `-loglevel` | `HONCHOEMBEDRP_LOGLEVEL` | `INFO` | Log level (COMPLETE, DEBUG, INFO, WARN, ERROR) |
| `-served-model` | `HONCHOEMBEDRP_SERVED_MODEL_NAME` | (required) | Backend model name to use in outgoing requests |
| `-dimensions` | `HONCHOEMBEDRP_DIMENSIONS` | `1536` | Embedding dimensions (1536 for Honcho compatibility) |
- `POST /v1/embeddings`: Transformed (model name rewritten, `dimensions=1536` added)
- `GET /health`: Health check endpoint (returns `{"status":"healthy"}`)
- All other paths: Passed through unchanged to the backend
Client Request (Honcho sends this):

```
POST /v1/embeddings
{
  "model": "openai/text-embedding-3-large",
  "input": "Hello, world!"
}
```

Backend Request (after transformation):

```
POST /v1/embeddings
{
  "model": "Qwen/Qwen3-Embedding-4B",
  "input": "Hello, world!",
  "dimensions": 1536
}
```

Client Response (model name restored for Honcho):

```
{
  "object": "list",
  "data": [...],
  "model": "openai/text-embedding-3-large",
  "usage": {...}
}
```

vLLM embedding server (Docker Compose):
```yaml
services:
  vllm-embedding:
    image: vllm/vllm-openai:latest
    command:
      - Qwen/Qwen3-Embedding-4B
      - --port
      - "8000"
      - --gpu-memory-utilization
      - "0.5"
      - --hf-overrides
      - '{"is_matryoshka": true, "matryoshka_dimensions": [1536]}'
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

honcho-embed-rp (Docker Compose):
```yaml
services:
  honcho-embed-rp:
    image: honcho-embed-rp:latest
    environment:
      - HONCHOEMBEDRP_TARGET=http://vllm-embedding:8000
      - HONCHOEMBEDRP_SERVED_MODEL_NAME=Qwen/Qwen3-Embedding-4B
      - HONCHOEMBEDRP_DIMENSIONS=1536
    ports:
      - "9000:9000"
```

Honcho environment variables:
```sh
# Point OpenAI-compatible provider to the proxy
LLM_OPENAI_COMPATIBLE_BASE_URL=http://honcho-embed-rp:9000/v1
LLM_OPENAI_COMPATIBLE_API_KEY=sk-no-key-required

# Use openrouter provider (supports custom base URL)
# Honcho will request model: openai/text-embedding-3-large
LLM_EMBEDDING_PROVIDER=openrouter

# vLLM provider for LLM calls (separate endpoint)
DERIVER_PROVIDER=vllm
DERIVER_MODEL="your-llm-model-name"
LLM_VLLM_BASE_URL=http://your-vllm-llm:8000/v1
LLM_VLLM_API_KEY=your-api-key
```

With this setup:
- Honcho requests embeddings from `openai/text-embedding-3-large` at the proxy (hardcoded by the openrouter provider)
- The proxy rewrites the request to `Qwen/Qwen3-Embedding-4B` with `dimensions: 1536`
- vLLM serves the Qwen model with Matryoshka embeddings at 1536 dimensions
- The response model name is rewritten back to `openai/text-embedding-3-large` for Honcho compatibility
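The last step in that flow, restoring the model name, mirrors the request-side rewrite. An illustrative sketch with a hypothetical helper name (the real proxy may patch the response body differently):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// restoreResponseModel rewrites the backend's model name in an embeddings
// response back to the name the client originally sent, so Honcho sees
// exactly the model it asked for.
func restoreResponseModel(body []byte, origModel string) ([]byte, error) {
	var resp map[string]any
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	resp["model"] = origModel
	return json.Marshal(resp)
}

func main() {
	backend := []byte(`{"object":"list","model":"Qwen/Qwen3-Embedding-4B"}`)
	out, err := restoreResponseModel(backend, "openai/text-embedding-3-large")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"model":"openai/text-embedding-3-large","object":"list"}
}
```

Only the `model` field is touched; the embedding vectors and usage data pass through untouched.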
`GET /health`: Returns `{"status":"healthy"}` for Docker health checks
The proxy supports the following log levels:
| Level | Description |
|---|---|
| `COMPLETE` | Most verbose; includes full HTTP request/response dumps |
| `DEBUG` | Debug information including parameter application details |
| `INFO` | General operational information |
| `WARN` | Warning messages |
| `ERROR` | Error messages only |
When set to `COMPLETE`, the proxy logs full HTTP request and response bodies, which is useful for debugging but very verbose.

Because embedding requests carry the raw input text, the `COMPLETE` log level exposes that data in plaintext. Only enable it in secure, non-production environments, or ensure logs are properly secured and retained only temporarily.
MIT License - see LICENSE file for details.