CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Commands

# Install required tools (one-time)
go install gotest.tools/gotestsum@latest
go install github.com/swaggo/swag/cmd/swag@latest
# Atlas CLI: https://atlasgo.io/docs

# Run locally (auto-applies migrations on startup)
go run ./cmd/api

# Run tests
make test                          # pretty output via gotestsum
make test-verbose
go test ./internal/modules/ai/...  # single package
go test -run TestName ./internal/modules/router/application/...  # single test

# Build Lambda binaries
make build-ApiFunction
make build-WorkerFunction

# Regenerate Swagger docs (after changing handlers or DTOs)
make swagger

# Database migrations (Atlas)
make migrate-apply                 # apply pending migrations locally
make migrate-diff name=add_col     # generate new migration from GORM models
make migrate-status                # show applied vs pending
make migrate-hash                  # rehash after manual edits

# Production migrations (requires DATABASE_URL + DATABASE_DEV_URL env vars)
make migrate-apply-prod
make migrate-diff-prod name=add_col
make migrate-status-prod

# Seed dev database
make seed-inference-logs           # insert 200 synthetic inference_log rows over 30 days (idempotent)

Default local port is 8090 (not 8080 — see config.go).

API at http://localhost:8090 · Swagger UI at http://localhost:8090/swagger/index.html · Health check at GET /healthz.

Config loads .env then .env.local (override) at startup via godotenv; both files are silently ignored in production.

Swagger annotations

Swagger docs are generated by swag from godoc comments directly above each handler function. The global API metadata lives in cmd/api/main.go. After adding or changing any handler or DTO, run make swagger to regenerate docs/.

Global metadata (cmd/api/main.go)

// @title AI Proxy API
// @version 1.0
// @description ...
// @BasePath /
// @securityDefinitions.apikey BearerAuth
// @in header
// @name Authorization

BearerAuth is the only security scheme. It covers both session tokens (JWT) and API keys — the middleware decides which it accepts, not the swagger definition.

Per-handler annotation shape

Every handler follows this exact structure:

// FunctionName godoc
// @Summary     One-line summary (shown in the Swagger UI list)
// @Description Longer description — omit if the summary is already clear
// @Tags        hyperstrate
// @Tags        <feature-tag>     ← one of: auth | catalog | models | routers | jobs | conversations | observability
// @Accept      json              ← only on POST/PUT/PATCH that read a body
// @Produce     json              ← always present unless the response is 204
// @Param       <name>  <in>  <type>  <required>  "<description>"
// @Success     <code>  {object|array}  <ResponseType>
// @Failure     <code>  {object}        ErrorResponse
// @Security    BearerAuth             ← omit only on public endpoints (no auth required)
// @Router      /path/{param} [method]

Always use two @Tags lines: hyperstrate (groups all routes together in the UI) and the feature-specific tag.

@Param syntax

| Location | Example |
|---|---|
| Path | // @Param id path string true "Router ID" |
| Query | // @Param page query int false "Page number (default 1)" |
| Query | // @Param perPage query int false "Items per page (default 30, max 500)" |
| Body | // @Param body body application.CreateFooInput true "Foo input" |

@Success / @Failure response types

| Pattern | When to use |
|---|---|
| {object} application.FooResponse | single JSON object |
| {array} domain.FooEntity | bare JSON array (not paginated) |
| {object} pagination.Paginated[application.FooResponse] | paginated list; use {object}, not {array} |
| // @Success 204 | no body (DELETE) |
| {object} ErrorResponse | all @Failure lines |

Auth security by route group

Each module's module.go registers routes in groups with different middleware. Use @Security BearerAuth accordingly:

| Group registered in | Middleware | @Security |
|---|---|---|
| RegisterPublicRoutes | none | omit @Security |
| RegisterSessionRoutes | RequireSession (valid JWT, any role) | // @Security BearerAuth |
| RegisterAdminRoutes | RequireAdmin (valid JWT + admin role) | // @Security BearerAuth |
| RegisterInferRoutes | InferAuth (API key or session JWT) | // @Security BearerAuth |

Complete examples

Public endpoint (no auth):

// GetSetupStatus godoc
// @Summary     Check if initial setup is required
// @Tags        hyperstrate
// @Tags        auth
// @Produce     json
// @Success     200  {object}  application.SetupStatusResponse
// @Router      /auth/setup/status [get]

Admin endpoint with body + path param:

// UpdateOrganization godoc
// @Summary     Update an organisation
// @Tags        hyperstrate
// @Tags        auth
// @Accept      json
// @Produce     json
// @Param       orgId  path  string                              true  "Org ID"
// @Param       body   body  application.UpdateOrganizationInput true  "Fields to update"
// @Success     200    {object}  application.OrganizationResponse
// @Failure     400    {object}  ErrorResponse
// @Failure     404    {object}  ErrorResponse
// @Security    BearerAuth
// @Router      /auth/organizations/{orgId} [patch]

Paginated list with optional query filter:

// ListAPIKeys godoc
// @Summary     List API keys
// @Tags        hyperstrate
// @Tags        auth
// @Produce     json
// @Param       routerId  query  string  false  "Filter by router ID"
// @Param       page      query  int     false  "Page number (default 1)"
// @Param       perPage   query  int     false  "Items per page (default 30, max 500)"
// @Success     200  {object}  pagination.Paginated[application.APIKeyResponse]
// @Security    BearerAuth
// @Router      /auth/api-keys [get]

Delete (204 no body):

// DeleteOrganization godoc
// @Summary     Delete an organisation
// @Tags        hyperstrate
// @Tags        auth
// @Param       orgId  path  string  true  "Org ID"
// @Success     204
// @Failure     404  {object}  ErrorResponse
// @Security    BearerAuth
// @Router      /auth/organizations/{orgId} [delete]

Architecture

Go 1.25, Gin, Fx (dependency injection), GORM, Atlas migrations, AWS Lambda + SQS.

Module structure

Every feature module lives under internal/modules/<name>/ and follows the same layout:

domain/             entities, repository interfaces, domain errors
application/        use-case service, DTOs, event types
infrastructure/
  persistence/      GORM repositories implementing domain interfaces
  proxy/ or vault/  external integrations
interfaces/http/    Gin handlers
module.go           Fx wiring: provides dependencies, invokes route registration

internal/app/app.go composes all modules into three Fx app factories:

  • NewHTTPApp() — local dev server
  • NewLambdaApp() — API Gateway Lambda handler
  • NewWorkerApp() — SQS worker Lambda (minimal: no HTTP, no router, no auth)

Modules

ai — model catalog, registrations, inference, async jobs, conversations, MCP servers.

  • domain/catalog.go: static map of all supported models baked into the binary. Adding a model = adding an entry here; no DB change needed.
  • A registration (domain/model.go) links a catalog key to a DB row and API key config.
  • Inference dispatches via application.JobDispatcher: goroutine locally, SQS when SQS_QUEUE_URL is set (selected at Fx startup in module.go:newJobDispatcher).
  • Conversations (domain/conversation.go) — multi-turn chat sessions stored in the DB; CRUD at /ai/conversations, messages at /ai/conversations/:id/messages.
  • MCP servers (domain/mcp_server.go) — registered external tool servers used by the router mcp_tools feature; CRUD at /ai/mcp/servers.
  • Routes split into two groups: /ai admin (session token + admin role) and /ai infer (API key or session).

router — named routers that proxy inference through a configurable pipeline.

  • A router has targets (model registrations), features (pipeline stages), interceptors, and a 1:1 RouterConfiguration row.
  • RouterConfiguration fields: WebhookURL (event notifications), PromptID (linked prompt template), StorePayloads (persist raw request+response to inference_payloads).
  • application/pipeline.go runs the full request pipeline: rate limit → budget → cache → field transforms → interceptors → target selection → inference → budget accounting → cache store. Additional stages (hedging, quality gate, MCP tool dispatch, semantic memory) activate when the corresponding features are enabled.
  • Routing strategies: round_robin, weighted, percentage, failover, random, latency_based.
  • Feature types: response_cache, semantic_cache, retry, fallback, token_optimization, context_trimming, rate_limit, budget, mcp_tools, health_check, prompt_caching, structured_output, request_coalescing, hedging, quality_gate, context_compression, semantic_memory, cost_aware_routing, response_prefetch, response_fingerprinting.
  • Interceptor types: semantic_classifier, content_filter, pii_detector, prompt_guard, ab_test, prompt_shield, team_budget.
  • Semantic features (embedding-based cache, classifier, semantic memory) degrade gracefully when no EmbeddingProvider is wired; currently a noopEmbedder is provided.
  • Routes split into three groups: RegisterCRUDRoutes (admin session required — CRUD for routers, targets, features, interceptors, evaluations), RegisterInferRoutes (InferAuth — /router/:id/infer, /router/:id/v1/chat/completions, /router/:id/v1/messages), and RegisterProxyRoutes (InferAuth — /proxy/router/:id/*path, usable as an OpenAI or Anthropic SDK baseURL).
  • application/service_eval.go manages evaluation sets: named collections of test cases with exact, contains, or llm scoring, run as scored RouterEvaluationRun records.

prompts — named, versioned system-prompt templates.

  • A Prompt has {{variable}} placeholders extracted on save and stored in a variables JSON column.
  • Every save creates an immutable PromptVersion snapshot; full version history with restore is supported.
  • Prompts can be attached to a router via router.PromptID; the pipeline loads and interpolates the prompt before inference.
  • All routes require admin session (RequireAdmin middleware).
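To make the `{{variable}}` behaviour concrete, here is a minimal regex-based sketch of extraction and interpolation. The pattern, the tolerance for spaces inside the braces, the sorted result and the rule of leaving unknown placeholders untouched are all assumptions; this is not the shared template package's actual code.

```go
package main

import (
	"regexp"
	"sort"
)

// placeholder matches {{name}}, allowing spaces inside the braces.
var placeholder = regexp.MustCompile(`\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}`)

// ExtractVariables returns the unique placeholder names, sorted.
func ExtractVariables(text string) []string {
	seen := map[string]bool{}
	var names []string
	for _, m := range placeholder.FindAllStringSubmatch(text, -1) {
		if !seen[m[1]] {
			seen[m[1]] = true
			names = append(names, m[1])
		}
	}
	sort.Strings(names)
	return names
}

// Interpolate substitutes known variables and leaves unknown ones as-is.
func Interpolate(text string, vars map[string]string) string {
	return placeholder.ReplaceAllStringFunc(text, func(match string) string {
		name := placeholder.FindStringSubmatch(match)[1]
		if v, ok := vars[name]; ok {
			return v
		}
		return match
	})
}
```

Storing the extracted names at save time (the variables column) lets callers see which inputs a prompt expects without re-parsing it on every request.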

auth — organizations, users, teams, API keys, virtual keys, OIDC login.

  • Two middleware types used across all modules: RequireAdmin(sessionValidator) and InferAuth(keyValidator) (accepts API key or session).
  • JWT secret comes from JWT_SECRET env var; there's a hardcoded insecure fallback for dev.
  • vault.Provider is a NoopProvider by default; swap it to store API keys in a real vault.
  • VirtualKey supports optional spending/request budgets with ResetPeriod (daily, weekly, monthly) and per-key RateLimitRPS (token-bucket, in-process, not persisted).
  • Team tracks aggregate spending (UsedRequests, UsedCostUSD) with optional MaxRequests/MaxCostUSD caps.
  • RouterTeamAccess rows restrict a router to specific teams; when no rows exist the router is open to all authenticated callers.

observability — inference logs, health monitoring.

  • observability/module.go registers listeners on ai/application.InferenceEventBus and router/application.RouterInferenceEventBus to persist inference logs. The observability module owns this wiring — emitting modules know nothing about it.

Cross-module communication: event buses

When one module needs to react to something that happened in another module, use an event bus — never a direct interface dependency between modules.

Pattern (see ai/application/events.go for the canonical example):

  1. The emitting module defines the event type and a typed bus in its application/events.go:
    type ThingHappenedEvent struct { ... }
    type ThingHappenedListener func(e ThingHappenedEvent)
    type ThingEventBus struct { listeners []ThingHappenedListener }
    func NewThingEventBus() *ThingEventBus { ... }
    func (b *ThingEventBus) OnHappened(l ThingHappenedListener) { ... }
    func (b *ThingEventBus) Emit(e ThingHappenedEvent) { ... }
  2. The emitting module provides the bus via fx.Provide(application.NewThingEventBus) in its module.go and calls bus.Emit(...) at the right moment in its service.
  3. The consuming module registers a listener in its own module.go via fx.Invoke:
    func registerThingListeners(bus *emitterApp.ThingEventBus, svc application.Service) {
        bus.OnHappened(func(e emitterApp.ThingHappenedEvent) { svc.Handle(e) })
    }

Rules:

  • Emitting modules must not import consuming modules — the bus is the only coupling.
  • The consuming module owns the listener registration, not the emitter.
  • Use context.Context in listener signatures only when the listener needs to propagate cancellation. Fire-and-forget listeners (e.g. observability) take the event value directly and must not block.
  • Buses run listeners synchronously in registration order. If a listener can be slow, have it spawn a goroutine internally.
  • Existing buses: ai/application.ModelEventBus (model lifecycle), ai/application.InferenceEventBus (direct inference calls), ai/application.MCPServerEventBus (MCP server lifecycle), router/application.RouterInferenceEventBus (routed inference calls), router/application.RouterTargetEventBus (target deletion), prompts/application.PromptEventBus (prompt lifecycle).
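Filling in the skeleton from step 1 gives a small runnable bus that follows the rules above: listeners run synchronously, in registration order, on the emitter's goroutine. The mutex and the copy of the listener slice in `Emit` (which lets a listener register another listener without deadlocking) are additions for this sketch; the canonical ai/application/events.go may not include them.

```go
package main

import "sync"

// ThingHappenedEvent is passed to listeners by value.
type ThingHappenedEvent struct {
	ID string
}

type ThingHappenedListener func(e ThingHappenedEvent)

// ThingEventBus runs listeners synchronously, in registration order.
type ThingEventBus struct {
	mu        sync.RWMutex
	listeners []ThingHappenedListener
}

func NewThingEventBus() *ThingEventBus { return &ThingEventBus{} }

// OnHappened registers a listener. Consuming modules call this from
// their own module.go (via fx.Invoke in this codebase).
func (b *ThingEventBus) OnHappened(l ThingHappenedListener) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.listeners = append(b.listeners, l)
}

// Emit calls every listener on the caller's goroutine. A slow listener
// should start its own goroutine instead of blocking Emit.
func (b *ThingEventBus) Emit(e ThingHappenedEvent) {
	b.mu.RLock()
	ls := append([]ThingHappenedListener(nil), b.listeners...)
	b.mu.RUnlock()
	for _, l := range ls {
		l(e)
	}
}
```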

Cross-module synchronous queries (adapter pattern)

For synchronous reads from another module (not fire-and-forget events), define a narrow interface in your own application/ layer and implement a private adapter in module.go that delegates to the other module's service. This keeps modules decoupled while satisfying compile-time dependencies through Fx.

Example from router/module.go: PromptLoader, HealthChecker, BudgetQuerier, MCPServerLoader, and ModelLookup are interfaces defined in router/application/ and fulfilled by private adapter structs in router/module.go (e.g. promptLoaderAdapter, obsHealthAdapter) that wrap services from other modules such as prompts, observability, and ai.

The adapter is wired via fx.Provide in the consuming module's Module():

fx.Provide(newPromptLoaderAdapter)  // returns application.PromptLoader

func newPromptLoaderAdapter(svc promptsApp.Service) application.PromptLoader {
    return &promptLoaderAdapter{svc: svc}
}

Input validation

Validation is the handler's responsibility; services trust their inputs.

Body DTOs (Gin binding tags)

Use binding tags on input structs — Gin's validator runs automatically on ShouldBindJSON. The handler calls respondBindError(c, err, &input) on failure.

| Scenario | Tag |
|---|---|
| Required string | binding:"required" |
| Required string with length cap | binding:"required,max=255" |
| Optional URL | binding:"omitempty,url" |
| Required URL | binding:"required,url" |
| Percentage (0–100) | binding:"min=0,max=100" |
| Non-negative integer/float | binding:"min=0" |
| Enum field | binding:"oneof=val1 val2" (oneof is a built-in validator tag), or validate in the handler after binding |

Specific gaps to fix when touching these DTOs:

  • AddConversationMessageInput.Role — add binding:"required,oneof=user assistant"
  • AddTargetInput.Percentage — add binding:"min=0,max=100"
  • CreateVirtualKeyInput.MaxRequests / MaxCostUSD and CreateTeamInput.MaxRequests / MaxCostUSD — add binding:"min=0"
  • SubmitJobRequest.CallbackURL — add binding:"omitempty,url"
  • All Name string fields — add max=255 (e.g. binding:"required,max=255")
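Applied to the DTOs listed above, the fixes are plain struct tags. In this sketch the field names come from the list, but the Go field types, the `json` names and the `CreateTeamInput.Name` field are assumptions. `bindingTag` is a hypothetical helper that reads the tag back with `reflect`, the same struct-tag key (`binding`) that Gin's validator consults.

```go
package main

import "reflect"

type AddConversationMessageInput struct {
	Role string `json:"role" binding:"required,oneof=user assistant"`
}

type AddTargetInput struct {
	Percentage int `json:"percentage" binding:"min=0,max=100"`
}

type CreateTeamInput struct {
	Name        string  `json:"name" binding:"required,max=255"`
	MaxRequests int64   `json:"maxRequests" binding:"min=0"`
	MaxCostUSD  float64 `json:"maxCostUSD" binding:"min=0"`
}

type SubmitJobRequest struct {
	CallbackURL string `json:"callbackUrl" binding:"omitempty,url"`
}

// bindingTag returns the `binding` tag of a struct field, or "".
func bindingTag(v any, field string) string {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return ""
	}
	return f.Tag.Get("binding")
}
```

If a numeric cap is optional and modeled as a pointer, it likely needs `omitempty` in front of `min=0` so that a missing value is not validated; that depends on the real field types.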

Path parameters

No path param validation exists today. Every handler passes c.Param("id") straight to the service. When adding or editing a handler that reads a path param, validate it explicitly before calling the service:

id := c.Param("id")
if id == "" || len(id) > 100 {
    c.JSON(http.StatusBadRequest, gin.H{"error": "invalid id"})
    return
}

A shared validateParam(c *gin.Context, name string) (string, bool) helper in each module's handler file is the preferred pattern: it returns the value and true when the param is valid, and otherwise writes the 400 and returns false.

Query parameters

  • Filter IDs passed as query params (e.g. routerId, status) must be at most 100 chars; reject or ignore longer values.
  • parseDateRangeOptional() in the observability handler silently returns nil on a bad date — it should return an explicit 400 instead.
  • Numeric query params with explicit range constraints (limit, offset) should clamp or reject out-of-range values consistently. Current pattern (observability limit): values ≤ 0 or > max fall back to the default. Apply this pattern to offset as well.
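The "fall back to the default" rule and the 100-char filter cap can be expressed as two small helpers. `parseBoundedInt`, `filterParam` and `maxFilterLen` are hypothetical names. With a default of 0, the same helper gives the offset behaviour described above: 0 stays 0 and negatives become 0.

```go
package main

import "strconv"

const maxFilterLen = 100

// parseBoundedInt returns def for unparsable input, values <= 0, and
// values > max; otherwise it returns the parsed value.
func parseBoundedInt(raw string, def, max int) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n <= 0 || n > max {
		return def
	}
	return n
}

// filterParam rejects (ok=false) filter IDs longer than maxFilterLen.
func filterParam(raw string) (string, bool) {
	if len(raw) > maxFilterLen {
		return "", false
	}
	return raw, true
}
```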

Error responses

ErrorResponse{Error string, Fields map[string][]string} is the standard error envelope (defined locally in each module's handler file, not shared). Fields is populated only for binding validation errors via validation.BindingErrors from internal/shared/validation.

Each module's handler file has its own respondError(c, err) and respondBindError(c, err, &input) — intentionally duplicated because respondError switches on that module's own domain sentinel errors to map them to the correct HTTP status code.

Shared utilities

internal/shared/ contains packages used across modules — do not import one module from another; use these instead.

  • dbtype: dialect-aware GORM column types for JSON columns. Use dbtype.JSONMap (map[string]any), dbtype.JSONStringMap (map[string]string), or dbtype.JSONStringSlice ([]string) on GORM struct fields that need JSON storage. They serialize as jsonb on PostgreSQL and text on SQLite automatically.
  • template: template.Interpolate(text, vars) replaces {{key}} placeholders; template.ExtractVariables(text) returns the unique set of placeholder names. Used by the prompts module; reuse it rather than reimplementing.
  • validation: validation.BindingErrors(err, input) converts a validator.ValidationErrors into a human-readable summary and a map[string][]string of field-level messages keyed by JSON field name.
  • pagination: pagination.ParseSlice(c) reads page/perPage query params; pagination.New(items, total, slice) wraps results in pagination.Paginated[T].
  • audit / webhook: global singletons set once at startup by the observability module. audit.Log(...) and webhook.Send(...) are no-ops until the logger/recorder is registered, so they silently do nothing outside a full NewHTTPApp() / NewLambdaApp() context (e.g. in unit tests).
  • httpserver: httpserver.NewRouter(cfg) creates the shared Gin engine with CORS (origin from FRONTEND_URL), GET /healthz, Swagger UI (GET /swagger/*), and structured slog request logging (5xx→Error, 4xx→Warn, 2xx→Info, healthz suppressed). Provided by app.go; all module handlers attach their routes to this engine.
  • logger: logger.Init() configures a colored slog handler (via tint) as the global default. Colors are disabled automatically when stderr is not a TTY (CI, Lambda). Called once from cmd/api/main.go at startup.
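A sketch of page/perPage parsing and wrapping, using the documented defaults (page 1, perPage 30, max 500). `Slice`, `Paginated[T]`'s fields, `parseSlice` and `newPaginated` are hypothetical stand-ins for pagination.ParseSlice / pagination.New, and clamping an oversized perPage to 500 (rather than falling back to 30) is an assumption. `get` would be `c.Query` in a Gin handler.

```go
package main

import "strconv"

// Slice is a stand-in for what pagination.ParseSlice returns.
type Slice struct {
	Page    int
	PerPage int
}

// Paginated is an assumed shape; the real field names may differ.
type Paginated[T any] struct {
	Items      []T   `json:"items"`
	Total      int64 `json:"total"`
	Page       int   `json:"page"`
	PerPage    int   `json:"perPage"`
	TotalPages int   `json:"totalPages"`
}

const (
	defaultPerPage = 30
	maxPerPage     = 500
)

func parseSlice(get func(string) string) Slice {
	page, err := strconv.Atoi(get("page"))
	if err != nil || page < 1 {
		page = 1
	}
	perPage, err := strconv.Atoi(get("perPage"))
	if err != nil || perPage < 1 {
		perPage = defaultPerPage
	}
	if perPage > maxPerPage {
		perPage = maxPerPage
	}
	return Slice{Page: page, PerPage: perPage}
}

// Offset is the row offset to pass to the repository query.
func (s Slice) Offset() int { return (s.Page - 1) * s.PerPage }

func newPaginated[T any](items []T, total int64, s Slice) Paginated[T] {
	pages := int((total + int64(s.PerPage) - 1) / int64(s.PerPage)) // ceiling division
	return Paginated[T]{Items: items, Total: total, Page: s.Page, PerPage: s.PerPage, TotalPages: pages}
}
```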

Metrics endpoint

GET /metrics returns Prometheus text format. It is registered in internal/app/app.go by registerMetricsEndpoint and aggregates all metrics.Collector implementations. Currently only router pipeline metrics are collected (request counts + average latency per router).

Adding a new module

  1. Create internal/modules/<name>/ with the standard layout above.
  2. Export func Module() fx.Option in module.go.
  3. Import and add <name>.Module() in internal/app/app.go.
  4. Register routes inside your module.go via fx.Invoke.

Database migrations

Migrations live in internal/db/migrations/ with separate sqlite/ and postgres/ sub-directories. Atlas drives them; atlas.hcl defines local and production environments. All migration files are embedded into the binary at compile time via embed.FS — no external migration files are needed at runtime.

The app auto-applies pending migrations at startup for local dev — make migrate-apply is only needed if you want to apply without starting the server.

After changing any GORM struct that maps to a DB table, run make migrate-diff name=<description> to generate a new versioned SQL file, then run make migrate-hash if you edit it manually.

SQLite note: the DB layer forces WAL journal mode and SetMaxOpenConns(1) — concurrent writers will queue, not error.

cmd/migrate

Standalone binary that registers GORM models with Atlas so make migrate-diff can generate SQL from struct changes. Not invoked directly — Atlas calls it via the atlas.hcl data source.

Testing conventions

Tests operate at the application layer using hand-rolled stub repositories — no test database, no integration harness. Service tests live in package application_test (external); pipeline tests live in package application (internal, so they can reach unexported pipeline helpers).

Each module has an org_security_test.go file that verifies org isolation: stubs return ErrNotFound when the orgID doesn't match the resource owner, and tests confirm the service propagates orgID through every repo call. Add a case here whenever you add a service method that takes orgID.

Environment variables

| Variable | Default | Notes |
|---|---|---|
| PORT | 8090 | Local HTTP port |
| DATABASE_DSN | SQLite file | Override to a PostgreSQL DSN in prod |
| JWT_SECRET | insecure default | Must be set in production |
| ADMIN_EMAIL | | User with this email always gets admin role |
| SQS_QUEUE_URL | | Set to use SQS dispatcher instead of goroutines |
| FRONTEND_URL | http://localhost:8080 | OIDC redirect target; also the CORS origin |
| OIDC_JWKS_URL | | Required for OIDC token exchange |
| OIDC_PROVIDERS | | Comma-separated list (e.g. google,github) |
| CACHE_BACKEND | memory | memory or redis |
| CACHE_REDIS_ADDR | | Required when CACHE_BACKEND=redis (e.g. localhost:6379) |
| CACHE_REDIS_PREFIX | | Optional key namespace for the Redis cache |
| RATE_LIMIT_BACKEND | memory | memory or redis |
| APP_ENV | development | Set to production to require JWT_SECRET |
| LOG_RETENTION_DAYS | 90 | Days before inference/audit logs are purged |
| HEALTH_CHECK_INTERVAL_SECS | 120 | Interval between provider health probes |
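A sketch of how a few of these variables and defaults could be loaded, including the rule that APP_ENV=production requires JWT_SECRET. `Config`, `loadConfig` and `devJWTSecret` (and its value) are hypothetical, not the contents of config.go. The lookup function is injected so the logic is testable; in real use you would pass `os.LookupEnv`.

```go
package main

import (
	"errors"
	"strconv"
)

// Config holds a subset of the variables above; field names are hypothetical.
type Config struct {
	Port             string
	AppEnv           string
	JWTSecret        string
	LogRetentionDays int
	HealthCheckSecs  int
}

const devJWTSecret = "insecure-dev-secret" // hypothetical placeholder

func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	get := func(key, def string) string {
		if v, ok := lookup(key); ok && v != "" {
			return v
		}
		return def
	}
	positiveInt := func(key string, def int) int {
		n, err := strconv.Atoi(get(key, ""))
		if err != nil || n <= 0 {
			return def
		}
		return n
	}
	cfg := Config{
		Port:             get("PORT", "8090"),
		AppEnv:           get("APP_ENV", "development"),
		JWTSecret:        get("JWT_SECRET", ""),
		LogRetentionDays: positiveInt("LOG_RETENTION_DAYS", 90),
		HealthCheckSecs:  positiveInt("HEALTH_CHECK_INTERVAL_SECS", 120),
	}
	if cfg.JWTSecret == "" {
		if cfg.AppEnv == "production" {
			return Config{}, errors.New("JWT_SECRET must be set when APP_ENV=production")
		}
		cfg.JWTSecret = devJWTSecret // dev-only fallback
	}
	return cfg, nil
}
```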