This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
# Install required tools (one-time)
go install gotest.tools/gotestsum@latest
go install github.com/swaggo/swag/cmd/swag@latest
# Atlas CLI: https://atlasgo.io/docs
# Run locally (auto-applies migrations on startup)
go run ./cmd/api
# Run tests
make test # pretty output via gotestsum
make test-verbose
go test ./internal/modules/ai/... # single package
go test -run TestName ./internal/modules/router/application/... # single test
# Build Lambda binaries
make build-ApiFunction
make build-WorkerFunction
# Regenerate Swagger docs (after changing handlers or DTOs)
make swagger
# Database migrations (Atlas)
make migrate-apply # apply pending migrations locally
make migrate-diff name=add_col # generate new migration from GORM models
make migrate-status # show applied vs pending
make migrate-hash # rehash after manual edits
# Production migrations (requires DATABASE_URL + DATABASE_DEV_URL env vars)
make migrate-apply-prod
make migrate-diff-prod name=add_col
make migrate-status-prod
# Seed dev database
make seed-inference-logs # insert 200 synthetic inference_log rows over 30 days (idempotent)

Default local port is 8090 (not 8080; see config.go).
API at http://localhost:8090 · Swagger UI at http://localhost:8090/swagger/index.html · Health check at GET /healthz.
Config loads .env then .env.local (override) at startup via godotenv; both files are silently ignored in production.
Swagger docs are generated by swag from godoc comments directly above each handler function. The global API metadata lives in cmd/api/main.go. After adding or changing any handler or DTO, run make swagger to regenerate docs/.
// @title AI Proxy API
// @version 1.0
// @description ...
// @BasePath /
// @securityDefinitions.apikey BearerAuth
// @in header
// @name Authorization

BearerAuth is the only security scheme. It covers both session tokens (JWT) and API keys; the middleware decides which it accepts, not the swagger definition.
Every handler follows this exact structure:
// FunctionName godoc
// @Summary One-line summary (shown in the Swagger UI list)
// @Description Longer description — omit if the summary is already clear
// @Tags hyperstrate
// @Tags <feature-tag> ← one of: auth | catalog | models | routers | jobs | conversations | observability
// @Accept json ← only on POST/PUT/PATCH that read a body
// @Produce json ← always present unless the response is 204
// @Param <name> <in> <type> <required> "<description>"
// @Success <code> {object|array} <ResponseType>
// @Failure <code> {object} ErrorResponse
// @Security BearerAuth ← omit only on public endpoints (no auth required)
// @Router /path/{param} [method]

Always use two @Tags lines: hyperstrate (groups all routes together in the UI) and the feature-specific tag.
| Location | Example |
|---|---|
| Path | // @Param id path string true "Router ID" |
| Query | // @Param page query int false "Page number (default 1)" |
| Query | // @Param perPage query int false "Items per page (default 30, max 500)" |
| Body | // @Param body body application.CreateFooInput true "Foo input" |
| Pattern | When to use |
|---|---|
| {object} application.FooResponse | single JSON object |
| {array} domain.FooEntity | bare JSON array (not paginated) |
| {object} pagination.Paginated[application.FooResponse] | paginated list; use {object}, not {array} |
| // @Success 204 | no body (DELETE) |
| {object} ErrorResponse | all @Failure lines |
Each module's module.go registers routes in groups with different middleware. Use @Security BearerAuth accordingly:
| Group registered in | Middleware | @Security |
|---|---|---|
| RegisterPublicRoutes | none | omit @Security |
| RegisterSessionRoutes | RequireSession: valid JWT, any role | // @Security BearerAuth |
| RegisterAdminRoutes | RequireAdmin: valid JWT + admin role | // @Security BearerAuth |
| RegisterInferRoutes | InferAuth: API key or session JWT | // @Security BearerAuth |
Public endpoint (no auth):
// GetSetupStatus godoc
// @Summary Check if initial setup is required
// @Tags hyperstrate
// @Tags auth
// @Produce json
// @Success 200 {object} application.SetupStatusResponse
// @Router /auth/setup/status [get]

Admin endpoint with body + path param:
// UpdateOrganization godoc
// @Summary Update an organisation
// @Tags hyperstrate
// @Tags auth
// @Accept json
// @Produce json
// @Param orgId path string true "Org ID"
// @Param body body application.UpdateOrganizationInput true "Fields to update"
// @Success 200 {object} application.OrganizationResponse
// @Failure 400 {object} ErrorResponse
// @Failure 404 {object} ErrorResponse
// @Security BearerAuth
// @Router /auth/organizations/{orgId} [patch]

Paginated list with optional query filter:
// ListAPIKeys godoc
// @Summary List API keys
// @Tags hyperstrate
// @Tags auth
// @Produce json
// @Param routerId query string false "Filter by router ID"
// @Param page query int false "Page number (default 1)"
// @Param perPage query int false "Items per page (default 30, max 500)"
// @Success 200 {object} pagination.Paginated[application.APIKeyResponse]
// @Security BearerAuth
// @Router /auth/api-keys [get]

Delete (204 no body):
// DeleteOrganization godoc
// @Summary Delete an organisation
// @Tags hyperstrate
// @Tags auth
// @Param orgId path string true "Org ID"
// @Success 204
// @Failure 404 {object} ErrorResponse
// @Security BearerAuth
// @Router /auth/organizations/{orgId} [delete]

Go 1.25, Gin, Fx (dependency injection), GORM, Atlas migrations, AWS Lambda + SQS.
Every feature module lives under internal/modules/<name>/ and follows the same layout:
domain/ entities, repository interfaces, domain errors
application/ use-case service, DTOs, event types
infrastructure/
persistence/ GORM repositories implementing domain interfaces
proxy/ or vault/ external integrations
interfaces/http/ Gin handlers
module.go Fx wiring: provides dependencies, invokes route registration
internal/app/app.go composes all modules into three Fx app factories:
- NewHTTPApp(): local dev server
- NewLambdaApp(): API Gateway Lambda handler
- NewWorkerApp(): SQS worker Lambda (minimal: no HTTP, no router, no auth)
ai — model catalog, registrations, inference, async jobs, conversations, MCP servers.
- domain/catalog.go: static map of all supported models baked into the binary. Adding a model = adding an entry here; no DB change needed.
- A registration (domain/model.go) links a catalog key to a DB row and API key config.
- Inference dispatches via application.JobDispatcher: goroutine locally, SQS when SQS_QUEUE_URL is set (selected at Fx startup in module.go:newJobDispatcher).
- Conversations (domain/conversation.go): multi-turn chat sessions stored in the DB; CRUD at /ai/conversations, messages at /ai/conversations/:id/messages.
- MCP servers (domain/mcp_server.go): registered external tool servers used by the router mcp_tools feature; CRUD at /ai/mcp/servers.
- Routes split into two groups: /ai admin (session token + admin role) and /ai infer (API key or session).
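The dispatcher selection described above can be sketched roughly as follows. Only the names JobDispatcher and newJobDispatcher come from this document; the Job type, the Dispatch signature, and the queueDispatcher stand-in (which records jobs instead of calling the AWS SDK) are assumptions made for illustration.

```go
package main

import "fmt"

// Job is a hypothetical unit of async inference work.
type Job struct{ ID string }

// JobDispatcher mirrors the role described above; this method set is assumed.
type JobDispatcher interface {
	Dispatch(job Job, run func(Job)) error
}

// goroutineDispatcher runs the job in-process, as in local development.
type goroutineDispatcher struct{}

func (d *goroutineDispatcher) Dispatch(job Job, run func(Job)) error {
	go run(job)
	return nil
}

// queueDispatcher stands in for the SQS path. A real implementation would
// send a message to the queue; this one only records what would be sent.
type queueDispatcher struct {
	queueURL string
	sent     []string
}

func (d *queueDispatcher) Dispatch(job Job, _ func(Job)) error {
	d.sent = append(d.sent, job.ID)
	return nil
}

// newJobDispatcher picks the implementation once, from configuration.
func newJobDispatcher(queueURL string) JobDispatcher {
	if queueURL != "" {
		return &queueDispatcher{queueURL: queueURL}
	}
	return &goroutineDispatcher{}
}

func main() {
	d := newJobDispatcher("") // no queue URL: in-process goroutine
	done := make(chan string, 1)
	_ = d.Dispatch(Job{ID: "job-1"}, func(j Job) { done <- j.ID })
	fmt.Println("ran", <-done)
}
```

Choosing the implementation once at startup keeps the calling service unaware of which transport is in use.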
router — named routers that proxy inference through a configurable pipeline.
- A router has targets (model registrations), features (pipeline stages), interceptors, and a 1:1 RouterConfiguration row.
- RouterConfiguration fields: WebhookURL (event notifications), PromptID (linked prompt template), StorePayloads (persist raw request+response to inference_payloads).
- application/pipeline.go runs the full request pipeline: rate limit → budget → cache → field transforms → interceptors → target selection → inference → budget accounting → cache store. Additional stages (hedging, quality gate, MCP tool dispatch, semantic memory) activate when the corresponding features are enabled.
- Routing strategies: round_robin, weighted, percentage, failover, random, latency_based.
- Feature types: response_cache, semantic_cache, retry, fallback, token_optimization, context_trimming, rate_limit, budget, mcp_tools, health_check, prompt_caching, structured_output, request_coalescing, hedging, quality_gate, context_compression, semantic_memory, cost_aware_routing, response_prefetch, response_fingerprinting.
- Interceptor types: semantic_classifier, content_filter, pii_detector, prompt_guard, ab_test, prompt_shield, team_budget.
- Semantic features (embedding-based cache, classifier, semantic memory) degrade gracefully when no EmbeddingProvider is wired; currently a noopEmbedder is provided.
- Routes split into three groups: RegisterCRUDRoutes (admin session required; CRUD for routers, targets, features, interceptors, evaluations), RegisterInferRoutes (InferAuth; /router/:id/infer, /router/:id/v1/chat/completions, /router/:id/v1/messages), and RegisterProxyRoutes (InferAuth; /proxy/router/:id/*path, usable as an OpenAI or Anthropic SDK baseURL).
- application/service_eval.go manages evaluation sets: named collections of test cases with exact, contains, or llm scoring, run as scored RouterEvaluationRun records.
prompts — named, versioned system-prompt templates.
- A Prompt has {{variable}} placeholders extracted on save and stored in a variables JSON column.
- Every save creates an immutable PromptVersion snapshot; full version history with restore is supported.
- Prompts can be attached to a router via router.PromptID; the pipeline loads and interpolates the prompt before inference.
- All routes require admin session (RequireAdmin middleware).
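The {{variable}} handling described above might look roughly like the sketch below. The function names match the shared template package mentioned elsewhere in this file, but the bodies, the exact placeholder grammar, and the choice to leave unknown placeholders untouched are assumptions, not the package's actual behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

// placeholder matches {{name}}; allowing inner whitespace is an assumption.
var placeholder = regexp.MustCompile(`\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}`)

// ExtractVariables returns the unique placeholder names in first-seen order.
func ExtractVariables(text string) []string {
	seen := map[string]bool{}
	var names []string
	for _, m := range placeholder.FindAllStringSubmatch(text, -1) {
		if !seen[m[1]] {
			seen[m[1]] = true
			names = append(names, m[1])
		}
	}
	return names
}

// Interpolate replaces known placeholders and leaves unknown ones untouched.
func Interpolate(text string, vars map[string]string) string {
	return placeholder.ReplaceAllStringFunc(text, func(match string) string {
		name := placeholder.FindStringSubmatch(match)[1]
		if v, ok := vars[name]; ok {
			return v
		}
		return match
	})
}

func main() {
	tpl := "You are {{role}}. Answer in {{language}}. Stay {{role}}."
	fmt.Println(ExtractVariables(tpl)) // [role language]
	fmt.Println(Interpolate(tpl, map[string]string{"role": "a tutor", "language": "French"}))
}
```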
auth — organizations, users, teams, API keys, virtual keys, OIDC login.
- Two middleware types used across all modules: RequireAdmin(sessionValidator) and InferAuth(keyValidator) (accepts API key or session).
- JWT secret comes from JWT_SECRET env var; there's a hardcoded insecure fallback for dev.
- vault.Provider is a NoopProvider by default; swap it to store API keys in a real vault.
- VirtualKey supports optional spending/request budgets with ResetPeriod (daily, weekly, monthly) and per-key RateLimitRPS (token-bucket, in-process, not persisted).
- Team tracks aggregate spending (UsedRequests, UsedCostUSD) with optional MaxRequests/MaxCostUSD caps.
- RouterTeamAccess rows restrict a router to specific teams; when no rows exist the router is open to all authenticated callers.
observability — inference logs, health monitoring.
- observability/module.go registers listeners on ai/application.InferenceEventBus and router/application.RouterInferenceEventBus to persist inference logs. The observability module owns this wiring; emitting modules know nothing about it.
When one module needs to react to something that happened in another module, use an event bus — never a direct interface dependency between modules.
Pattern (see ai/application/events.go for the canonical example):
- The emitting module defines the event type and a typed bus in its application/events.go:

      type ThingHappenedEvent struct { ... }
      type ThingHappenedListener func(e ThingHappenedEvent)
      type ThingEventBus struct { listeners []ThingHappenedListener }
      func NewThingEventBus() *ThingEventBus { ... }
      func (b *ThingEventBus) OnHappened(l ThingHappenedListener) { ... }
      func (b *ThingEventBus) Emit(e ThingHappenedEvent) { ... }

- The emitting module provides the bus via fx.Provide(application.NewThingEventBus) in its module.go and calls bus.Emit(...) at the right moment in its service.
- The consuming module registers a listener in its own module.go via fx.Invoke:

      func registerThingListeners(bus *emitterApp.ThingEventBus, svc application.Service) {
          bus.OnHappened(func(e emitterApp.ThingHappenedEvent) { svc.Handle(e) })
      }
Rules:
- Emitting modules must not import consuming modules — the bus is the only coupling.
- The consuming module owns the listener registration, not the emitter.
- Use context.Context in listener signatures only when the listener needs to propagate cancellation. Fire-and-forget listeners (e.g. observability) take the event value directly and must not block.
- Buses run listeners synchronously in registration order. If a listener can be slow, have it spawn a goroutine internally.
- Existing buses: ai/application.ModelEventBus (model lifecycle), ai/application.InferenceEventBus (direct inference calls), ai/application.MCPServerEventBus (MCP server lifecycle), router/application.RouterInferenceEventBus (routed inference calls), router/application.RouterTargetEventBus (target deletion), prompts/application.PromptEventBus (prompt lifecycle).
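One way the bus pattern and rules above could be filled in is sketched below. The type and method names follow the Thing placeholders used in the pattern; the method bodies and the ID field are assumptions, and ai/application/events.go remains the reference for the real implementation.

```go
package main

import "fmt"

// ThingHappenedEvent carries the event payload; the ID field is illustrative.
type ThingHappenedEvent struct{ ID string }

// ThingHappenedListener receives the event by value (fire-and-forget style).
type ThingHappenedListener func(e ThingHappenedEvent)

// ThingEventBus holds listeners in registration order.
type ThingEventBus struct {
	listeners []ThingHappenedListener
}

func NewThingEventBus() *ThingEventBus { return &ThingEventBus{} }

// OnHappened registers a listener. This sketch assumes registration happens
// at startup, before any Emit, so it does no locking.
func (b *ThingEventBus) OnHappened(l ThingHappenedListener) {
	b.listeners = append(b.listeners, l)
}

// Emit calls every listener synchronously, in registration order.
func (b *ThingEventBus) Emit(e ThingHappenedEvent) {
	for _, l := range b.listeners {
		l(e)
	}
}

func main() {
	bus := NewThingEventBus()
	var order []string
	bus.OnHappened(func(e ThingHappenedEvent) { order = append(order, "first:"+e.ID) })
	bus.OnHappened(func(e ThingHappenedEvent) { order = append(order, "second:"+e.ID) })
	bus.Emit(ThingHappenedEvent{ID: "42"})
	fmt.Println(order) // [first:42 second:42]
}
```

Since Emit is synchronous here, a slow listener delays the emitter, which is why slow listeners are expected to hand work off to their own goroutine.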
For synchronous reads from another module (not fire-and-forget events), define a narrow interface in your own application/ layer and implement a private adapter in module.go that delegates to the other module's service. This keeps modules decoupled while satisfying compile-time dependencies through Fx.
Example from router/module.go: PromptLoader, HealthChecker, BudgetQuerier, MCPServerLoader, and ModelLookup are interfaces defined in router/application/ and fulfilled by private adapter structs (e.g. promptLoaderAdapter, obsHealthAdapter) in router/module.go that wrap services from the prompts, observability, and ai modules respectively.
The adapter is wired via fx.Provide in the consuming module's Module():
fx.Provide(newPromptLoaderAdapter) // returns application.PromptLoader
func newPromptLoaderAdapter(svc promptsApp.Service) application.PromptLoader {
return &promptLoaderAdapter{svc: svc}
}

Validation is the handler's responsibility; services trust their inputs.
Use binding tags on input structs — Gin's validator runs automatically on ShouldBindJSON. The handler calls respondBindError(c, err, &input) on failure.
| Scenario | Tag |
|---|---|
| Required string | binding:"required" |
| Required string with length cap | binding:"required,max=255" |
| Optional URL | binding:"omitempty,url" |
| Required URL | binding:"required,url" |
| Percentage (0–100) | binding:"min=0,max=100" |
| Non-negative integer/float | binding:"min=0" |
| Enum field | validate in the handler after binding, or add a custom oneof=val1 val2 tag |
Specific gaps to fix when touching these DTOs:
- AddConversationMessageInput.Role: add binding:"required,oneof=user assistant"
- AddTargetInput.Percentage: add binding:"min=0,max=100"
- CreateVirtualKeyInput.MaxRequests/MaxCostUSD and CreateTeamInput.MaxRequests/MaxCostUSD: add binding:"min=0"
- SubmitJobRequest.CallbackURL: add binding:"omitempty,url"
- All Name string fields: add max=255 (e.g. binding:"required,max=255")
No path param validation exists today. Every handler passes c.Param("id") straight to the service. When adding or editing a handler that reads a path param, validate it explicitly before calling the service:
id := c.Param("id")
if id == "" || len(id) > 100 {
c.JSON(http.StatusBadRequest, gin.H{"error": "invalid id"})
return
}

A shared validateParam(c *gin.Context, name string) (string, bool) helper in each module's handler file is the preferred pattern: it returns the value and true when the param is valid; otherwise it writes the 400 and returns false.
- Filter IDs passed as query params (e.g. routerId, status) must be at most 100 chars; reject or ignore longer values.
- parseDateRangeOptional() in the observability handler silently returns nil on a bad date; it should return an explicit 400 instead.
- Numeric query params with explicit range constraints (limit, offset) should clamp or reject out-of-range values consistently. Current pattern (observability limit): values ≤ 0 or > max fall back to the default. Apply this pattern to offset as well.
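The query-param rules above can be expressed as two small helpers. This is a sketch only: the helper names (filterID, intParam) and the example defaults and maximums are hypothetical, and the fallback-to-default behavior is the one described for the observability limit param.

```go
package main

import (
	"fmt"
	"strconv"
)

const maxFilterIDLen = 100

// filterID returns the value and true when it is usable as a filter.
// An empty value means "no filter"; an over-long value is rejected.
func filterID(raw string) (string, bool) {
	if len(raw) > maxFilterIDLen {
		return "", false
	}
	return raw, true
}

// intParam parses raw and falls back to def when the value is missing,
// malformed, below lo, or above hi.
func intParam(raw string, def, lo, hi int) int {
	n, err := strconv.Atoi(raw)
	if err != nil || n < lo || n > hi {
		return def
	}
	return n
}

func main() {
	// limit: values <= 0 or > max fall back to the default (50 and 500 are example values).
	fmt.Println(intParam("0", 50, 1, 500), intParam("20", 50, 1, 500), intParam("9999", 50, 1, 500))
	// offset: the same pattern, with 0 as the lowest valid value.
	fmt.Println(intParam("-5", 0, 0, 10000), intParam("40", 0, 0, 10000))
	_, ok := filterID("router-123")
	fmt.Println(ok)
}
```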
ErrorResponse{Error string, Fields map[string][]string} is the standard error envelope (defined locally in each module's handler file, not shared). Fields is populated only for binding validation errors via validation.BindingErrors from internal/shared/validation.
Each module's handler file has its own respondError(c, err) and respondBindError(c, err, &input) — intentionally duplicated because respondError switches on that module's own domain sentinel errors to map them to the correct HTTP status code.
internal/shared/ contains packages used across modules — do not import one module from another; use these instead.
- dbtype: dialect-aware GORM column types for JSON columns. Use dbtype.JSONMap (map[string]any), dbtype.JSONStringMap (map[string]string), or dbtype.JSONStringSlice ([]string) on GORM struct fields that need JSON storage. They serialize as jsonb on PostgreSQL and text on SQLite automatically.
- template: template.Interpolate(text, vars) replaces {{key}} placeholders; template.ExtractVariables(text) returns the unique set of placeholder names. Used by the prompts module; reuse here rather than reimplementing.
- validation: validation.BindingErrors(err, input) converts a validator.ValidationErrors into a human-readable summary and a map[string][]string of field-level messages keyed by JSON field name.
- pagination: pagination.ParseSlice(c) reads page/perPage query params; pagination.New(items, total, slice) wraps results in pagination.Paginated[T].
- audit / webhook: global singletons set once at startup by the observability module. audit.Log(...) and webhook.Send(...) are no-ops until the logger/recorder is registered; they silently do nothing outside a full NewHTTPApp() / NewLambdaApp() context (e.g. in unit tests).
- httpserver: httpserver.NewRouter(cfg) creates the shared Gin engine with CORS (origin from FRONTEND_URL), GET /healthz, Swagger UI (GET /swagger/*), and structured slog request logging (5xx→Error, 4xx→Warn, 2xx→Info, healthz suppressed). Provided by app.go; all module handlers attach their routes to this engine.
- logger: logger.Init() configures a colored slog handler (via tint) as the global default. Colors are disabled automatically when stderr is not a TTY (CI, Lambda). Called once from cmd/api/main.go at startup.
GET /metrics returns Prometheus text format. It is registered in internal/app/app.go by registerMetricsEndpoint and aggregates all metrics.Collector implementations. Currently only router pipeline metrics are collected (request counts + average latency per router).
- Create internal/modules/<name>/ with the standard layout above.
- Export func Module() fx.Option in module.go.
- Import and add <name>.Module() in internal/app/app.go.
- Register routes inside your module.go via fx.Invoke.
Migrations live in internal/db/migrations/ with separate sqlite/ and postgres/ sub-directories. Atlas drives them; atlas.hcl defines local and production environments. All migration files are embedded into the binary at compile time via embed.FS — no external migration files are needed at runtime.
The app auto-applies pending migrations at startup for local dev — make migrate-apply is only needed if you want to apply without starting the server.
After changing any GORM struct that maps to a DB table, run make migrate-diff name=<description> to generate a new versioned SQL file, then run make migrate-hash if you edit it manually.
SQLite note: the DB layer forces WAL journal mode and SetMaxOpenConns(1) — concurrent writers will queue, not error.
Standalone binary that registers GORM models with Atlas so make migrate-diff can generate SQL from struct changes. Not invoked directly — Atlas calls it via the atlas.hcl data source.
Tests operate at the application layer using hand-rolled stub repositories — no test database, no integration harness. Service tests live in package application_test (external); pipeline tests live in package application (internal, so they can reach unexported pipeline helpers).
Each module has an org_security_test.go file that verifies org isolation: stubs return ErrNotFound when the orgID doesn't match the resource owner, and tests confirm the service propagates orgID through every repo call. Add a case here whenever you add a service method that takes orgID.
| Variable | Default | Notes |
|---|---|---|
| PORT | 8090 | Local HTTP port |
| DATABASE_DSN | SQLite file | Override to a PostgreSQL DSN in prod |
| JWT_SECRET | insecure default | Must be set in production |
| ADMIN_EMAIL | - | User with this email always gets admin role |
| SQS_QUEUE_URL | - | Set to use SQS dispatcher instead of goroutines |
| FRONTEND_URL | http://localhost:8080 | OIDC redirect target |
| OIDC_JWKS_URL | - | Required for OIDC token exchange |
| OIDC_PROVIDERS | - | Comma-separated list (e.g. google,github) |
| CACHE_BACKEND | memory | memory or redis |
| CACHE_REDIS_ADDR | - | Required when CACHE_BACKEND=redis (e.g. localhost:6379) |
| CACHE_REDIS_PREFIX | - | Optional key namespace for the Redis cache |
| RATE_LIMIT_BACKEND | memory | memory or redis |
| APP_ENV | development | Set to production to require JWT_SECRET |
| LOG_RETENTION_DAYS | 90 | Days before inference/audit logs are purged |
| HEALTH_CHECK_INTERVAL_SECS | 120 | Interval between provider health probes |
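A few of the variables above, read with their documented defaults, might be loaded roughly as in this sketch. The Config struct, the loadConfig function, and the placeholder dev secret are hypothetical; failing with an error when APP_ENV=production and JWT_SECRET is empty is one plausible reading of the table, not a description of config.go.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// Config holds a small subset of the variables listed above.
type Config struct {
	Port             string
	AppEnv           string
	JWTSecret        string
	LogRetentionDays int
}

// devJWTSecret is a placeholder; the real dev fallback value is not shown here.
const devJWTSecret = "dev-only-placeholder"

// loadConfig takes a getenv function so it can be exercised without
// touching the process environment.
func loadConfig(getenv func(string) string) (Config, error) {
	get := func(key, def string) string {
		if v := getenv(key); v != "" {
			return v
		}
		return def
	}
	cfg := Config{
		Port:      get("PORT", "8090"),
		AppEnv:    get("APP_ENV", "development"),
		JWTSecret: getenv("JWT_SECRET"),
	}
	days, err := strconv.Atoi(get("LOG_RETENTION_DAYS", "90"))
	if err != nil {
		return Config{}, fmt.Errorf("LOG_RETENTION_DAYS: %w", err)
	}
	cfg.LogRetentionDays = days
	if cfg.JWTSecret == "" {
		if cfg.AppEnv == "production" {
			return Config{}, errors.New("JWT_SECRET must be set when APP_ENV=production")
		}
		cfg.JWTSecret = devJWTSecret
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	fmt.Println(cfg.Port, cfg.AppEnv, cfg.LogRetentionDays, err)
}
```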