128 changes: 128 additions & 0 deletions .claude/rules/cli-patterns.md
@@ -0,0 +1,128 @@
---
description: Eos CLI patterns — cobra commands, flag validation, human-centric input handling
paths:
- "cmd/**/*.go"
- "pkg/interaction/**"
- "pkg/verify/**"
---

# Eos CLI Patterns

## Command Structure

Verb-first with flag-based operations:
```
eos [verb] [noun] --[operation] [target] [--flags...]

eos update hecate --add bionicgpt --dns example.com
eos update vault --fix --dry-run
eos delete env production --force
```

Exception: when the action is itself a verb (such as `start`) rather than an operation on the noun, it is passed as a positional argument:
```
eos update services start nginx # 'start' is a verb, not an operation flag
```

## Human-Centric Flag Handling (P0 — Breaking)

If a required flag is missing, NEVER fail immediately. ALWAYS offer interactive fallback with informed consent.

**Violation**: `if flag == "" { return fmt.Errorf("--token is required") }`

**Correct pattern** — use the full fallback chain:
1. CLI flag (if explicitly set via `cmd.Flags().Changed()`)
2. Environment variable (if configured)
3. Interactive prompt (if TTY available, with help text explaining WHY and HOW)
4. Default value (if `AllowEmpty` is true)
5. Error with clear remediation steps (non-interactive mode only)

```go
// CORRECT: Human-centric with fallback chain
logger := otelzap.Ctx(rc.Ctx)
tokenFlag, _ := cmd.Flags().GetString("token")
tokenWasSet := cmd.Flags().Changed("token")

result, err := interaction.GetRequiredString(rc, tokenFlag, tokenWasSet, &interaction.RequiredFlagConfig{
	FlagName:      "token",
	EnvVarName:    "VAULT_TOKEN",
	PromptMessage: "Enter Vault root token: ",
	HelpText:      "Required for cluster operations. Get via: vault token create",
	IsSecret:      true,
})
if err != nil {
	return fmt.Errorf("failed to get vault token: %w", err)
}
logger.Info("Using Vault token", zap.String("source", string(result.Source)))
```

Required elements:
- **Help text**: WHY is this needed? HOW to get the value?
- **Source logging**: always log which fallback was used (CLI/env/prompt/default)
- **Validation**: validate input, retry with clear guidance (max 3 attempts)
- **Security**: `IsSecret: true` for passwords/tokens (no terminal echo)
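A minimal, standard-library-only sketch of the fallback order described above follows. It is not the `interaction.GetRequiredString` implementation: `resolveRequired`, `requiredConfig` and `Source` are hypothetical names, and the environment lookup and prompt are passed in as functions (an assumption made so the logic can be tested without a TTY). Input validation and the 3-attempt retry are omitted.

```go
package main

import (
	"fmt"
	"os"
)

// Source records which step of the fallback chain produced the value.
type Source string

const (
	SourceFlag    Source = "flag"
	SourceEnv     Source = "env"
	SourcePrompt  Source = "prompt"
	SourceDefault Source = "default"
)

// requiredConfig is a hypothetical, trimmed-down analogue of RequiredFlagConfig.
type requiredConfig struct {
	FlagName   string
	EnvVarName string
	Default    string
	AllowEmpty bool
}

// resolveRequired walks the chain in order: flag, env, prompt, default, error.
// lookupEnv is injected (os.LookupEnv in real use) and prompt is nil when no
// TTY is available, so the function can be tested without a terminal.
func resolveRequired(cfg requiredConfig, flagVal string, flagSet bool,
	lookupEnv func(string) (string, bool), prompt func() (string, error)) (string, Source, error) {
	// 1. An explicitly set CLI flag always wins.
	if flagSet {
		return flagVal, SourceFlag, nil
	}
	// 2. Environment variable, if one is configured and non-empty.
	if cfg.EnvVarName != "" {
		if v, ok := lookupEnv(cfg.EnvVarName); ok && v != "" {
			return v, SourceEnv, nil
		}
	}
	// 3. Interactive prompt, only when a TTY is available.
	if prompt != nil {
		v, err := prompt()
		if err != nil {
			return "", SourcePrompt, fmt.Errorf("prompt for --%s failed: %w", cfg.FlagName, err)
		}
		if v != "" {
			return v, SourcePrompt, nil
		}
	}
	// 4. Default value, only if empty input is acceptable.
	if cfg.AllowEmpty {
		return cfg.Default, SourceDefault, nil
	}
	// 5. Nothing found: fail with remediation steps.
	return "", "", fmt.Errorf("--%s is required: pass --%s or set %s", cfg.FlagName, cfg.FlagName, cfg.EnvVarName)
}

func main() {
	cfg := requiredConfig{FlagName: "token", EnvVarName: "VAULT_TOKEN"}
	value, src, err := resolveRequired(cfg, "", false, os.LookupEnv, nil)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print only the length: the value may be a secret.
	fmt.Printf("got %d-character value from %s\n", len(value), src)
}
```

Passing `nil` for `prompt` models non-interactive mode, which is the only path that ends in an error. Returning the `Source` alongside the value is what makes the required source logging possible.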

## Missing Dependencies (P0 — Breaking)

NEVER error out immediately when a dependency is missing. ALWAYS offer informed consent to install:
```go
interaction.CheckDependencyWithPrompt(rc, interaction.DependencyConfig{
	Name:        "docker",
	Description: "Container runtime required for service deployment",
	InstallCmd:  "curl -fsSL https://get.docker.com | sh",
	AskConsent:  true,
})
```

## Flag Bypass Vulnerability Prevention (P0 — Breaking)

Cobra's `--` separator stops flag parsing: everything after it is passed through as positional args, including strings that look like flags. Safety flags typed after `--` are therefore never set.

**Vulnerable pattern** (user types `eos delete env prod -- --force`):
- Cobra sees args: `["prod", "--force"]` — flags are NOT set
- `--force` check passes silently — production deleted without confirmation

**MANDATORY MITIGATION**: ALL commands accepting positional arguments MUST call `verify.ValidateNoFlagLikeArgs` at the top of `RunE`, before any logic that reads `args`:

```go
RunE: eos.Wrap(func(rc *eos_io.RuntimeContext, cmd *cobra.Command, args []string) error {
	logger := otelzap.Ctx(rc.Ctx)

	// CRITICAL: Detect flag-like args passed as positional (-- bypass)
	if err := verify.ValidateNoFlagLikeArgs(args); err != nil {
		return err // clear user-facing error with remediation
	}

	// rest of command logic...
})
```

Affected command types (any using `cobra.ExactArgs`, `cobra.MaximumNArgs`, `cobra.MinimumNArgs`):
- Safety-critical: `cmd/delete/`, `cmd/promote/` — production deletion, approval overrides
- All others: `cmd/backup/`, `cmd/create/`, `cmd/update/`

See `pkg/verify/validators.go:ValidateNoFlagLikeArgs` for implementation.
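The bypass can be reproduced with Go's standard `flag` package, which, like cobra, treats `--` as the end of flags. Below, `validateNoFlagLikeArgs` is a hypothetical stand-in showing the kind of check involved; the real rules live in `pkg/verify/validators.go`. One difference to keep in mind: the standard `flag` package stops at the first non-flag argument, while cobra accepts flags after positional args, so the demo puts `--` first.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// validateNoFlagLikeArgs is a hypothetical stand-in for verify.ValidateNoFlagLikeArgs.
// It rejects any positional argument starting with "-", except a lone "-"
// (conventionally stdin).
func validateNoFlagLikeArgs(args []string) error {
	for _, arg := range args {
		if len(arg) > 1 && strings.HasPrefix(arg, "-") {
			return fmt.Errorf("positional argument %q looks like a flag; it was probably passed after '--' and was NOT parsed as a flag. Remove the '--' separator and retry", arg)
		}
	}
	return nil
}

// parseDeleteArgs shows the bypass: after "--", "--force" is plain data.
func parseDeleteArgs(argv []string) (bool, []string, error) {
	fs := flag.NewFlagSet("delete", flag.ContinueOnError)
	force := fs.Bool("force", false, "skip confirmation")
	if err := fs.Parse(argv); err != nil {
		return false, nil, err
	}
	return *force, fs.Args(), nil
}

func main() {
	force, args, err := parseDeleteArgs([]string{"--", "prod", "--force"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("force flag set:", force, "positional:", args)
	if err := validateNoFlagLikeArgs(args); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

A check like this also rejects legitimate values that start with `-` (negative numbers, for example), which is usually an acceptable trade-off for destructive commands.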

## Drift Correction Pattern

For services that have drifted from their canonical state (wrong permissions, ownership, or config values), detect and correct the drift with `--fix`:
```
eos update <service> --fix # detect and correct drift
eos update <service> --fix --dry-run # preview corrections without applying
```

NEVER create separate `eos fix <service>` commands — use `--fix` flag on existing `eos update` commands.

## Configuration Drift Decision

```
Service has drifted?
├─ Use: eos update <service> --fix
├─ Compares: Current state vs. canonical state from eos create
├─ Corrects: Permissions, ownership, config values
└─ Verifies: Post-fix state matches canonical

Want to check only?
└─ Use: eos update <service> --fix --dry-run

DEPRECATED: eos fix vault → use eos update vault --fix
```
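The compare-then-correct flow behind `--fix` and `--fix --dry-run` can be sketched with canonical state modelled as a flat map of setting name to value. `Change`, `detectDrift` and `fixDrift` are hypothetical; real Eos state (permissions, ownership, config files) would be read from the system rather than from a map, and the values below are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
)

// Change describes one setting whose current value differs from canonical.
type Change struct {
	Key, Current, Canonical string
}

// detectDrift compares current state against canonical state.
// Keys are sorted so reports are stable across runs.
func detectDrift(current, canonical map[string]string) []Change {
	keys := make([]string, 0, len(canonical))
	for k := range canonical {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var changes []Change
	for _, k := range keys {
		if current[k] != canonical[k] {
			changes = append(changes, Change{Key: k, Current: current[k], Canonical: canonical[k]})
		}
	}
	return changes
}

// fixDrift reports drift and, unless dryRun is set, corrects it and then
// verifies that the post-fix state matches canonical.
func fixDrift(current, canonical map[string]string, dryRun bool) ([]Change, error) {
	changes := detectDrift(current, canonical)
	if dryRun || len(changes) == 0 {
		return changes, nil
	}
	for _, c := range changes {
		current[c.Key] = c.Canonical
	}
	if remaining := detectDrift(current, canonical); len(remaining) > 0 {
		return changes, fmt.Errorf("drift remains after fix: %d setting(s) still differ", len(remaining))
	}
	return changes, nil
}

func main() {
	canonical := map[string]string{"config_mode": "0640", "owner": "vault"}
	current := map[string]string{"config_mode": "0644", "owner": "vault"}

	planned, err := fixDrift(current, canonical, true) // like --fix --dry-run
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("dry run, would fix: %+v\n", planned)

	applied, err := fixDrift(current, canonical, false) // like --fix
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("fixed: %+v\n", applied)
}
```

Settings present only in `current` are ignored here; whether those count as drift is a policy decision the sketch does not make.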
96 changes: 96 additions & 0 deletions .claude/rules/debugging.md
@@ -0,0 +1,96 @@
---
description: Eos debugging patterns — diagnostic logging, debug commands, evidence collection
paths:
- "cmd/debug/**"
- "pkg/**/*.go"
---

# Debugging Patterns

## Diagnostic Logging Strategy

In `cmd/debug/` handlers, use two distinct output modes:

| Phase | Output method | Purpose |
|---|---|---|
| Diagnostic checks (health, config validation) | `logger.Info/Warn/Error(...)` | Structured — captured by telemetry |
| Progress indicators | `logger.Debug(...)` or `logger.Info(...)` | Visible to user in real-time |
| Issue detection | `logger.Warn/Error(...)` with zap fields | Structured error data |
| **Final report rendering** | `fmt.Print(report.Render())` ONLY | Terminal-formatted output AFTER telemetry |

```go
// CORRECT: cmd/debug handler pattern
func runVaultDiagnostic(rc *eos_io.RuntimeContext) error {
	logger := otelzap.Ctx(rc.Ctx)

	// Phase 1: diagnostics via structured logger (telemetry captured)
	logger.Info("Checking Vault seal status")
	sealed, err := vault.CheckSealStatus(rc)
	if err != nil {
		// Log and continue: a failed check is a finding, not a reason to abort
		logger.Error("Failed to check seal status", zap.Error(err))
	} else {
		logger.Info("Vault seal status", zap.Bool("sealed", sealed))
	}

	// Phase 2: terminal-formatted report ONLY after all diagnostics done
	report := buildVaultReport(sealed, ...)
	fmt.Print(report.Render()) // OK here: final output only
	return nil
}
```

## Evidence Collection

When collecting diagnostic evidence, capture:
1. **State**: current configuration, running services, connectivity
2. **Timestamps**: when check was performed, service start times
3. **Context**: environment variables (redacted secrets), config file hashes
4. **Errors**: full error chains including root cause

```go
// Evidence struct pattern
type DiagnosticEvidence struct {
	Timestamp   time.Time         `json:"timestamp"`
	ServiceName string            `json:"service_name"`
	Checks      []CheckResult     `json:"checks"`
	Errors      []string          `json:"errors"`
	Config      map[string]string `json:"config"` // no secret values
}
```
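Two of the evidence requirements above, redacted environment variables and config file hashes, can be sketched with the standard library. `redactEnv` and `configHash` are hypothetical helpers, and the secret-marker list is an assumption that deliberately errs toward over-redacting.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// secretMarkers is an illustrative, non-exhaustive list of substrings that
// mark an environment variable as sensitive.
var secretMarkers = []string{"TOKEN", "PASSWORD", "SECRET", "KEY"}

// redactEnv converts KEY=VALUE pairs into a map, replacing values of
// sensitive-looking keys so evidence never contains secret material.
// Entries without "=" are skipped.
func redactEnv(environ []string) map[string]string {
	out := make(map[string]string, len(environ))
	for _, kv := range environ {
		key, value, found := strings.Cut(kv, "=")
		if !found {
			continue
		}
		upper := strings.ToUpper(key)
		for _, m := range secretMarkers {
			if strings.Contains(upper, m) {
				value = "[REDACTED]"
				break
			}
		}
		out[key] = value
	}
	return out
}

// configHash fingerprints config contents so changes can be detected
// without storing the (possibly sensitive) contents themselves.
func configHash(contents []byte) string {
	sum := sha256.Sum256(contents)
	return hex.EncodeToString(sum[:])
}

func main() {
	env := redactEnv([]string{"VAULT_ADDR=https://127.0.0.1:8200", "VAULT_TOKEN=s.abc123"})
	fmt.Println(env["VAULT_ADDR"], env["VAULT_TOKEN"])
	fmt.Println(configHash([]byte("listener \"tcp\" {}\n")))
}
```

In real use the input would come from `os.Environ()` and the hashed bytes from the config file on disk.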

## Debug Command Structure

Debug commands live in `cmd/debug/` and follow this pattern:

```
eos debug [service] # full diagnostic check
eos debug [service] --fix # diagnose and attempt auto-remediation
eos debug [service] --json # machine-readable output for CI/automation
```

Output format:
- Human mode (default): coloured terminal report with summary + details
- JSON mode (`--json`): structured JSON for parsing by other tools

## Automatic Debug Output Capture

For commands that call external tools (`vault`, `consul`, `docker`):
```go
// Capture stdout+stderr for evidence
cmd := exec.CommandContext(rc.Ctx, "vault", "status")
out, err := cmd.CombinedOutput()
if err != nil {
	logger.Error("vault status failed",
		zap.Error(err),
		zap.String("output", string(out)), // attach full output
	)
}
```

## Anti-Patterns

| Anti-pattern | Why it's wrong | Do this instead |
|---|---|---|
| `fmt.Println("checking vault...")` in diagnostic phase | Bypasses telemetry, no structured fields | `logger.Info("checking vault status")` |
| `fmt.Print(...)` in pkg/ functions | pkg/ functions have no terminal context | Return structured data, let cmd/ render |
| Swallowing errors in diagnostics | Hidden failures give false-positive health | Log and continue: `logger.Warn("...", zap.Error(err))` |
| `log.Fatal(...)` in pkg/ | Kills process without cleanup | Return error, let cmd/ handle exit |
167 changes: 167 additions & 0 deletions .claude/rules/go-patterns.md
@@ -0,0 +1,167 @@
---
description: Eos Go patterns — architecture, constants, logging, idempotency, retry logic
paths:
- "**/*.go"
- "pkg/**/*.go"
---

# Eos Go Patterns

## Architecture: cmd/ vs pkg/ (P0 — Breaking)

**cmd/**: Orchestration ONLY.
- Define `cobra.Command` with flags
- Parse flags into config struct
- Call `pkg/[feature]/Function(rc, config)`
- Return result — NO business logic
- **If cmd/ file exceeds 100 lines → move logic to pkg/**

**pkg/**: ALL business logic.
- Pattern: **ASSESS → INTERVENE → EVALUATE**
  1. ASSESS: Check current state
  2. INTERVENE: Apply changes if needed
  3. EVALUATE: Verify and report results
- Always use `*eos_io.RuntimeContext` for all operations

```go
// Good cmd/ file (thin orchestration)
RunE: eos.Wrap(func(rc *eos_io.RuntimeContext, cmd *cobra.Command, args []string) error {
	cfg := &vault.ClusterConfig{Token: tokenFlag}
	return vault.UpdateCluster(rc, cfg) // all logic in pkg/
})

// Bad cmd/ file (business logic in cmd/)
RunE: func(cmd *cobra.Command, args []string) error {
	client := api.NewClient(...) // WRONG: this belongs in pkg/
	resp, err := client.Do(...)  // WRONG
	return err
}
```

## Logging (P0 — Breaking)

**ALWAYS** use `otelzap.Ctx(rc.Ctx)` — structured logging goes to terminal AND telemetry.

**NEVER** use `fmt.Print*` (`fmt.Print`, `fmt.Printf`, `fmt.Println`) in pkg/ or cmd/, with one exception:

**Exception — cmd/debug/ final report rendering ONLY:**
```go
// CORRECT: diagnostics via logger, final output via fmt
logger.Info("Checking Vault config") // diagnostic — telemetry captured
logger.Warn("Seal status: sealed") // diagnostic
fmt.Print(report.Render()) // ONLY at end, after all telemetry
```

## Constants — Single Source of Truth (P0 — Breaking)

NEVER hardcode literal values. Every value must be a named constant defined in EXACTLY ONE place.

| Value type | Location |
|------------|----------|
| Port numbers | `pkg/shared/ports.go` |
| Common paths | `pkg/shared/paths.go` |
| Vault paths/URLs | `pkg/vault/constants.go` |
| Consul paths | `pkg/consul/constants.go` |
| Service-specific | `pkg/[service]/constants.go` |

**FORBIDDEN hardcoded values:**
```go
// WRONG — hardcoded everywhere
os.MkdirAll("/etc/vault.d", 0755)
net.Listen("tcp", "localhost:8200")
exec.Command("systemctl", "start", "vault.service")

// CORRECT — named constants
os.MkdirAll(vault.VaultConfigDir, vault.VaultDirPerm)
net.Listen("tcp", fmt.Sprintf("%s:%d", shared.LocalhostIP, shared.PortVault))
exec.Command("systemctl", "start", vault.VaultServiceName)
```

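**Circular import exception**: if importing the canonical constant would create a circular import, a local duplicate is allowed. Document it with `// NOTE: Duplicates <pkg>.ConstName to avoid circular import`.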

**File permissions** must have security rationale in the constant definition:
```go
// VaultTLSKeyPerm restricts private key access to vault user only.
// RATIONALE: Private keys must not be world-readable.
// SECURITY: Prevents credential theft via filesystem access.
// THREAT MODEL: Mitigates insider threat and container escape attacks.
const VaultTLSKeyPerm = 0600
```

## Idempotency (P1)

All pkg/ operations MUST be safe to run multiple times:
- Check before creating: verify state before applying changes
- Use `os.MkdirAll` not `os.Mkdir` (no error if exists)
- Use upsert patterns for config writes
- Compare current state to desired state before modifying

## Retry Logic (P1)

**Transient failures → retry with backoff:**
- Network timeouts, connection refused (service starting)
- Lock contention, resource temporarily unavailable
- HTTP 429/503 (rate limiting, service overloaded)

**Deterministic failures → fail fast, no retry:**
- Config/validation errors, missing required files
- Authentication failures (wrong credentials)
- Permission denied

```go
// Transient: retry
err := retry.Do(func() error {
	return vault.CheckHealth(rc)
}, retry.Attempts(5), retry.Delay(2*time.Second))

// Deterministic: fail fast
if cfg.Token == "" {
	return fmt.Errorf("vault token required: %w", ErrMissingConfig)
}
```

## Error Context (P1)

Wrap errors with context at EVERY layer:
```go
// WRONG — no context
return err

// CORRECT — context at each layer
return fmt.Errorf("failed to initialize vault cluster: %w", err)
```

User-facing errors use typed error wrappers:
```go
return eos_err.NewUserError("vault token expired — run: vault token renew")
return eos_err.NewSystemError("vault unsealing failed", err)
```

Capture command output in errors:
```go
out, err := cmd.CombinedOutput()
if err != nil {
	return fmt.Errorf("command failed: %w\noutput: %s", err, out)
}
```

## Code Integration (P0)

**Before writing new code**, search for existing functionality:
- `grep -r "FunctionName" pkg/` to find existing implementations
- ALWAYS enhance existing functions rather than creating duplicates
- NEVER create a second HTTP client for the same service — add methods to the existing one
- Only deprecate functions if absolutely necessary — prefer evolution over replacement
- Verify integration points: ensure new code is wired into existing callers

## Common Anti-Patterns

| Anti-pattern | Correct approach |
|---|---|
| `fmt.Println("done")` in pkg/ | `logger.Info("operation complete", zap.String("op", "done"))` |
| New HTTP client for existing service | Add method to existing client in `pkg/[service]/client.go` |
| Hardcoded `"/etc/vault.d"` | Use `vault.VaultConfigDir` constant |
| `os.MkdirAll(dir, 0755)` | Use `vault.VaultDirPerm` or `consul.ConsulDirPerm` |
| Business logic in `cmd/*.go` | Move to `pkg/[feature]/*.go` |
| `_ = someFunc()` (discarding errors) | `if err := someFunc(); err != nil { return fmt.Errorf("...: %w", err) }` |
| Standalone `*.md` docs (except ROADMAP.md, README.md) | Put in inline comments or update ROADMAP.md |