Skip to content

refactor(gateway): SOLID — concrete dependency triangle, SessionWriter type-assertion, interface fragmentation, god methods #335

@aaronwong1989

Description

@aaronwong1989

Background

internal/gateway/ 模块包含 16 个源文件(~4,979 行),负责 WebSocket hub 广播、连接管理、AEP 事件分发、Session ↔ Worker 生命周期编排、LLM 重试和 REST API。模块在之前的 cycle 中已分析过性能(issue 325)、DRY(issue 255)、testability(issue 242)。本次为 首次从 SOLID + coupling 角度的系统性分析,发现 6 个独立于已有 issue 的架构问题。

Scope: solid, dry, coupling — cycle 1 (模块分析通过 1)
Key files: hub.go, conn.go, handler.go, bridge.go, bridge_worker.go, api.go


Finding Summary

Category Critical High Medium Low
SOLID/SRP 0 2 0 0
SOLID/OCP 0 1 0 0
SOLID/ISP 0 0 1 0
SOLID/DIP 0 1 0 0
DRY 0 0 1 0
合计 0 4 2 0

Findings

SOLID: SRP

handleinput-god-method

Severity: High | Confidence: High | ROI: High
Location: handler.go:84-237

Problem: handleInput 方法 153 行,混合 5+ 职责:retry cancellation、interaction response routing、help command detection、control command conversion、worker command conversion、session state management (IDLE→RUNNING)、worker input delivery。新增任何消息类型的处理都需要修改此方法。

Current Pattern:

// handler.go:84-237 — 153 lines, 5+ responsibilities
func (h *Handler) handleInput(ctx context.Context, conn *Conn, env *events.Envelope) {
    // Retry cancellation (lines 86-95)
    // Data extraction + interaction response (lines 97-120)
    // Help command detection (lines 122-135)
    // Control command conversion (lines 137-165)
    // Worker command conversion (lines 167-195)
    // Session state check + IDLE→RUNNING (lines 197-215)
    // Worker delivery + event capture (lines 217-237)
}

Proposed Fix: 分解为职责单一的子 handler:

func (h *Handler) handleInput(ctx context.Context, conn *Conn, env *events.Envelope) {
    h.cancelRetryIfNeeded(conn.SessionID(), env)

    data, err := h.extractInputData(env)
    if err != nil { ... }

    if handled := h.tryInteractionResponse(ctx, conn, data); handled { return }
    if handled := h.tryCommandDispatch(ctx, conn, env, data); handled { return }

    h.deliverToWorker(ctx, conn, env, data)
}

Estimated Impact: 减少 ~80 行 handleInput 代码,每个子 handler 可独立测试

Acceptance Criteria:

  • handleInput ≤ 40 行,只做 dispatch 不含业务逻辑
  • tryCommandDispatch 处理 help/control/worker command 检测
  • 现有测试零回归,新增 table-driven test 覆盖各 dispatch 路径

handleworkercommand-god-method

Severity: High | Confidence: High | ROI: Medium
Location: handler.go:592-704

Problem: handleWorkerCommand 方法 112 行,混合 data extraction (args/extra map parsing)、ownership validation、session state check、command dispatch via long switch chain。每个新 worker command 都需要修改此方法。

Current Pattern:

// handler.go:592-704 — 112 lines
func (h *Handler) handleWorkerCommand(ctx context.Context, conn *Conn, env *events.Envelope) {
    // Data extraction from map (lines 597-620)
    // Ownership validation (lines 622-635)
    // Long switch chain on command type (lines 640-700)
    // Each case: extract args, validate, call bridge method
}

Proposed Fix: 使用 command handler registry 替代 switch chain:

type workerCmdHandler func(ctx context.Context, conn *Conn, args string, extra map[string]any) error

var workerCmdHandlers = map[string]workerCmdHandler{
    events.Compact:  (*Handler).handleCompact,
    events.Clear:    (*Handler).handleClear,
    events.Rewind:   (*Handler).handleRewind,
    // ... new commands just add a map entry
}

Estimated Impact: 新增 worker command 从修改 112 行方法变为添加 1 个 map entry + 1 个 handler function

Acceptance Criteria:

  • handleWorkerCommand ≤ 30 行 dispatch + error handling
  • 每个 command 类型有独立 handler function
  • 现有 command_detection_test.go 零回归

SOLID: OCP

hub-type-asserts-sessionwriter

Severity: High | Confidence: High | ROI: High
Location: hub.go:473 (routeMessage), hub.go:581 (Shutdown)

Problem: Hub.routeMessageHub.ShutdownSessionWriter 接口做 type assertion 到 *Conn*pcEntry,破坏了 SessionWriter 的多态性。新增任何 SessionWriter 实现(如测试 mock、新的 platform adapter)都需要修改这两个方法。

Current Pattern:

// hub.go:473 — routeMessage type-asserts to *Conn
for _, conn := range conns {
    if c, ok := conn.(*Conn); ok {        // breaks polymorphism
        if encoded == nil {
            encoded, err = aep.EncodeJSON(msg.Env)
        }
        if err := c.WriteMessage(websocket.TextMessage, encoded); err != nil {
            _ = conn.Close()
        }
    }
}

// hub.go:581 — Shutdown type-asserts to *pcEntry
for _, entry := range b.platformConns {
    if pc, ok := entry.(*pcEntry); ok {   // breaks polymorphism
        pc.Close()
    }
}

Proposed Fix: 将 encode-once + WriteMessage 逻辑移入 Conn.WriteCtx,delta coalescing 逻辑留在 pcEntry.WriteCtx,Hub 只调用接口方法:

// Hub 只调用接口,不做 type assertion
for _, sw := range writers {
    if err := sw.WriteCtx(ctx, msg.Env); err != nil {
        _ = sw.Close()
        // ... remove from set
    }
}

Estimated Impact: Hub 从依赖 2 个具体类型变为只依赖 SessionWriter 接口

Acceptance Criteria:

  • hub.gorouteMessageShutdown 无 type assertion
  • Conn.WriteCtx 处理 AEP encode + WriteMessage
  • pcEntry.WriteCtx 保持现有 coalescing 逻辑
  • hub_test.go 可使用 mock SessionWriter 而非真实 Conn

SOLID: ISP

sessionmanager-interface-fragmentation

Severity: Medium | Confidence: High | ROI: Medium
Location: handler.go:799-846 (SessionReader + 4 sub-interfaces), bridge.go:30 (bridgeSM), api.go:24 (apiSM), conn.go:31 (SessionStarter)

Problem: SessionManager 的概念被碎片化为 4 个独立的接口定义,每个消费者自己定义 ad-hoc 子集。handler.go 定义了 5 个精细的 sub-interface(SessionReader/Lifecycle/Transitioner/WorkerManager/Admin)+ 组合的 SessionManagerbridge.go 另外定义了 bridgeSM(12 方法,与 handler.go 的 sub-interface 有大量重叠但方法集不同);api.go 定义了 apiSM(4 方法)。

这导致:

  • 同一方法在 3 个 interface 中重复声明(如 Get 在 SessionReader、bridgeSM、apiSM 中都有)
  • bridgeSM 无法组合自 handler.go 的 sub-interfaces(方法签名细微差异)
  • 新增 Session 方法需要同步更新多个 interface

Current Pattern:

// handler.go:799 — 5 sub-interfaces + composite
type SessionReader interface { Get(ctx, id) (*SessionInfo, error) }
type SessionLifecycle interface { Create(...) (string, error); Delete(...) error }
type SessionTransitioner interface { Transition(...) error; ... }
type SessionWorkerManager interface { AttachWorker(...); DetachWorker(...); ... }
type SessionAdmin interface { ForceDelete(...); ... }
type SessionManager interface {
    SessionReader; SessionLifecycle; SessionTransitioner
    SessionWorkerManager; SessionAdmin
}

// bridge.go:30 — ad-hoc subset, overlapping but different
type bridgeSM interface {
    Get(context.Context, string) (*session.SessionInfo, error)
    Create(...) (string, error)
    Transition(...)
    // ... 12 methods, some overlap with handler.go sub-interfaces
}

// api.go:24 — another ad-hoc subset
type apiSM interface {
    Get(context.Context, string) (*session.SessionInfo, error)
    ListByUser(...) ([]session.SessionInfo, error)
    Delete(...) error
    TransitionByID(...)
}

Proposed Fix: 在独立文件(如 session_interfaces.gointerfaces.go)中定义 canonical sub-interfaces,bridge/api/conn 通过组合引用:

// interfaces.go — canonical definitions
type SessionReader interface { Get(ctx, id) (*SessionInfo, error) }
type SessionWriter interface { Create(...); Delete(...); Transition(...) }

// bridge.go — compose from canonical
type bridgeSM interface {
    SessionReader; SessionWriter
    AttachWorker(...); DetachWorker(...)
}

Estimated Impact: 消除 3 处重复的 Get 声明,新增 Session 方法只需更新 canonical interface

Acceptance Criteria:

  • Canonical sub-interfaces 定义在独立文件
  • bridgeSMapiSM 通过 embedding 组合而非重新声明方法
  • 所有现有测试零回归

SOLID: DIP

hub-conn-handler-concrete-triangle

Severity: High | Confidence: High | ROI: Medium
Location: hub.go:338 (HandleHTTP), conn.go:100 (ReadPump), handler.go:27 (Handler struct)

Problem: Hub、Conn、Handler 形成具体类型依赖三角:

  • Hub.HandleHTTP 接收 *Handler*Bridge 指针
  • Conn.ReadPump 接收 *Handler 并直接访问 handler.sm, handler.auth, handler.jwtValidator
  • Handler 持有 *Hub 并调用 hub.SendToSession, hub.NextSeq

这导致:

  • 三者无法独立测试,必须构造完整依赖链
  • Handler 内部字段(sm, auth, jwtValidator)被 Conn 直接访问,破坏封装

Current Pattern:

// hub.go:338 — HandleHTTP takes concrete *Handler
func (h *Hub) HandleHTTP(handler *Handler, bridge BridgeSessionStarter, ...) {
    // ...
    conn := NewConn(wc, h, ...)
    go conn.ReadPump(ctx, handler)  // passes concrete *Handler
}

// conn.go:100 — ReadPump accesses Handler internals
func (c *Conn) ReadPump(ctx context.Context, handler *Handler) {
    // ...
    h.auth.AuthenticateRequest(r)    // directly accesses handler.auth
    handler.jwtValidator.Validate(...)  // directly accesses handler.jwtValidator
    handler.sm.Get(ctx, sessionID)   // directly accesses handler.sm
}

// handler.go:27 — Handler holds concrete *Hub
type Handler struct {
    hub *Hub
    // ...
}

Proposed Fix: 为 Conn 需要的 Handler 能力定义窄接口:

// Conn only needs these capabilities
type ConnHandler interface {
    Authenticate(r *http.Request) (*security.AuthResult, error)
    ValidateJWT(token string) (*security.JWTClaims, error)
    GetSession(ctx context.Context, id string) (*session.SessionInfo, error)
}

func (c *Conn) ReadPump(ctx context.Context, handler ConnHandler) { ... }

Estimated Impact: Conn 可独立测试(mock ConnHandler),减少测试中的 concrete 依赖

Acceptance Criteria:

  • Conn.ReadPump 参数从 *Handler 改为窄接口
  • Hub.HandleHTTP 不再接收 *Handler
  • conn_test.go 可用 mock 接口替代真实 Handler
  • 所有现有测试零回归

DRY

hub-copy-under-lock-iterate-write-pattern

Severity: Medium | Confidence: High | ROI: Medium
Location: hub.go:442-496 (routeMessage), hub.go:312-334 (sendControlToSession), hub.go:570-604 (Shutdown)

Problem: "RLock → copy session conns → iterate → write → handle error" 模式在 hub.go 中重复 3 次,每次有细微变体(type assertion、error handling、remove-on-error)。

Current Pattern:

// Pattern repeated in routeMessage, sendControlToSession, Shutdown:
h.mu.RLock()
conns := make(map[SessionWriter]bool)
for k, v := range h.sessions[sessionID] { conns[k] = v }
h.mu.RUnlock()

for conn := range conns {
    // write + error handling (different per call site)
    if err := conn.WriteCtx(...); err != nil {
        _ = conn.Close()
        // remove from session set
    }
}

Proposed Fix: 提取通用方法:

func (h *Hub) writeToMany(ctx context.Context, sessionID string, env *events.Envelope, conns map[SessionWriter]bool) {
    for conn := range conns {
        if err := conn.WriteCtx(ctx, env); err != nil {
            _ = conn.Close()
            h.removeConnFromSession(conn, sessionID)
        }
    }
}

Estimated Impact: ~40 行减少,统一 error handling 策略

Acceptance Criteria:

  • routeMessage、sendControlToSession、Shutdown 共享通用 write-to-many 方法
  • error handling 策略(close-on-error, remove-from-set)统一
  • hub_test.go 零回归

Implementation Priority

Finding Priority Effort Risk Impact
hub-type-asserts-sessionwriter P1 Small Low 解锁 mock 测试,Hub 不再依赖具体类型
hub-conn-handler-concrete-triangle P1 Medium Medium Conn 可独立测试,减少耦合
handleinput-god-method P2 Medium Low ~80 行减少,独立测试 dispatch 路径
sessionmanager-interface-fragmentation P2 Medium Low 消除重复声明,新增方法只改一处
hub-copy-under-lock-iterate-write-pattern P3 Small Low ~40 行减少,统一 error handling
handleworkercommand-god-method P3 Medium Low 新 command 从改 112 行变为加 map entry

Recommended starting point: hub-type-asserts-sessionwriter — 小改动(消除 2 处 type assertion),高回报(Hub 完全依赖接口)


Out of Scope

  • conn.go:performInit (273 行 god method) — 已在 issue 255 跟踪
  • bridge_forward.go:forwardEvents (222 行 god method) — 已在 issue 242 跟踪
  • Slack context injection 3x duplication — 已在 issue 242 跟踪
  • api.go auth+ownership check boilerplate — 部分在 issue 255 覆盖
  • llm_retry.go pattern compilation duplication — Low ROI

Verification

  • make test 通过,无回归
  • make lint 不产生新警告
  • go test -race ./internal/gateway/... 通过

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3Medium: tech debt, refactoring, improvementsarchitectureDomain: design patterns, coupling, separation of concernsarea/gatewayScope: WebSocket hub, conn, handler, bridge, APIrefactorRefactor: DRY, SOLID, code quality improvements

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions