Skip to content

refactor(gateway): SRP — performInit/forwardEvents god methods, buildWorkerInfo+env injection trio #374

@hrygo

Description

@hrygo

Background

internal/gateway/ 模块包含 20 个源文件(~6,200 行),负责 WebSocket hub 广播、连接管理、AEP 事件分发、Session ↔ Worker 生命周期编排。Issue 335 已跟踪 handleInput/handleWorkerCommand god methods 和 Hub type-assertion 问题。本次为 gateway 模块首次从 SOLID (SRP) + DRY + coupling 角度的增量分析,发现 3 个独立于 issue 335 的架构问题。

Scope: solid, dry, coupling — cycle 49 (模块分析通过 2)
Key files: conn.go, bridge_forward.go, bridge.go, bridge_worker.go


Finding Summary

Category Critical High Medium Low
SOLID/SRP 0 2 0 0
DRY 0 0 1 0
合计 0 2 1 0

Findings

SOLID: SRP

performinit-god-method

Severity: High | Confidence: High | ROI: High
Location: conn.go:220-488 (~270 lines)

Problem: performInit 是 gateway 模块中最长的方法(~270 行),混合 7+ 职责:init 消息解码、deferred auth + JWT 验证、WorkDir 解析与安全校验、Session ID 派生(pre-resolved / UUIDv5)、throttle 检查、init-phase buffering 管理、7 种 session 状态分支处理(not-found / CREATED / DELETED / worker-alive / IDLE / TERMINATED / RUNNING)、SEC-008 用户归属校验、SEC-007 bot 隔离校验、init_ack 发送。任何新的认证方式或 session 状态都需要修改此方法。

Current Pattern:

// conn.go:220-488 — 270 lines, 7+ responsibilities
func (c *Conn) performInit(handler *Handler) error {
    // Read + decode init envelope (lines 229-248)
    // Deferred auth / JWT validation (lines 260-298)
    // WorkDir resolution + validation (lines 300-311)
    // Session ID derivation: pre-resolved or UUIDv5 (lines 317-327)
    // Throttle check (lines 329-334)
    // Init buffering enable + session join (lines 336-343)
    // 7 state branches for session resolution (lines 345-444):
    //   not-found → create, CREATED → start, DELETED → delete+create,
    //   worker-alive → fast reconnect, IDLE/TERMINATED/RUNNING → resume+fallback
    // SEC-008 user ownership check (lines 446-454)
    // SEC-007 bot isolation check (lines 456-462)
    // Init ack send + log (lines 464-488)
}

Proposed Fix: 分解为职责单一的阶段方法:

func (c *Conn) performInit(handler *Handler) error {
    initData, err := c.readAndValidateInit(handler)
    if err != nil { return err }

    sessionID, si, err := c.resolveSession(handler, initData)
    if err != nil { return err }

    if err := c.authorizeSession(handler, si); err != nil { return err }
    return c.completeInit(handler, sessionID, si, initData)
}

Estimated Impact: performInit 从 ~270 行降至 ~20 行 dispatch,每个子方法 ~40-60 行可独立测试

Acceptance Criteria:

  • performInit ≤ 30 行,只做 4 阶段 dispatch 不含分支逻辑
  • readAndValidateInit 处理消息解码 + 字段校验
  • resolveSession 处理 Session ID 派生 + 7 种状态分支
  • authorizeSession 处理 SEC-008/SEC-007 归属校验
  • 现有 conn_test.go 零回归

forwardevents-god-function

Severity: High | Confidence: High | ROI: Medium
Location: bridge_forward.go:25-256 (~230 lines)

Problem: forwardEvents 是 gateway 模块第二长的方法(~230 行),交织 10+ 关注点:worker 事件接收循环、session 信息缓存、reset generation 防竞争、turn 文本累积(LLM retry 用)、turn timer 超时管理、per-event Clone + SessionID 注入、stats 累积(tool calls / turn stats / context usage)、dropped delta 调解(backpressure reconciliation)、pending error 缓冲(retry 决策前抑制错误事件)、LLM auto-retry 判断与触发、事件捕获(eventstore inbound/outbound)、worker 退出处理(handleWorkerExit 调用)。这些关注点的代码在 for env := range recvCh 循环中纵向交织,每个关注点跨越 10-30 行散布在循环的不同位置。

Current Pattern:

// bridge_forward.go:25-256 — 230 lines, 10+ interleaved concerns
func (b *Bridge) forwardEvents(w worker.Worker, sessionID string, opts forwardOpts) {
    // Setup: session cache, startTime, turnStartTime, firstEvent, doneReceived (lines 31-53)
    // LLM retry state: turnText, lastError, pendingError (lines 56-58)
    // Turn timer setup (lines 61-76)

    for env := range recvCh {
        // Error buffering for LLM retry (lines 82-95)
        // Worker session ID persistence (lines 97-101)
        // Turn timer reset (lines 103-108)
        // Clone + session injection (lines 110-112)
        // Delta content extraction (lines 114-122)
        // Stats accumulation: tool_call / done (lines 125-161)
        // Dropped delta reconciliation on done (lines 164-184)
        // Hub send (lines 186-188)
        // Event capture: delta vs non-delta (lines 190-194)
        // Pending error flush on non-Done (lines 197-203)
        // LLM retry decision on Done (lines 206-226)
        // Turn reset on Done (lines 228-231)
    }
    // Post-loop: pending error flush (lines 234-241)
    // Worker exit handling (lines 243-256)
}

Proposed Fix: 将循环体拆分为 per-event processor 方法,每个关注点一个方法:

func (b *Bridge) forwardEvents(w worker.Worker, sessionID string, opts forwardOpts) {
    ctx := newForwardContext(w, sessionID, opts)
    defer ctx.cleanup()

    for env := range w.Conn().Recv() {
        if ctx.shouldSkip(env) { continue }
        ctx.onEvent(env)

        processed := b.processEvent(ctx, env)
        b.forwardToHub(ctx, processed)

        if ctx.isDone(env) {
            b.handleTurnEnd(ctx, env)
        }
    }
    b.handleWorkerExit(w, ctx.exitParams())
}

Estimated Impact: forwardEvents 从 ~230 行降至 ~20 行循环体,每个 processor 方法 15-30 行。但需要引入 forwardContext struct 来传递跨迭代状态,增加 ~50 行新类型定义。净收益:可读性显著提升,但总代码量变化不大。

Acceptance Criteria:

  • forwardEvents 主循环 ≤ 30 行
  • LLM retry 逻辑提取到独立方法(onError / onDone retry decision)
  • Stats 累积提取到独立方法(onToolCall / onDone stats)
  • Turn timer 管理封装在 forwardContext 内
  • 现有 bridge_test.go + bridge_forward_truncate_test.go 零回归

DRY

buildworkerinfo-env-injection-trio

Severity: Medium | Confidence: High | ROI: High
Location: bridge.go:130-132, bridge.go:209-211, bridge_worker.go:151-153

Problem: buildWorkerInfo + injectSlackEnv + injectGatewayContext 三联调用在 3 个位置完全重复。每次新增环境变量注入逻辑(如最近添加的 cronEnv 注入)都需要在所有 3 个位置同步修改,漏改任何一个会导致特定 session 路径缺少环境变量。

Current Pattern:

// bridge.go:130-132 (StartSession)
workerInfo := b.buildWorkerInfo(id, userID, workDir, si)
injectSlackEnv(&workerInfo, platformKey)
workerInfo.Env = injectGatewayContext(workerInfo.Env, platform, botID, userID, platformKey, id, workDir)

// bridge.go:209-211 (ResumeSession) — identical pattern with si fields
workerInfo := b.buildWorkerInfo(si.ID, si.UserID, workDir, si)
injectSlackEnv(&workerInfo, si.PlatformKey)
workerInfo.Env = injectGatewayContext(workerInfo.Env, si.Platform, si.BotID, si.UserID, si.PlatformKey, id, workDir)

// bridge_worker.go:151-153 (attemptResumeFallback) — identical pattern with si fields
workerInfo := b.buildWorkerInfo(si.ID, si.UserID, p.workDir, si)
injectSlackEnv(&workerInfo, si.PlatformKey)
workerInfo.Env = injectGatewayContext(workerInfo.Env, si.Platform, si.BotID, si.UserID, si.PlatformKey, si.ID, p.workDir)

Proposed Fix: 将三联调用封装为 Bridge 方法:

func (b *Bridge) prepareWorkerInfo(sessionID, userID, workDir string, si *session.SessionInfo) worker.SessionInfo {
    info := b.buildWorkerInfo(sessionID, userID, workDir, si)
    injectSlackEnv(&info, si.PlatformKey)
    info.Env = injectGatewayContext(info.Env, si.Platform, si.BotID, si.UserID, si.PlatformKey, sessionID, workDir)
    return info
}

Estimated Impact: 3 个调用点各减少 2 行(净减少 ~4 行),但更重要的是:新增 env 注入只需修改 1 处

Acceptance Criteria:

  • 新增 prepareWorkerInfo 方法合并 buildWorkerInfo + injectSlackEnv + injectGatewayContext
  • StartSession、ResumeSession、attemptResumeFallback 统一调用 prepareWorkerInfo
  • make test 通过,bridge_test.go 零回归

Implementation Priority

Finding Priority Effort Risk Impact
buildworkerinfo-env-injection-trio P0 Small Low 3 调用点去重,防止漏改 bug
performinit-god-method P1 Medium Medium 270→20 行,可测试性大幅提升
forwardevents-god-function P2 Large Medium 230→30 行循环,需引入 forwardContext

Recommended starting point: 先解决 P0 buildworkerinfo trio(小投入、即时收益),再处理 P1 performInit(可测试性改善最大)。


Out of Scope

以下区域有意不更改:

  • Hub.routeMessage type-assertion: 已在 issue 335 跟踪
  • handleInput / handleWorkerCommand god methods: 已在 issue 335 跟踪
  • Bridge → Hub concrete dependency: 已在 issue 335 跟踪(concrete dependency triangle)
  • Hub → config.ConfigStore coupling: ConfigStore 是稳定基础设施依赖,抽象 ROI 低
  • Handler → Bridge concrete dependency: Handler 需要的 Bridge 能力(CancelRetry、CaptureInbound、ResetSession)不足以独立为接口

Verification

  • make test 通过,无回归
  • make lint 不产生新警告
  • conn_test.go 现有 init handshake 测试全部通过
  • bridge_test.go 现有 StartSession/ResumeSession 测试全部通过

Metadata

Metadata

Assignees

No one assigned

    Labels

    P2High: affects many users, daily occurrencesarchitectureDomain: design patterns, coupling, separation of concernsarea/gatewayScope: WebSocket hub, conn, handler, bridge, APIrefactorRefactor: DRY, SOLID, code quality improvements

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions