Background
internal/gateway/ 模块包含 20 个源文件(~6,200 行),负责 WebSocket hub 广播、连接管理、AEP 事件分发、Session ↔ Worker 生命周期编排。Issue 335 已跟踪 handleInput/handleWorkerCommand god methods 和 Hub type-assertion 问题。本次为 gateway 模块首次从 SOLID (SRP) + DRY + coupling 角度的增量分析,发现 3 个独立于 issue 335 的架构问题。
Scope: solid, dry, coupling — cycle 49 (模块分析通过 2)
Key files: conn.go, bridge_forward.go, bridge.go, bridge_worker.go
Finding Summary
| Category |
Critical |
High |
Medium |
Low |
| SOLID/SRP |
0 |
2 |
0 |
0 |
| DRY |
0 |
0 |
1 |
0 |
| 合计 |
0 |
2 |
1 |
0 |
Findings
SOLID: SRP
performinit-god-method
Severity: High | Confidence: High | ROI: High
Location: conn.go:220-488 (~270 lines)
Problem: performInit 是 gateway 模块中最长的方法(~270 行),混合 7+ 职责:init 消息解码、deferred auth + JWT 验证、WorkDir 解析与安全校验、Session ID 派生(pre-resolved / UUIDv5)、throttle 检查、init-phase buffering 管理、7 种 session 状态分支处理(not-found / CREATED / DELETED / worker-alive / IDLE / TERMINATED / RUNNING)、SEC-008 用户归属校验、SEC-007 bot 隔离校验、init_ack 发送。任何新的认证方式或 session 状态都需要修改此方法。
Current Pattern:
// conn.go:220-488 — 270 lines, 7+ responsibilities
func (c *Conn) performInit(handler *Handler) error {
// Read + decode init envelope (lines 229-248)
// Deferred auth / JWT validation (lines 260-298)
// WorkDir resolution + validation (lines 300-311)
// Session ID derivation: pre-resolved or UUIDv5 (lines 317-327)
// Throttle check (lines 329-334)
// Init buffering enable + session join (lines 336-343)
// 7 state branches for session resolution (lines 345-444):
// not-found → create, CREATED → start, DELETED → delete+create,
// worker-alive → fast reconnect, IDLE/TERMINATED/RUNNING → resume+fallback
// SEC-008 user ownership check (lines 446-454)
// SEC-007 bot isolation check (lines 456-462)
// Init ack send + log (lines 464-488)
}
Proposed Fix: 分解为职责单一的阶段方法:
func (c *Conn) performInit(handler *Handler) error {
initData, err := c.readAndValidateInit(handler)
if err != nil { return err }
sessionID, si, err := c.resolveSession(handler, initData)
if err != nil { return err }
if err := c.authorizeSession(handler, si); err != nil { return err }
return c.completeInit(handler, sessionID, si, initData)
}
Estimated Impact: performInit 从 ~270 行降至 ~20 行 dispatch,每个子方法 ~40-60 行可独立测试
Acceptance Criteria:
forwardevents-god-function
Severity: High | Confidence: High | ROI: Medium
Location: bridge_forward.go:25-256 (~230 lines)
Problem: forwardEvents 是 gateway 模块第二长的方法(~230 行),交织 10+ 关注点:worker 事件接收循环、session 信息缓存、reset generation 防竞争、turn 文本累积(LLM retry 用)、turn timer 超时管理、per-event Clone + SessionID 注入、stats 累积(tool calls / turn stats / context usage)、dropped delta 调解(backpressure reconciliation)、pending error 缓冲(retry 决策前抑制错误事件)、LLM auto-retry 判断与触发、事件捕获(eventstore inbound/outbound)、worker 退出处理(handleWorkerExit 调用)。这些关注点的代码在 for env := range recvCh 循环中纵向交织,每个关注点跨越 10-30 行散布在循环的不同位置。
Current Pattern:
// bridge_forward.go:25-256 — 230 lines, 10+ interleaved concerns
func (b *Bridge) forwardEvents(w worker.Worker, sessionID string, opts forwardOpts) {
// Setup: session cache, startTime, turnStartTime, firstEvent, doneReceived (lines 31-53)
// LLM retry state: turnText, lastError, pendingError (lines 56-58)
// Turn timer setup (lines 61-76)
for env := range recvCh {
// Error buffering for LLM retry (lines 82-95)
// Worker session ID persistence (lines 97-101)
// Turn timer reset (lines 103-108)
// Clone + session injection (lines 110-112)
// Delta content extraction (lines 114-122)
// Stats accumulation: tool_call / done (lines 125-161)
// Dropped delta reconciliation on done (lines 164-184)
// Hub send (lines 186-188)
// Event capture: delta vs non-delta (lines 190-194)
// Pending error flush on non-Done (lines 197-203)
// LLM retry decision on Done (lines 206-226)
// Turn reset on Done (lines 228-231)
}
// Post-loop: pending error flush (lines 234-241)
// Worker exit handling (lines 243-256)
}
Proposed Fix: 将循环体拆分为 per-event processor 方法,每个关注点一个方法:
func (b *Bridge) forwardEvents(w worker.Worker, sessionID string, opts forwardOpts) {
ctx := newForwardContext(w, sessionID, opts)
defer ctx.cleanup()
for env := range w.Conn().Recv() {
if ctx.shouldSkip(env) { continue }
ctx.onEvent(env)
processed := b.processEvent(ctx, env)
b.forwardToHub(ctx, processed)
if ctx.isDone(env) {
b.handleTurnEnd(ctx, env)
}
}
b.handleWorkerExit(w, ctx.exitParams())
}
Estimated Impact: forwardEvents 从 ~230 行降至 ~20 行循环体,每个 processor 方法 15-30 行。但需要引入 forwardContext struct 来传递跨迭代状态,增加 ~50 行新类型定义。净收益:可读性显著提升,但总代码量变化不大。
Acceptance Criteria:
DRY
buildworkerinfo-env-injection-trio
Severity: Medium | Confidence: High | ROI: High
Location: bridge.go:130-132, bridge.go:209-211, bridge_worker.go:151-153
Problem: buildWorkerInfo + injectSlackEnv + injectGatewayContext 三联调用在 3 个位置完全重复。每次新增环境变量注入逻辑(如最近添加的 cronEnv 注入)都需要在所有 3 个位置同步修改,漏改任何一个会导致特定 session 路径缺少环境变量。
Current Pattern:
// bridge.go:130-132 (StartSession)
workerInfo := b.buildWorkerInfo(id, userID, workDir, si)
injectSlackEnv(&workerInfo, platformKey)
workerInfo.Env = injectGatewayContext(workerInfo.Env, platform, botID, userID, platformKey, id, workDir)
// bridge.go:209-211 (ResumeSession) — identical pattern with si fields
workerInfo := b.buildWorkerInfo(si.ID, si.UserID, workDir, si)
injectSlackEnv(&workerInfo, si.PlatformKey)
workerInfo.Env = injectGatewayContext(workerInfo.Env, si.Platform, si.BotID, si.UserID, si.PlatformKey, id, workDir)
// bridge_worker.go:151-153 (attemptResumeFallback) — identical pattern with si fields
workerInfo := b.buildWorkerInfo(si.ID, si.UserID, p.workDir, si)
injectSlackEnv(&workerInfo, si.PlatformKey)
workerInfo.Env = injectGatewayContext(workerInfo.Env, si.Platform, si.BotID, si.UserID, si.PlatformKey, si.ID, p.workDir)
Proposed Fix: 将三联调用封装为 Bridge 方法:
func (b *Bridge) prepareWorkerInfo(sessionID, userID, workDir string, si *session.SessionInfo) worker.SessionInfo {
info := b.buildWorkerInfo(sessionID, userID, workDir, si)
injectSlackEnv(&info, si.PlatformKey)
info.Env = injectGatewayContext(info.Env, si.Platform, si.BotID, si.UserID, si.PlatformKey, sessionID, workDir)
return info
}
Estimated Impact: 3 个调用点各减少 2 行(净减少 ~4 行),但更重要的是:新增 env 注入只需修改 1 处
Acceptance Criteria:
Implementation Priority
| Finding |
Priority |
Effort |
Risk |
Impact |
| buildworkerinfo-env-injection-trio |
P0 |
Small |
Low |
3 调用点去重,防止漏改 bug |
| performinit-god-method |
P1 |
Medium |
Medium |
270→20 行,可测试性大幅提升 |
| forwardevents-god-function |
P2 |
Large |
Medium |
230→30 行循环,需引入 forwardContext |
Recommended starting point: 先解决 P0 buildworkerinfo trio(小投入、即时收益),再处理 P1 performInit(可测试性改善最大)。
Out of Scope
以下区域有意不更改:
- Hub.routeMessage type-assertion: 已在 issue 335 跟踪
- handleInput / handleWorkerCommand god methods: 已在 issue 335 跟踪
- Bridge → Hub concrete dependency: 已在 issue 335 跟踪(concrete dependency triangle)
- Hub → config.ConfigStore coupling: ConfigStore 是稳定基础设施依赖,抽象 ROI 低
- Handler → Bridge concrete dependency: Handler 需要的 Bridge 能力(CancelRetry、CaptureInbound、ResetSession)不足以独立为接口
Verification
Background
internal/gateway/模块包含 20 个源文件(~6,200 行),负责 WebSocket hub 广播、连接管理、AEP 事件分发、Session ↔ Worker 生命周期编排。Issue 335 已跟踪 handleInput/handleWorkerCommand god methods 和 Hub type-assertion 问题。本次为 gateway 模块首次从 SOLID (SRP) + DRY + coupling 角度的增量分析,发现 3 个独立于 issue 335 的架构问题。Scope: solid, dry, coupling — cycle 49 (模块分析通过 2)
Key files:
conn.go,bridge_forward.go,bridge.go,bridge_worker.goFinding Summary
Findings
SOLID: SRP
performinit-god-method
Severity: High | Confidence: High | ROI: High
Location:
conn.go:220-488(~270 lines)Problem:
performInit是 gateway 模块中最长的方法(~270 行),混合 7+ 职责:init 消息解码、deferred auth + JWT 验证、WorkDir 解析与安全校验、Session ID 派生(pre-resolved / UUIDv5)、throttle 检查、init-phase buffering 管理、7 种 session 状态分支处理(not-found / CREATED / DELETED / worker-alive / IDLE / TERMINATED / RUNNING)、SEC-008 用户归属校验、SEC-007 bot 隔离校验、init_ack 发送。任何新的认证方式或 session 状态都需要修改此方法。Current Pattern:
Proposed Fix: 分解为职责单一的阶段方法:
Estimated Impact: performInit 从 ~270 行降至 ~20 行 dispatch,每个子方法 ~40-60 行可独立测试
Acceptance Criteria:
performInit≤ 30 行,只做 4 阶段 dispatch 不含分支逻辑readAndValidateInit处理消息解码 + 字段校验resolveSession处理 Session ID 派生 + 7 种状态分支authorizeSession处理 SEC-008/SEC-007 归属校验forwardevents-god-function
Severity: High | Confidence: High | ROI: Medium
Location:
bridge_forward.go:25-256(~230 lines)Problem:
forwardEvents是 gateway 模块第二长的方法(~230 行),交织 10+ 关注点:worker 事件接收循环、session 信息缓存、reset generation 防竞争、turn 文本累积(LLM retry 用)、turn timer 超时管理、per-event Clone + SessionID 注入、stats 累积(tool calls / turn stats / context usage)、dropped delta 调解(backpressure reconciliation)、pending error 缓冲(retry 决策前抑制错误事件)、LLM auto-retry 判断与触发、事件捕获(eventstore inbound/outbound)、worker 退出处理(handleWorkerExit 调用)。这些关注点的代码在for env := range recvCh循环中纵向交织,每个关注点跨越 10-30 行散布在循环的不同位置。Current Pattern:
Proposed Fix: 将循环体拆分为 per-event processor 方法,每个关注点一个方法:
Estimated Impact: forwardEvents 从 ~230 行降至 ~20 行循环体,每个 processor 方法 15-30 行。但需要引入 forwardContext struct 来传递跨迭代状态,增加 ~50 行新类型定义。净收益:可读性显著提升,但总代码量变化不大。
Acceptance Criteria:
forwardEvents主循环 ≤ 30 行DRY
buildworkerinfo-env-injection-trio
Severity: Medium | Confidence: High | ROI: High
Location:
bridge.go:130-132,bridge.go:209-211,bridge_worker.go:151-153Problem:
buildWorkerInfo+injectSlackEnv+injectGatewayContext三联调用在 3 个位置完全重复。每次新增环境变量注入逻辑(如最近添加的 cronEnv 注入)都需要在所有 3 个位置同步修改,漏改任何一个会导致特定 session 路径缺少环境变量。Current Pattern:
Proposed Fix: 将三联调用封装为 Bridge 方法:
Estimated Impact: 3 个调用点各减少 2 行(净减少 ~4 行),但更重要的是:新增 env 注入只需修改 1 处
Acceptance Criteria:
prepareWorkerInfo方法合并 buildWorkerInfo + injectSlackEnv + injectGatewayContextmake test通过,bridge_test.go 零回归Implementation Priority
270→20 行,可测试性大幅提升230→30 行循环,需引入 forwardContextRecommended starting point: 先解决 P0 buildworkerinfo trio(小投入、即时收益),再处理 P1 performInit(可测试性改善最大)。
Out of Scope
以下区域有意不更改:
Verification
make test通过,无回归make lint不产生新警告