Skip to content

refactor(session): SOLID + DRY — Manager SRP decomposition, metrics coupling, lock boilerplate, PoolManager interface #336

@aaronwong1989

Description

@aaronwong1989

Background

internal/session/ 模块包含 8 个源文件(~3,952 行),负责 5 状态机会话生命周期、SQLite 持久化、每用户配额管理和确定性 session key 生成。模块已在 issue 243 中完成 testability/code-quality 分析。本次为 首次从 SOLID + DRY + coupling 角度的系统性分析,发现 6 个独立于已有 issue 的架构问题。

Scope: solid, dry, coupling — cycle 2 (模块分析通过 1)
Key files: manager.go, pool.go, store.go, key.go


Finding Summary

Category Critical High Medium Low
SOLID/SRP 0 1 1 0
SOLID/DIP 0 1 0 0
DRY 0 0 2 0
Coupling 0 0 1 0
合计 0 2 4 0

Findings

SOLID: SRP

manager-mixed-responsibilities

Severity: High | Confidence: High | ROI: High
Location: manager.go (1,025 lines), Manager struct (lines 36-52)

Problem: Manager 承担 5+ 职责:(1) session CRUD + 缓存, (2) 5 状态机转换, (3) worker 生命周期管理 (attach/detach/terminate), (4) 后台 GC (zombie/max_lifetime/idle), (5) metrics 发射。其中 GC 逻辑(runGC + gc + ResetGCInterval, lines 841-969)共 128 行,是独立的定时任务调度,完全可以从 Manager 中提取。

transitionState (73 行) 混合了 6 个关注点:内存状态更新、idle 过期设置、DB 持久化 + rollback、worker 终止、metrics 更新、通知回调。尤其是 worker 终止块(lines 269-294)在 session 层直接调用 worker.Terminate() + metrics.WorkerExecDuration,违反了层次边界。

Current Pattern:

// manager.go:244-316 — transitionState handles 6 concerns
func (m *Manager) transitionState(ms *managedSession, to events.SessionState, reason string) error {
    // 1. Update in-memory state
    from := ms.info.State
    ms.info.State = to
    // 2. Set idle expiry
    if to == events.StateIdle { ms.info.IdleExpiresAt = ... }
    // 3. DB persist + rollback on failure
    rollback, err := m.updateSession(ms)
    // 4. Worker termination — session layer calling into worker layer!
    if to == events.StateTerminated || to == events.StateDeleted {
        if ms.worker != nil {
            go func() { ms.worker.Terminate(base.GracefulShutdownTimeout) }()
            m.releaseWorkerQuota(ms)  // pool interaction
            metrics.WorkerExecDuration.Observe(...)  // metrics
        }
    }
    // 5. Metrics emission
    metrics.SessionsActive.WithLabelValues(...).Dec()
    // 6. Notification callback
    if m.StateNotifier != nil { m.StateNotifier(...) }
}

Proposed Fix: 分阶段提取:

// Step 1: Extract GC into SessionGC
type SessionGC struct {
    store      Store
    sessions   func() map[string]*managedSession  // read-only access
    mu         *sync.RWMutex
    interval   time.Duration
    // ...
}

// Step 2: Extract worker lifecycle from transitionState
func (m *Manager) handleWorkerOnTransition(ms *managedSession, to events.SessionState) {
    if ms.worker == nil { return }
    if to == events.StateTerminated || to == events.StateDeleted {
        go ms.worker.Terminate(base.GracefulShutdownTimeout)
        m.releaseWorkerQuota(ms)
    }
}

// Step 3: Extract metrics helper
func (m *Manager) recordTransition(from, to events.SessionState) {
    metrics.SessionsActive.WithLabelValues(string(from)).Dec()
    metrics.SessionsActive.WithLabelValues(string(to)).Inc()
}

Estimated Impact: manager.go 减少 ~150 行,GC 和 worker lifecycle 可独立测试

Acceptance Criteria:

  • SessionGC 独立 struct,Manager 持有 *SessionGC 引用
  • transitionState ≤ 40 行,只处理状态 + 持久化
  • Worker 终止逻辑提取为独立方法
  • Metrics 发射通过 recordTransition helper 统一
  • make test 零回归

SOLID: DIP

concrete-poolmanager-dependency

Severity: High | Confidence: High | ROI: Medium
Location: manager.go:44 (pool *PoolManager), pool.go (no interface)

Problem: Manager 依赖具体 *PoolManager 而非接口。这导致:

  • 测试 Manager 必须构造真实 PoolManager(虽然简单,但限制了 mock 能力)
  • 无法替换为 no-op pool 或分布式 quota 实现
  • PoolManager 的 metrics 耦合传导到 Manager 测试中

Current Pattern:

// manager.go:44 — concrete dependency
type Manager struct {
    pool *PoolManager  // concrete type, no interface
    // ...
}

// pool.go — no interface defined
type PoolManager struct {
    mu         sync.Mutex
    totalCount int
    userCount  map[string]int
    // ...
}

Proposed Fix: 定义 Pool 接口并在 Manager 中使用:

type Pool interface {
    Acquire(userID string) error
    Release(userID string)
    AcquireMemory(userID string, mem int64) error
    ReleaseMemory(userID string, mem int64)
    Stats() PoolStats
    UpdateLimits(maxSize int, maxPerUser int)
}

type PoolStats struct {
    Total, MaxTotal int
    PerUser map[string]int
}

Estimated Impact: Manager 可使用 mock Pool 测试,解锁替代 quota 实现

Acceptance Criteria:

  • Pool 接口定义在 pool.gointerfaces.go
  • Manager.pool 字段类型改为 Pool 接口
  • PoolManager 实现 Pool 接口
  • manager_test.go 使用 mock Pool 替代真实 PoolManager

DRY

metrics-emission-pattern-repeated

Severity: Medium | Confidence: High | ROI: High
Location: manager.go:transitionState (lines 289-294), AttachWorker (lines 447-449), detachWorkerUnchecked (lines 481-486), Delete (lines 556-558), gc (lines 935-937)

Problem: metrics.SessionsActive.WithLabelValues(string(state)).Dec() / .Inc() 模式在 5+ 个位置重复,每个位置还伴随 metrics.WorkersRunningmetrics.WorkerExecDuration 的更新。新增状态或修改 metrics 标签需要修改所有位置。

Current Pattern:

// Repeated in transitionState, AttachWorker, detachWorkerUnchecked, Delete, gc:
metrics.SessionsActive.WithLabelValues(string(from)).Dec()
metrics.SessionsActive.WithLabelValues(string(to)).Inc()
metrics.WorkersRunning.WithLabelValues(wt).Dec()
metrics.WorkerExecDuration.WithLabelValues(wt).Observe(elapsed.Seconds())

Proposed Fix:

func recordStateTransition(from, to events.SessionState) {
    metrics.SessionsActive.WithLabelValues(string(from)).Dec()
    metrics.SessionsActive.WithLabelValues(string(to)).Inc()
}

func recordWorkerDone(wt string, elapsed time.Duration) {
    metrics.WorkersRunning.WithLabelValues(wt).Dec()
    metrics.WorkerExecDuration.WithLabelValues(wt).Observe(elapsed.Seconds())
}

Estimated Impact: ~25 行减少,metrics 变更只改 helper

Acceptance Criteria:

  • 所有 metrics.SessionsActive 调用集中到 helper 函数
  • 所有 metrics.WorkersRunning + WorkerExecDuration 调用集中到 helper
  • 现有测试零回归

lock-boilerplate-repeated-fifteen-times

Severity: Medium | Confidence: High | ROI: Medium
Location: manager.goGet, Transition, TransitionWithInput, AttachWorker, GetWorker, detachWorkerUnchecked, Delete, DeletePhysical, ClearContext, UpdateWorkerSessionID, DebugSnapshot, ListActive, WorkerHealthStatuses, TerminateAllWorkers, Lock, Unlock

Problem: "Manager.mu.RLock → lookup sessions map → managedSession.mu.RLock/Lock → operate → unlock" 样板代码重复约 15 次。getManagedSession helper 已集中了 lookup + cache-miss 逻辑,但锁获取/释放的 ordering boilerplate 在每个方法中仍重复。

Current Pattern:

// Repeated ~15 times across Manager methods:
func (m *Manager) SomeMethod(ctx context.Context, id string, ...) error {
    m.mu.RLock()
    ms, ok := m.sessions[id]
    m.mu.RUnlock()
    if !ok {
        ms = m.getManagedSession(ctx, id)
        if ms == nil { return ErrSessionNotFound }
    }
    ms.mu.Lock()
    defer ms.mu.Unlock()
    // ... actual logic (often 5-10 lines)
}

Proposed Fix: 提供 withSession 辅助方法统一锁获取:

func (m *Manager) withSession(ctx context.Context, id string, write bool, fn func(*managedSession) error) error {
    ms := m.getManagedSession(ctx, id)
    if ms == nil { return ErrSessionNotFound }
    if write {
        ms.mu.Lock()
        defer ms.mu.Unlock()
    } else {
        ms.mu.RLock()
        defer ms.mu.RUnlock()
    }
    return fn(ms)
}

Estimated Impact: ~60 行锁样板减少,统一锁 ordering 保证

Acceptance Criteria:

  • withSession helper 方法覆盖 read/write 锁获取
  • 至少 5 个 Manager 方法使用 withSession 替代手动锁
  • 锁 ordering (Manager.mu → managedSession.mu) 在所有路径上保持一致
  • go test -race ./internal/session/... 通过

Coupling

nil-guard-pattern-repeated-eight-times

Severity: Medium | Confidence: Medium | ROI: Medium
Location: manager.go:329, 354, 410, 460, 494, 600, 628, 652

Problem: 8 个公共方法以 if m == nil { return ErrSessionNotFound } 开头。这种模式表明 Manager 可能在初始化完成前被调用,或作为可选参数传递。nil-guard 本身是防御性编程,但重复 8 次意味着调用约定不够清晰。

Current Pattern:

// Lines 329, 354, 410, 460, 494, 600, 628, 652:
func (m *Manager) SomeMethod(...) error {
    if m == nil {
        return ErrSessionNotFound
    }
    // ...
}

Proposed Fix: 选项 A — 确保 Manager 在所有调用路径上都已初始化,移除 nil-guard。选项 B — 提供 SafeManager wrapper:

type SafeManager struct { *Manager }
func (m SafeManager) Transition(...) error {
    if m.Manager == nil { return ErrSessionNotFound }
    return m.Manager.Transition(...)
}

Estimated Impact: 消除 8 处重复防御性检查,明确调用约定

Acceptance Criteria:

  • 确认所有调用路径 Manager 已初始化(或添加 SafeManager wrapper)
  • 8 处 nil-guard 统一为一种模式
  • 现有测试零回归

Implementation Priority

Finding Priority Effort Risk Impact
metrics-emission-pattern-repeated P1 Small Low ~25 行减少,统一 metrics 调用
manager-mixed-responsibilities P2 Large Medium ~150 行减少,GC/worker 可独立测试
lock-boilerplate-repeated P2 Medium Low ~60 行减少,统一锁 ordering
concrete-poolmanager-dependency P3 Medium Low 解锁 mock Pool 测试
nil-guard-pattern-repeated P3 Small Low 明确调用约定
poolmanager-metrics-coupling P4 Small Low 可选 metrics(Low ROI 已丢弃)

Recommended starting point: metrics-emission-pattern-repeated — 最小改动,高回报,消除 5+ 处重复


Out of Scope

  • getManagedSession context.Background() — 已在 issue 243 跟踪
  • Phantom mock DeleteExpiredEvents — 已在 issue 243 跟踪
  • GC decomposition — 已在 issue 243 跟踪 (code-quality aspect)
  • DerivePlatformSessionKey switch on platform type — Low ROI, 小函数
  • PoolManager metrics coupling — Low ROI(可从 concrete-poolmanager-dependency 顺便解决)

Verification

  • make test 通过,无回归
  • make lint 不产生新警告
  • go test -race ./internal/session/... 通过

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3Medium: tech debt, refactoring, improvementsarchitectureDomain: design patterns, coupling, separation of concernsarea/sessionScope: session manager, state machine, store, poolrefactorRefactor: DRY, SOLID, code quality improvements

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions