Refactor Mooncake Store: use force delete and hard pin to replace TTL-based deferred deletion #72

## Background

Lifecycle management of hidden states in Mooncake Store currently relies on the lease TTL mechanism. Because Mooncake does not allow deleting objects with active leases, `DeferredDeleteManager` must wait for the TTL to expire after consumption before it can delete them. This couples memory usage to the TTL: the larger the TTL, the longer consumed hidden states remain in the store, and the higher the memory pressure.

### Production Issue

With `kv_lease_ttl_s: 300`, consumed hidden states remain in the store for up to 5 minutes, causing:
- Store usage = in-flight hidden states + consumed hidden states still waiting for their TTL to expire
- This total exceeds Mooncake Store capacity and triggers LRU eviction
- Eviction may delete keys that are still in use, causing `batch_get` failures
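To make the coupling concrete, the steady-state footprint can be estimated with Little's law: consumed objects enter the "waiting for TTL" state at the consumption rate and linger for roughly the TTL plus the 0.5 s buffer used by `DeferredDeleteManager`. The sketch below assumes a steady consumption rate, and the workload figures (64 in-flight samples, 10 samples/s, 100 MiB of hidden states per sample) are hypothetical, not production measurements.

```python
def store_footprint_bytes(
    in_flight: int,
    consume_rate_per_s: float,
    ttl_s: float,
    bytes_per_sample: int,
    buffer_s: float = 0.5,
) -> int:
    """Rough steady-state store footprint under TTL-based deferred deletion.

    Little's law: items waiting for deletion = arrival rate x time spent waiting,
    where the waiting time is the lease TTL plus the deletion buffer.
    """
    waiting_for_ttl = consume_rate_per_s * (ttl_s + buffer_s)
    return int((in_flight + waiting_for_ttl) * bytes_per_sample)


MiB = 2**20
# Hypothetical workload: 64 in-flight samples, 10 samples/s, 100 MiB per sample.
for ttl in (5.0, 300.0):
    gib = store_footprint_bytes(64, 10.0, ttl, 100 * MiB) / 2**30
    print(f"ttl={ttl:>5}s -> ~{gib:.1f} GiB")
```

Under these assumptions the waiting term dominates: about 3,005 consumed samples sit in the store at a 300 s TTL versus 55 at 5 s, while the in-flight term stays at 64.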

## Current Implementation

- `torchspec/transfer/mooncake/deferred_delete.py`: `DeferredDeleteManager` enqueues deletions after consumption; a background thread waits `ttl + 0.5s buffer` then calls `store.remove()`, retrying up to 3 times on failure
- `torchspec/transfer/mooncake/eagle_store.py`: `remove_eagle3_tensors()` submits 4 tensor keys to the deferred delete manager
- `torchspec/config/mooncake_config.py`: `kv_lease_ttl_s` defaults to 5.0s
- `torchspec/transfer/mooncake/utils.py:110`: TTL converted to milliseconds and passed to mooncake master launch args
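For reference, the current flow can be modeled as a delay queue: each key becomes deletable only `ttl + 0.5s` after it is submitted, and a worker thread then calls `remove()` with retries. This is a simplified sketch, not the contents of `deferred_delete.py`. The `KVStore` protocol, the assumption that `remove()` returns 0 on success, and reading "retrying up to 3 times" as one attempt plus three retries are all assumptions.

```python
import heapq
import threading
import time
from typing import Protocol, Sequence


class KVStore(Protocol):
    # Assumed interface: remove() returns 0 on success, non-zero on failure.
    def remove(self, key: str) -> int: ...


class DeferredDeleteSketch:
    """Simplified model of TTL-based deferred deletion (not the real DeferredDeleteManager)."""

    def __init__(self, store: KVStore, ttl_s: float, buffer_s: float = 0.5, max_retries: int = 3):
        self._store = store
        self._delay_s = ttl_s + buffer_s  # a key only becomes deletable after its lease expires
        self._max_retries = max_retries
        self._heap: list = []  # entries are (deadline, seq, key), earliest deadline first
        self._seq = 0
        self._cv = threading.Condition()
        self._stopped = False
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, keys: Sequence[str]) -> None:
        deadline = time.monotonic() + self._delay_s
        with self._cv:
            for key in keys:
                heapq.heappush(self._heap, (deadline, self._seq, key))
                self._seq += 1
            self._cv.notify()

    def close(self) -> None:
        """Wait for all pending deadlines to pass and their deletions to run, then stop."""
        with self._cv:
            self._stopped = True
            self._cv.notify()
        self._worker.join()

    def _next_due_key(self):
        # Blocks until the earliest key is past its deadline; returns None once stopped and drained.
        with self._cv:
            while True:
                if self._heap:
                    wait_s = self._heap[0][0] - time.monotonic()
                    if wait_s <= 0:
                        return heapq.heappop(self._heap)[2]
                    self._cv.wait(timeout=wait_s)
                elif self._stopped:
                    return None
                else:
                    self._cv.wait()

    def _run(self) -> None:
        while (key := self._next_due_key()) is not None:
            # One initial attempt plus up to max_retries retries.
            for _ in range(1 + self._max_retries):
                if self._store.remove(key) == 0:
                    break
```

The sketch makes the coupling visible: no matter when consumption finishes, memory is held for at least `_delay_s`.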

## Proposed Refactoring

Newer versions of Mooncake support `force delete` and `hard pin`. We should leverage these APIs to refactor the deletion logic:

### 1. Force Delete: immediate deletion after consumption
- Call force delete immediately after consumption instead of waiting for TTL expiration
- `DeferredDeleteManager` can be significantly simplified or removed; only the retry logic needs to remain
- Completely decouples TTL from deletion timing
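With force delete, the deferred queue collapses into a synchronous call made right after consumption, keeping only the retry loop. The sketch assumes a `remove(key, force=True)` method returning 0 on success; that signature is hypothetical and should be checked against the Mooncake version in use. The exponential backoff is an added choice, not something this issue specifies.

```python
import time
from typing import Protocol, Sequence


class ForceDeletableStore(Protocol):
    # Hypothetical signature: the real Mooncake force-delete API may differ.
    def remove(self, key: str, force: bool = False) -> int: ...


def force_delete_with_retry(
    store: ForceDeletableStore,
    keys: Sequence[str],
    max_retries: int = 3,
    backoff_s: float = 0.05,
) -> list[str]:
    """Delete keys immediately after consumption; return the keys that still failed."""
    failed = []
    for key in keys:
        for attempt in range(max_retries + 1):
            if store.remove(key, force=True) == 0:
                break
            if attempt < max_retries:
                time.sleep(backoff_s * (2**attempt))  # exponential backoff between retries
        else:
            failed.append(key)  # loop finished without a successful remove
    return failed
```

In this shape, `remove_eagle3_tensors()` could pass its 4 tensor keys straight to a helper like this instead of enqueuing them, and log or surface whatever `failed` contains.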

### 2. Hard Pin: protect in-flight hidden states
- Apply hard pin to hidden states that are being transferred or awaiting consumption, preventing eviction from deleting them
- Unpin + force delete after consumption completes
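The intended semantics can be illustrated with an in-memory stand-in: LRU eviction skips hard-pinned objects, and a write that could only fit by evicting pinned data is refused instead of silently dropping in-flight hidden states. `FakePinnedStore`, its `put(..., hard_pin=True)`, `unpin()` and `remove(..., force=True)` methods, and the refuse-on-full behavior are all hypothetical, chosen to mirror this issue's description rather than Mooncake's actual API.

```python
from collections import OrderedDict
from typing import Optional


class FakePinnedStore:
    """Hypothetical in-memory store with LRU eviction and hard pins (not Mooncake's API)."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self._data: "OrderedDict[str, bytes]" = OrderedDict()  # oldest (LRU) first
        self._pinned: set = set()

    def _used(self) -> int:
        return sum(len(v) for v in self._data.values())

    def put(self, key: str, value: bytes, hard_pin: bool = False) -> int:
        pinned_bytes = sum(len(v) for k, v in self._data.items() if k in self._pinned)
        if pinned_bytes + len(value) > self.capacity_bytes:
            return -1  # fitting would require evicting pinned data: refuse the write
        # Evict least-recently-used unpinned objects until the value fits.
        while self._used() + len(value) > self.capacity_bytes:
            victim = next(k for k in self._data if k not in self._pinned)
            del self._data[victim]
        self._data[key] = value
        if hard_pin:
            self._pinned.add(key)
        return 0

    def get(self, key: str) -> Optional[bytes]:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
        return self._data.get(key)

    def unpin(self, key: str) -> None:
        self._pinned.discard(key)

    def remove(self, key: str, force: bool = False) -> int:
        # `force` is accepted for parity with the sketched API; this fake has no leases.
        self._pinned.discard(key)
        return 0 if self._data.pop(key, None) is not None else -1


def release_after_consumption(store: FakePinnedStore, keys: list) -> None:
    """Consumer side: drop the pin, then free the memory immediately."""
    for key in keys:
        store.unpin(key)
        store.remove(key, force=True)
```

One consequence of hard pinning: a pinned object can never be evicted, so a key whose consumer fails before the release step stays in the store indefinitely. Consumer-side cleanup should therefore also run on error paths.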

### 3. Cleanup
- Remove or simplify TTL-waiting logic in `DeferredDeleteManager`
- `kv_lease_ttl_s` should no longer affect application-level data lifecycle management

## Expected Outcome

- Store memory is freed immediately after consumption, with no TTL delay
- In-flight hidden states protected by hard pin, immune to eviction
- Store memory usage depends only on pipeline concurrency, not TTL
- Improved robustness: no longer relies on the implicit assumption that "store capacity > total in-flight hidden states"

## Related Files

- `torchspec/transfer/mooncake/deferred_delete.py`
- `torchspec/transfer/mooncake/eagle_store.py`
- `torchspec/transfer/mooncake/utils.py`
- `torchspec/config/mooncake_config.py`
- `torchspec/training/data_fetcher.py`
