Context
When current_start_frame >= local_attn_size, the current implementation (wan_flow_matching_action_tf.py:1410-1414) resets current_start_frame to 0. This triggers a full re-initialization on the next call: re-encode the CLIP features, redo the VAE encoding, and rebuild the entire KV cache from scratch.
elif self.current_start_frame >= self.model.local_attn_size:
    print("current_start_frame >= local_attn_size, reset current_start_frame to 0")
    self.current_start_frame = 0
An alternative approach would be a sliding-window eviction: drop the oldest num_frame_per_block entries from the KV cache and shift the window forward, preserving recent history without a full restart.
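To make the proposal concrete, here is a minimal sketch of the eviction logic, not the repository's implementation. The class SlidingKVCache and its field window_start are hypothetical names. Each cached frame is modeled as a plain Python object rather than a key/value tensor, and the sketch ignores how positional embeddings would need to be handled once the window shifts.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class SlidingKVCache:
    """Toy single-layer KV cache that keeps at most `local_attn_size` frames.

    Hypothetical stand-in: a real cache would hold key/value tensors per
    layer; here each frame entry is an arbitrary Python object.
    """

    local_attn_size: int
    num_frame_per_block: int
    frames: List[Any] = field(default_factory=list)
    # Global index of frames[0], i.e. how many frames have been evicted.
    window_start: int = 0

    def __post_init__(self) -> None:
        if self.num_frame_per_block <= 0:
            raise ValueError("num_frame_per_block must be positive")
        if self.local_attn_size < self.num_frame_per_block:
            raise ValueError("local_attn_size must fit at least one block")

    def append_block(self, block: List[Any]) -> None:
        """Add one block, evicting the oldest block(s) if the window is full."""
        if len(block) != self.num_frame_per_block:
            raise ValueError("block must contain num_frame_per_block frames")
        # Drop the oldest num_frame_per_block entries until the new block fits.
        while len(self.frames) + len(block) > self.local_attn_size:
            del self.frames[: self.num_frame_per_block]
            self.window_start += self.num_frame_per_block
        self.frames.extend(block)


# Usage: a window of 6 frames, filled 2 frames at a time.
cache = SlidingKVCache(local_attn_size=6, num_frame_per_block=2)
for start in range(0, 8, 2):
    cache.append_block([start, start + 1])
```

After the fourth block, the cache holds frames 2 through 7 and window_start is 2: only the oldest block was dropped, whereas the reset path discards all six cached frames at once.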
Questions
- What is the observed quality degradation at the reset boundary? Since all history is lost at once, the first few chunks after a reset lack historical context. Is there a measurable dip in action prediction quality right after a reset?