Question: Why does `test_client_AR.py` send 4 frames per chunk instead of 8?


## Context

The training script (`scripts/train/droid_training_full_finetune_wan22.sh`) explicitly sets:

```bash
num_frame_per_block=2
num_action_per_block=24
```

According to the VAE encoding logic in `wan_flow_matching_action_tf.py`, the recommended number of input frames for subsequent chunks is `4 × num_frame_per_block`. With `num_frame_per_block=2`, this should be **8 frames**.
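The frame-count arithmetic can be written down directly. This is a minimal sketch. It assumes the VAE maps the first frame to its own latent and every further 4 frames to one latent, which is what the `4×2+1` counts below imply. The helper names are hypothetical and do not come from the repository.

```python
VAE_TEMPORAL_STRIDE = 4  # frames per latent after the first (assumed from the 4 × rule above)


def frames_per_chunk(num_frame_per_block: int) -> int:
    """Recommended number of new frames a client sends per chunk."""
    return VAE_TEMPORAL_STRIDE * num_frame_per_block


def vae_input_length(num_frame_per_block: int) -> int:
    """Frames the VAE sees once the first frame is prepended (4n + 1)."""
    return frames_per_chunk(num_frame_per_block) + 1


def implied_num_frame_per_block(frames_sent: int) -> int:
    """Invert the rule: which num_frame_per_block does a given chunk size match?"""
    return frames_sent // VAE_TEMPORAL_STRIDE


for n in (1, 2):
    print(f"num_frame_per_block={n}: send {frames_per_chunk(n)} frames, "
          f"VAE input {vae_input_length(n)}")
print("4 frames implies num_frame_per_block =", implied_num_frame_per_block(4))
```

The training value of 2 gives 8 frames and a 9-frame VAE input. The 4 frames used by the test and serving code map back to 1.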

However, the official test client and server code both use **4 frames** per chunk.

## The Discrepancy

| Component | Frames per chunk | Implied `num_frame_per_block` |
|-----------|-----------------|-------------------------------|
| Training script (`droid_training_full_finetune_wan22.sh:117`) | — | **2** (explicit override) |
| Base YAML config (`wan_flow_matching_action_tf.yaml:14`) | — | 1 (default) |
| `test_client_AR.py:52` | **4** (4 offsets) | 1 |
| `socket_test_optimized_AR.py:55` | **4** | 1 |
| `eval_utils/serve_dreamzero_wan22.py:73` | **4** | 1 (comment: "matches 5B num_frame_per_block") |

The official serving/testing code appears to be written against the YAML default (`num_frame_per_block=1`), not the actual training configuration (`num_frame_per_block=2`).

## What happens when 4 frames are sent with `num_frame_per_block=2`

In the VAE encoding path (`wan_flow_matching_action_tf.py:1108-1122`), sending 4 frames triggers the **repeat branch**:

```
Input: T=4

Condition check:
  (T-1)//4 = 0 ≠ 2  → skip
  T//4     = 1 ≠ 2  → enters repeat branch

repeat_factor = num_frame_per_block // (T//4) = 2 // 1 = 2

Step 1: repeat_interleave(repeats=2, dim=2)
  [f0, f1, f2, f3] → [f0, f0, f1, f1, f2, f2, f3, f3]  (8 frames)

Step 2: prepend first frame
  [f0, f0, f0, f1, f1, f2, f2, f3, f3]  (9 frames = 4×2+1) ✓
```
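The two branches are easier to compare in a pure-Python sketch of the frame-count logic traced above. It uses strings in place of video frames and a list comprehension in place of `repeat_interleave` along the time axis. It mirrors only the conditions shown in the trace. It is not the code in `wan_flow_matching_action_tf.py`, and the function name is hypothetical. The body of the first branch is an assumption: it is treated here as passing the frames through unchanged.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def build_vae_input(frames: Sequence[Frame], num_frame_per_block: int) -> List[Frame]:
    """Sketch of the branch logic: return the 4n+1-frame list fed to the VAE."""
    t = len(frames)
    if (t - 1) // 4 == num_frame_per_block:
        # Assumption: the issue does not show this branch; pass frames through as-is.
        return list(frames)
    out = list(frames)
    if t // 4 != num_frame_per_block:
        # Repeat branch: duplicate each frame in place, like repeat_interleave on time.
        blocks = t // 4
        if blocks == 0 or num_frame_per_block % blocks != 0:
            # Sketch-only guard; how the real code handles this case is not shown.
            raise ValueError(f"cannot expand {t} frames to block size {num_frame_per_block}")
        repeat_factor = num_frame_per_block // blocks
        out = [f for f in frames for _ in range(repeat_factor)]
    # Both remaining paths prepend the first frame to reach 4n+1 frames.
    return [out[0]] + out


print(build_vae_input(["f0", "f1", "f2", "f3"], num_frame_per_block=2))
print(build_vae_input([f"f{i}" for i in range(8)], num_frame_per_block=2))
```

The first call reproduces the 9-frame repeat-branch result traced above. The second reproduces the prepend-only result for 8 frames.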

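This yields a valid 9-frame VAE input. However, only 4 unique frames fill the 8 slots after the prepended frame, because **each frame is duplicated** to fill the gap. The result is redundant information in the VAE latents.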

## What happens when 8 frames are sent (recommended)

```
Input: T=8

Condition check:
  (T-1)//4 = 1 ≠ 2  → skip
  T//4     = 2 == 2  → enters prepend-only branch ✓

Step 1: prepend first frame
  [f0, f0, f1, f2, f3, f4, f5, f6, f7]  (9 frames = 4×2+1) ✓
```

All 8 frames carry unique information. The prepended `f0` duplicate ends up in latent 0, which is discarded anyway, so there is **no information loss**.

## VAE latent comparison

**4 frames (with repeat):**
```
VAE input: [f0, f0, f0, f1, f1, f2, f2, f3, f3]
  latent 0: f0              → discarded (no loss)
  latent 1: f0, f0, f1, f1  → f0, f1 are duplicated
  latent 2: f2, f2, f3, f3  → f2, f3 are duplicated
```

**8 frames (prepend-only):**
```
VAE input: [f0, f0, f1, f2, f3, f4, f5, f6, f7]
  latent 0: f0              → discarded (no loss, was duplicate)
  latent 1: f0, f1, f2, f3  → 4 unique frames
  latent 2: f4, f5, f6, f7  → 4 unique frames
```
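The grouping above follows from how a causal video VAE with 4× temporal compression splits its input: frame 0 alone forms latent 0, and each following run of 4 frames forms one latent. The sketch below assumes that mapping and assumes latent 0 is discarded, as stated above. Under those assumptions, it counts the unique frames that survive into the kept latents. The helper name is hypothetical.

```python
from typing import List, Sequence


def latent_groups(vae_input: Sequence[str], stride: int = 4) -> List[List[str]]:
    """Group VAE input frames by the latent they feed (assumed causal mapping:
    frame 0 -> latent 0, then each run of `stride` frames -> one latent)."""
    if (len(vae_input) - 1) % stride != 0:
        raise ValueError(f"expected {stride}n+1 frames, got {len(vae_input)}")
    groups = [list(vae_input[:1])]
    for start in range(1, len(vae_input), stride):
        groups.append(list(vae_input[start:start + stride]))
    return groups


def unique_frames_kept(vae_input: Sequence[str]) -> int:
    """Count distinct frames in every latent except latent 0 (which is discarded)."""
    kept = latent_groups(vae_input)[1:]
    return len({frame for group in kept for frame in group})


four = ["f0", "f0", "f0", "f1", "f1", "f2", "f2", "f3", "f3"]
eight = ["f0", "f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7"]
for name, vae_input in (("4 frames", four), ("8 frames", eight)):
    print(name, latent_groups(vae_input), "unique frames kept:", unique_frames_kept(vae_input))
```

Both inputs occupy the same 8 frame slots in latents 1 and 2. The 4-frame input fills them with 4 distinct frames, while the 8-frame input fills them with 8.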

## Questions

1. Is `num_frame_per_block=2` the intended production configuration for the 5B model? If so, should the serving code (`socket_test_optimized_AR.py`, `serve_dreamzero_wan22.py`) be updated to use 8 frames per chunk?

2. Is the frame duplication via `repeat_interleave` an intentional fallback for clients that cannot provide enough frames, or is it a sign that the client should be sending more frames?

3. Does the frame duplication in the repeat branch noticeably degrade action prediction quality compared to sending 8 unique frames?
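If `num_frame_per_block=2` is the intended setting, the client-side change behind question 1 amounts to buffering 8 frames instead of 4 before each request. The following is a minimal sketch, assuming frames arrive one at a time from the camera. `ChunkBuffer` is a hypothetical helper and is not part of `test_client_AR.py` or the serving scripts. The sketch does not reproduce how the real clients choose their 4 offsets.

```python
from collections import deque
from typing import Deque, Generic, List, Optional, TypeVar

Frame = TypeVar("Frame")


class ChunkBuffer(Generic[Frame]):
    """Hypothetical helper: accumulate frames and release them in chunks of
    4 * num_frame_per_block, oldest first."""

    def __init__(self, num_frame_per_block: int) -> None:
        if num_frame_per_block < 1:
            raise ValueError("num_frame_per_block must be >= 1")
        self.chunk_size = 4 * num_frame_per_block
        self._pending: Deque[Frame] = deque()

    def push(self, frame: Frame) -> Optional[List[Frame]]:
        """Add one frame; return a full chunk once enough frames are buffered."""
        self._pending.append(frame)
        if len(self._pending) < self.chunk_size:
            return None
        return [self._pending.popleft() for _ in range(self.chunk_size)]


buffer: ChunkBuffer[str] = ChunkBuffer(num_frame_per_block=2)
for i in range(8):
    chunk = buffer.push(f"f{i}")
print(chunk)  # the 8th push releases the full chunk f0..f7
```

Because the chunk size is derived from `num_frame_per_block`, the same client works with either config value. It does not hard-code 4 or 8.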
