layout-1: 15 ground-truth entries are .mp4 files, not images (NIMA can't score)

## Summary

`layout-1` (`IntentToLayoutGeneration`, "Generate a flattened layout image from intent text") has 15 samples whose `ground_truth.image` points to an `.mp4` video file instead of a `.png` image. The task's primary metric is `nima_score`, which runs a PyTorch image-aesthetics model on the file after loading it with `PIL.Image.open(...)`. PIL cannot decode video files and raises `UnidentifiedImageError`, so these samples cannot be scored at all.

The benchmark's own `evaluate()` returns `nan` for every image metric on these samples when given an empty `ModelOutput`, and any oracle or agent run that calls `Image.open(...)` on the ground truth crashes outright.

## Scope

- **Directory:** `benchmarks/layout/layout2-intention-to-layout-generation/images/`
- **Counts:** 85 `.png` (valid) + 15 `.mp4` (broken) = 100 samples total
- **Affected samples (source IDs + filenames):**

| source_id | file |
|-----------|------|
| `layout-1:1`  | `04zQ50DpwzhXfznJC0fK.mp4` |
| `layout-1:9`  | `0av452xBKWsVWrPWZVM5.mp4` |
| `layout-1:13` | `0ixLHb8kLtuAVQavdRT6.mp4` |
| `layout-1:19` | `0wIPzODxoCDEcrkAmHyf.mp4` |
| `layout-1:27` | `1Okni6tFj315PiOVADBx.mp4` |
| `layout-1:29` | `1YrP2nlMDasJFMoFLUKS.mp4` |
| `layout-1:35` | `1kLkygM7pgrGcfMkvfjV.mp4` |
| `layout-1:50` | `2ALsFATNayguZJNbRMkA.mp4` |
| `layout-1:53` | `2GNqKi6AAlq3pRufiANh.mp4` |
| `layout-1:74` | `3f0qiFLXCUS8M72ySAO8.mp4` |
| `layout-1:75` | `3gIgFKADrYr3v2uN0n5W.mp4` |
| `layout-1:80` | `3pX1YupN1ulLamAZtVhg.mp4` |
| `layout-1:87` | `44QT524vSyhpljWrNEWr.mp4` |
| `layout-1:88` | `44ZMntKp3FqnLNiFoxdV.mp4` |
| `layout-1:91` | `4QXfdSFrS50zUtllJKos.mp4` |
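
The counts above can be re-derived with a short directory tally. This is a minimal sketch, not part of the benchmark: `count_by_extension` is a hypothetical helper name, and it assumes the images directory contains only the ground-truth files.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(image_dir):
    """Return a Counter mapping lowercase file suffix -> number of files."""
    return Counter(
        p.suffix.lower() for p in Path(image_dir).iterdir() if p.is_file()
    )


if __name__ == "__main__":
    # Path taken from the Scope section; adjust to your checkout location.
    image_dir = "benchmarks/layout/layout2-intention-to-layout-generation/images"
    if Path(image_dir).is_dir():
        for suffix, n in sorted(count_by_extension(image_dir).items()):
            print(f"{suffix}: {n}")
```

If the numbers reported in this issue hold, an affected checkout should show 85 files under `.png` and 15 under `.mp4`.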

## Reproduction

```python
from PIL import Image
Image.open("benchmarks/layout/layout2-intention-to-layout-generation/images/04zQ50DpwzhXfznJC0fK.mp4")
# -> PIL.UnidentifiedImageError: cannot identify image file '...mp4'
```

A full-source oracle sweep across all 100 `layout-1` samples hits this error on exactly these 15 and passes on the other 85.
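
To check whether the flagged files are real video containers rather than PNGs with the wrong extension, one option is to inspect their leading bytes instead of trusting the filename. The sketch below uses only the standard library; `sniff_kind` and `find_mismatches` are hypothetical helper names, and the `ftyp` test is a heuristic that matches MP4 and other ISO base media containers (MOV and HEIF among them), not a full format validation.

```python
from pathlib import Path

# Fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_kind(path):
    """Classify a file as 'png', 'mp4', or 'unknown' from its first 12 bytes."""
    with open(path, "rb") as fh:
        head = fh.read(12)
    if head.startswith(PNG_SIGNATURE):
        return "png"
    # ISO base media files usually carry the box type 'ftyp' at bytes 4-8.
    # Heuristic only: this also matches MOV, HEIF, and similar containers.
    if len(head) >= 8 and head[4:8] == b"ftyp":
        return "mp4"
    return "unknown"


def find_mismatches(image_dir):
    """Yield (filename, detected_kind) for every file whose content is not PNG."""
    for p in sorted(Path(image_dir).iterdir()):
        if p.is_file():
            kind = sniff_kind(p)
            if kind != "png":
                yield p.name, kind
```

Calling `list(find_mismatches(image_dir))` on the directory from the Scope section would list every file that is not PNG by content. A file named `.mp4` that sniffs as `png` would only need renaming, while one that sniffs as `mp4` needs a real replacement render.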

## Suggested fix (pick one)

1. **Replace each `.mp4` with the intended `.png`.** Ideal if the original flat layout renders are available; causes no downstream breakage.
2. **Drop the 15 entries from the `layout-1` sample manifest.** Simplest option; the total goes from 100 to 85, and any consumer caching by `source_id` would need to re-sync.
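
Option 2 could look roughly like the following filter. This is a sketch under stated assumptions: the real manifest format is not shown in this issue, so the list-of-dicts layout with `source_id` and `ground_truth["image"]` keys is hypothetical, as is the helper name `split_by_ground_truth_type`.

```python
from pathlib import PurePosixPath


def split_by_ground_truth_type(samples, allowed_suffixes=(".png",)):
    """Split samples into (kept, dropped) by the ground-truth file suffix.

    Assumes each sample is a dict shaped like
    {"source_id": ..., "ground_truth": {"image": "<path>"}} (hypothetical).
    """
    kept, dropped = [], []
    for sample in samples:
        suffix = PurePosixPath(sample["ground_truth"]["image"]).suffix.lower()
        if suffix in allowed_suffixes:
            kept.append(sample)
        else:
            dropped.append(sample)
    return kept, dropped


if __name__ == "__main__":
    # Two illustrative entries; the second filename comes from the table above.
    samples = [
        {"source_id": "layout-1:0", "ground_truth": {"image": "images/example.png"}},
        {"source_id": "layout-1:1", "ground_truth": {"image": "images/04zQ50DpwzhXfznJC0fK.mp4"}},
    ]
    kept, dropped = split_by_ground_truth_type(samples)
    print([s["source_id"] for s in dropped])
```

Returning the dropped entries alongside the kept ones makes it easy to log exactly which `source_id` values were removed, which helps consumers that cache by `source_id` re-sync.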

## Impact

- Oracle verification can't reach 100% on full `layout-1` in any harness (upstream or adapter).
- Surfaced while verifying the Harbor adapter ([harbor-framework/harbor#1433](https://github.com/harbor-framework/harbor/pull/1433)); the current README documents a 99.96% pass rate on the 33,786-task full source because of these 15.
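
The 99.96% figure is consistent with 15 unscorable tasks out of 33,786, assuming these are the only failures in the full source. A quick arithmetic check:

```python
total_tasks = 33_786   # full-source task count quoted above
unscorable = 15        # the .mp4 ground-truth samples in layout-1

# Percentage of tasks that can pass if only these 15 fail.
pass_rate = 100 * (total_tasks - unscorable) / total_tasks
print(f"{pass_rate:.2f}%")  # 99.96%
```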

