feat(vision_mcq): write tests/test_vision_mcq.py

Create a complete pytest test file for Vision MCQ.

## Corresponding CLAUDE.md section
Section 6.6 - Create the test file

## Task

Create `tests/test_vision_mcq.py`, covering at least:

### Extractor tests
- `get_name()` returns `"vision_mcq"`
- The `uses_vision` flag is `True`
- `extract()` correctly extracts A/B/C/D
- `extract()` correctly extracts Yes/No (POPE)
- `extract()` handles `None` and empty strings
- `extract()` handles responses that contain reasoning text
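
The extractor cases above could be written as one table of inputs and expected outputs. This is a minimal sketch, not the project's code: `StandInVisionMCQExtractor` and its regexes are hypothetical, and the expected values (for example, `None` for empty input, `"Yes"`/`"No"` capitalization) are assumptions about how the real `VisionMCQExtractor` behaves. In the actual test file the extractor would be imported from the package.

```python
import re
from typing import Optional

# Explicit statements such as "the answer is D" or "Answer: (b)".
_ANSWER_RE = re.compile(r"answer(?:\s+is)?\s*:?\s*\(?(yes|no|[a-d])\b", re.IGNORECASE)
# Fallback: a standalone uppercase option letter, or yes/no in any case.
_TOKEN_RE = re.compile(r"\b([A-D]|(?i:yes|no))\b")


class StandInVisionMCQExtractor:
    """Hypothetical stand-in so the sketch runs without twinkle_eval."""

    uses_vision = True

    def get_name(self) -> str:
        return "vision_mcq"

    def extract(self, text: Optional[str]) -> Optional[str]:
        if text is None or not text.strip():
            return None
        match = _ANSWER_RE.search(text)
        if match:
            token = match.group(1)
        else:
            tokens = _TOKEN_RE.findall(text)
            if not tokens:
                return None
            token = tokens[-1]  # the last standalone candidate wins
        # Option letters are upper-cased; yes/no become "Yes"/"No".
        return token.upper() if len(token) == 1 else token.capitalize()


# (raw model output, expected extracted answer)
CASES = [
    ("B", "B"),
    ("(C)", "C"),
    ("Yes", "Yes"),
    ("no.", "No"),
    (None, None),
    ("", None),
    ("   ", None),
    ("The image shows a cat on a mat, so the answer is D.", "D"),
]


def test_name_and_vision_flag():
    extractor = StandInVisionMCQExtractor()
    assert extractor.get_name() == "vision_mcq"
    assert extractor.uses_vision is True


def test_extract_cases():
    extractor = StandInVisionMCQExtractor()
    for raw, expected in CASES:
        assert extractor.extract(raw) == expected, repr(raw)
```

With pytest available, the loop in `test_extract_cases` maps naturally onto `pytest.mark.parametrize`, which reports each case separately.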

### Scorer tests
- Reuses `ExactMatchScorer`; confirm that the PRESETS registration is correct

### PRESETS registration
- Confirm `PRESETS["vision_mcq"]` exists
- Confirm it maps to `(VisionMCQExtractor, ExactMatchScorer)`
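
A sketch of the registration check follows. The two classes and the `PRESETS` dict are defined locally as stand-ins so the snippet runs on its own; in the real test they would be imported from the package, and the import path is not given in this issue.

```python
class VisionMCQExtractor:
    """Stand-in for the real extractor class (hypothetical body)."""


class ExactMatchScorer:
    """Stand-in for the real scorer class (hypothetical body)."""


# Stand-in for the project's PRESETS mapping: name -> (extractor, scorer).
PRESETS = {"vision_mcq": (VisionMCQExtractor, ExactMatchScorer)}


def test_vision_mcq_preset_registered():
    assert "vision_mcq" in PRESETS
    extractor_cls, scorer_cls = PRESETS["vision_mcq"]
    # Identity checks catch a look-alike class registered by mistake.
    assert extractor_cls is VisionMCQExtractor
    assert scorer_cls is ExactMatchScorer
```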

### Benchmark Registry
- Confirm mmbench, mmstar, mmmu, and pope are all in BENCHMARK_REGISTRY
- Confirm each has eval_method `"vision_mcq"`
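
One way to make a registry failure easy to read is to collect every problem before asserting. The helper below is a sketch that assumes each registry entry is a dict with an `eval_method` key; the real shape of `BENCHMARK_REGISTRY` entries is not described in this issue, and `registry_problems` is a hypothetical name.

```python
from typing import Dict, List, Sequence

EXPECTED_BENCHMARKS = ("mmbench", "mmstar", "mmmu", "pope")


def registry_problems(
    registry: Dict[str, dict],
    expected: Sequence[str] = EXPECTED_BENCHMARKS,
    method: str = "vision_mcq",
) -> List[str]:
    """Return one message per missing or mis-registered benchmark."""
    problems = []
    for name in expected:
        entry = registry.get(name)
        if entry is None:
            problems.append(f"{name}: not registered")
        elif entry.get("eval_method") != method:
            problems.append(f"{name}: eval_method is {entry.get('eval_method')!r}")
    return problems


def test_registry_is_complete():
    # Stand-in registry; the real test would import BENCHMARK_REGISTRY.
    registry = {name: {"eval_method": "vision_mcq"} for name in EXPECTED_BENCHMARKS}
    assert registry_problems(registry) == []
```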

### Example Dataset
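- `test.jsonl` exists for all 4 benchmarks
- The format is correct (all required fields are present)
- The image files actually exist
- The sample count matches expectations (10-20 rows)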
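
The dataset checks can share one validator that is run once per benchmark. The sketch below makes several assumptions that this issue does not confirm: the field names in `REQUIRED`, the `image_path` field, and the rule that image paths are resolved relative to the directory containing `test.jsonl`. `self_check` builds a throwaway dataset so the example runs without the real files.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, List

# Hypothetical field names; replace with the fields the datasets really use.
REQUIRED = ("question", "answer", "image_path")


def find_dataset_problems(
    jsonl_path: Path,
    required: Iterable[str],
    image_field: str,
    min_rows: int = 10,
    max_rows: int = 20,
) -> List[str]:
    """Return human-readable problems found in one test.jsonl file."""
    required = tuple(required)
    if not jsonl_path.is_file():
        return [f"{jsonl_path} does not exist"]
    lines = [ln for ln in jsonl_path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    rows = [json.loads(ln) for ln in lines]
    problems = []
    if not min_rows <= len(rows) <= max_rows:
        problems.append(f"expected {min_rows}-{max_rows} rows, found {len(rows)}")
    for i, row in enumerate(rows, start=1):
        missing = [field for field in required if field not in row]
        if missing:
            problems.append(f"row {i}: missing fields {missing}")
        image = row.get(image_field)
        # Assumption: image paths are relative to the jsonl file's directory.
        if image and not (jsonl_path.parent / image).is_file():
            problems.append(f"row {i}: image {image} not found")
    return problems


def self_check():
    """Validate a generated 10-row dataset, then again after deleting one image."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "images").mkdir()
        rows = []
        for i in range(10):
            (root / "images" / f"{i}.png").write_bytes(b"stub")
            rows.append({"question": f"q{i}", "answer": "A", "image_path": f"images/{i}.png"})
        path = root / "test.jsonl"
        path.write_text("\n".join(json.dumps(r) for r in rows), encoding="utf-8")
        good = find_dataset_problems(path, REQUIRED, "image_path")
        (root / "images" / "0.png").unlink()
        broken = find_dataset_problems(path, REQUIRED, "image_path")
    return good, broken
```

In the test file, a loop over the four benchmark directories would call `find_dataset_problems` and assert that the returned list is empty.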

### Image Encoding (evaluator helper)
- Local file → base64 data URI
- URL → passed through unchanged
- Nonexistent file → appropriate error handling
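
The three encoding behaviours can be exercised against a small helper like the one below. `to_image_url` is a hypothetical stand-in, not the evaluator's actual helper, and raising `FileNotFoundError` is only one reading of "appropriate error handling"; the real helper may log and skip, or raise a different exception.

```python
import base64
import mimetypes
from pathlib import Path


def to_image_url(source: str) -> str:
    """Return a URL unchanged, or encode a local file as a base64 data URI."""
    if source.startswith(("http://", "https://")):
        return source
    path = Path(source)
    if not path.is_file():
        # Assumption: a missing file is reported by raising.
        raise FileNotFoundError(f"image not found: {source}")
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def test_local_file_becomes_data_uri(tmp_path):
    image = tmp_path / "pixel.png"
    image.write_bytes(b"abc")  # content need not be a valid PNG for this check
    assert to_image_url(str(image)) == "data:image/png;base64,YWJj"


def test_url_passes_through(tmp_path):
    url = "https://example.com/cat.png"
    assert to_image_url(url) == url


def test_missing_file_raises(tmp_path):
    try:
        to_image_url(str(tmp_path / "missing.png"))
    except FileNotFoundError:
        return
    raise AssertionError("expected FileNotFoundError")
```

`tmp_path` is pytest's built-in temporary-directory fixture, so none of these tests touch the repository or the network. With pytest imported, the last test is usually written with `pytest.raises(FileNotFoundError)`.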

### Template File
- `twinkle_eval/templates/vision_mcq.yaml` exists and can be parsed as YAML

## Acceptance criteria
- [ ] `python3 -m pytest tests/test_vision_mcq.py -v` passes completely
- [ ] `python3 -m pytest tests/ -v` runs the full test suite with no new failures
- [ ] No external API calls are required (pure unit tests)

Part of Milestone #22

Issue #133