fix: relabel qwen3.5:9b 2026-04-07 as BrowseComp (was mislabeled SimpleQA) by LearningCircuit · Pull Request #12 · LearningCircuit/ldr-benchmarks

LearningCircuit · 2026-04-09T18:36:50Z

Summary

The qwen3.5:9b 2026-04-07 submission (merged as PR #10) was actually a BrowseComp run, not SimpleQA. The LDR exporter had a second bug: `dataset: SimpleQA` was hard-coded regardless of the actual benchmark.

This PR:

- Changes `dataset: SimpleQA` → `dataset: BrowseComp`
- Moves the result file from `results/simpleqa/` to `results/browsecomp/langgraph-agent/serper/`

The exporter fix is in LearningCircuit/local-deep-research#3442.
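As an illustration only, the relabel described above could be scripted roughly as follows. This is a minimal sketch, not the change actually made in this PR: the function name `relabel_result` is hypothetical, and it assumes the result file carries a single top-level `dataset: SimpleQA` line.

```python
import re
from pathlib import Path

# Matches a top-level `dataset: SimpleQA` line. The exact YAML layout of the
# result file is an assumption made for this sketch.
_DATASET_LINE = re.compile(r"^(dataset:[ \t]*)SimpleQA[ \t]*$", re.MULTILINE)


def relabel_result(src: Path, dest_dir: Path, new_dataset: str) -> Path:
    """Rewrite the dataset field of a result file and move it under dest_dir."""
    text = src.read_text(encoding="utf-8")
    new_text, count = _DATASET_LINE.subn(
        lambda m: m.group(1) + new_dataset, text
    )
    if count != 1:
        # Refuse to guess when the file does not look like what we expect.
        raise ValueError(
            f"expected exactly one 'dataset: SimpleQA' line in {src}, found {count}"
        )
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    dest.write_text(new_text, encoding="utf-8")
    src.unlink()  # remove the file from its old, mislabeled location
    return dest
```

Failing loudly when the `dataset` line is missing or duplicated keeps the script from silently moving a file it did not actually relabel.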

…ed SimpleQA) The LDR YAML exporter hard-coded dataset as "SimpleQA". This run was actually xbench_deepsearch. Move to results/xbench-deepsearch/ and fix the dataset field. LDR exporter fix: LearningCircuit/local-deep-research#3442

peter-evans/create-pull-request restores the workspace to the base branch HEAD after creating its PR, reverting the freshly rebuilt leaderboards/CONTRIBUTORS files in the working tree. Running HF sync after this step uploaded the OLD main state, causing the HF dataset to silently lag behind the actual repo state.

Symptom: after PR #12 (xbench relabel) merged, the publish workflow reported "Sync CSVs + README to Hugging Face: success" with the HF API responding "No files have been modified since last commit". The workspace files at sync time matched HF because peter-evans had already restored them to main HEAD (which still had the pre-rebuild SimpleQA mislabel).

Fix: move HF sync BEFORE the create-pull-request step so it operates on the freshly rebuilt files.
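The ordering constraint in this commit message could be guarded by a small check over the workflow's step names. The sketch below is an assumption-laden illustration, not part of the repository: the helper `sync_runs_before_pr` is hypothetical, and the step name "Create pull request" in the usage example is a placeholder (only "Sync CSVs + README to Hugging Face" appears in the workflow log quoted above).

```python
def sync_runs_before_pr(step_names: list[str], sync_step: str, pr_step: str) -> bool:
    """Return True if sync_step appears earlier than pr_step in the step list.

    Raises ValueError if either step name is missing from step_names.
    """
    return step_names.index(sync_step) < step_names.index(pr_step)


# Hypothetical step order after the fix: sync first, then open the PR.
fixed_order = [
    "Rebuild leaderboards",                # placeholder name
    "Sync CSVs + README to Hugging Face",  # name taken from the workflow log
    "Create pull request",                 # placeholder name
]
```

A check like this only verifies the order of named steps. It does not detect whether some other step resets the working tree before the sync runs.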

LearningCircuit force-pushed the fix-browsecomp-mislabel branch from c9df301 to 00aef39 on April 9, 2026 18:40

LearningCircuit merged commit b522a98 into main Apr 9, 2026
9 checks passed

LearningCircuit mentioned this pull request Apr 10, 2026

fix: run HF sync before create-pull-request to avoid stale uploads #14

Merged

1 task
