abrichr commented on Jan 17, 2026

Summary

Complete rewrite of /docs/publication-roadmap.md from the perspective of a skeptical reviewer at a top venue (NeurIPS, ICML, CHI). The goal is a paper that could be accepted, not merely submitted.

Key Changes

Honest Evidence Assessment

  • Acknowledges that all 45 macOS tasks share the SAME first action (clicking the Apple menu)
  • Notes that the WAA baseline run was interrupted (only 1 of 8 tasks ran, due to agent bugs)
  • Frames the results honestly as "trajectory-conditioned disambiguation of UI affordances"

Contribution Clarity

  • Demo-conditioning is a prompting strategy, not a new architecture (see the sketch after this list)
  • Explicitly positions the work as an empirical study, not an algorithmic contribution
  • Lists what this is NOT: a new model, training method, benchmark, or theoretical contribution
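
For concreteness, a minimal sketch of what demo-conditioning looks like as a prompting strategy. The trajectory format, field names, and prompt wording below are illustrative assumptions, not the actual implementation:

```python
# Illustrative sketch only: demo-conditioning as prompt construction,
# not an architectural change. All field names here are hypothetical.

def build_demo_conditioned_prompt(task: str, demo_steps: list[dict]) -> str:
    """Prepend a recorded human demonstration trajectory to the task prompt."""
    demo_lines = [
        f"Step {i + 1}: {step['action']} on '{step['target']}'"
        for i, step in enumerate(demo_steps)
    ]
    return (
        "You are a GUI agent. A human previously performed this task:\n"
        + "\n".join(demo_lines)
        + f"\n\nNow perform the task yourself: {task}\n"
        + "Respond with the next action to take."
    )

demo = [
    {"action": "click", "target": "Apple menu"},
    {"action": "click", "target": "System Settings"},
]
print(build_demo_conditioned_prompt("Open System Settings", demo))
```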

Statistical Rigor

  • Requires McNemar's test for paired comparisons
  • Bootstrap confidence intervals for all metrics
  • Effect size (Cohen's h) alongside p-values
  • Minimum sample size calculations (n >= 39 for a 20pp effect at 80% power; see the sketch after this list)
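
A minimal sketch of this pipeline in Python (NumPy/SciPy), assuming paired per-task success indicators for the baseline and demo-conditioned conditions; the data below is a toy placeholder, not real results:

```python
# Sketch of the required statistics on paired per-task outcomes
# (1 = success, 0 = failure). Toy placeholder data, not real results.
import numpy as np
from scipy.stats import binomtest, norm

baseline = np.array([0, 1, 0, 0, 1, 0, 1, 0, 0, 1])
demo     = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])

# McNemar's exact test: only discordant pairs carry information.
b = int(np.sum((baseline == 1) & (demo == 0)))  # baseline-only successes
c = int(np.sum((baseline == 0) & (demo == 1)))  # demo-only successes
p_mcnemar = binomtest(b, b + c, 0.5).pvalue

# Cohen's h: difference of arcsine-transformed success rates.
p1, p2 = baseline.mean(), demo.mean()
cohens_h = 2 * np.arcsin(np.sqrt(p2)) - 2 * np.arcsin(np.sqrt(p1))

# Bootstrap 95% CI on the paired difference in success rate.
rng = np.random.default_rng(0)
n = len(demo)
diffs = [
    float((demo[idx] - baseline[idx]).mean())
    for idx in (rng.integers(0, n, n) for _ in range(10_000))
]
ci_lo, ci_hi = np.percentile(diffs, [2.5, 97.5])

# Rough per-arm sample size for a 20pp improvement (normal approximation,
# two-sided alpha = 0.05, power = 0.80, baseline rate assumed 50%). The
# exact figure (the roadmap's n >= 39) depends on the assumed baseline
# rate and the sidedness of the test.
h20 = 2 * np.arcsin(np.sqrt(0.70)) - 2 * np.arcsin(np.sqrt(0.50))
n_required = ((norm.ppf(0.975) + norm.ppf(0.80)) / h20) ** 2

print(f"McNemar p={p_mcnemar:.3f}, Cohen's h={cohens_h:.2f}, "
      f"95% CI=[{ci_lo:.2f}, {ci_hi:.2f}], n per arm ~= {n_required:.0f}")
```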

Experiment Design

  • Minimum viable: 20 tasks x 2 conditions x 2 models x 3 trials = 240 runs (see the grid sketch after this list)
  • Full paper: ~1500 runs across WAA, WebArena, and ablations
  • Essential ablations: demo format, demo relevance, and k values
  • Required baselines: zero-shot, CoT, text-only few-shot, SOTA, and random
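
A sketch of how the minimum viable grid enumerates to 240 runs; condition and model names are placeholders:

```python
# Sketch of the minimum viable experiment grid; all names are placeholders.
from itertools import product

tasks = [f"task_{i:02d}" for i in range(20)]
conditions = ["zero_shot", "demo_conditioned"]
models = ["model_a", "model_b"]  # e.g., two frontier VLMs
trials = range(3)                # repeated trials per cell

runs = list(product(tasks, conditions, models, trials))
assert len(runs) == 240          # 20 x 2 x 2 x 3

for task, condition, model, trial in runs[:3]:
    print(task, condition, model, trial)
```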

Weakness Analysis

  • Anticipates reviewer criticisms with severity ratings
  • Identifies what we CANNOT fix (novelty, benchmark saturation)
  • Identifies what we CAN fix (benchmark coverage, multi-model, statistics)

Venue Fit

  • Realistic acceptance odds: NeurIPS main track <20%, workshop 60-70%
  • Recommends a workshop paper first (8 weeks), then a CHI/UIST full paper (6 months)
  • Pursue the main track only IF WAA shows a >30pp improvement

Risk Mitigation

  • Pivot strategies if results disappoint
  • Negative results paper option
  • Reviewer response templates

Document Structure

  1. Current State of Evidence
  2. Honest Contribution Assessment
  3. Weakness Analysis
  4. Required Experiments for Defensible Claims
  5. Statistical Rigor Requirements
  6. Related Work Gap Analysis (18 essential citations across GUI agents, PbD, VLMs, and RAG)
  7. Venue Fit Analysis
  8. Realistic Timeline
  9. Risk Mitigation
  10. Action Items

Appendices

  • A: Honest framing (abstract template, title options)
  • B: Cost estimates (~$200-400 for API calls, ~$25 for compute)
  • C: Reviewer response templates

Test plan

  • Markdown renders correctly
  • All internal links work
  • Tables format properly
  • Team review for accuracy of the current-state description

Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
abrichr merged commit 37170ee into main on Jan 17, 2026
6 checks passed
abrichr deleted the feature/rigorous-publication-roadmap branch on January 17, 2026 at 05:33