Open
Description
Goal
Extend the existing DOMShell vs CiC (Claude in Chrome) benchmark with token cost measurement.
Background
The current experiment (experiments/claude_domshell_vs_cic/) measures tool call counts — DOMShell averaged 4.3 API calls per task vs 8.6 for screenshot-based browsing across 4 tasks, 8 trials.
Token cost hasn't been measured yet, but the gap should be even wider than the 2x difference in call count:
- DOMShell returns structured text (AX node names, roles, properties); a full tree output for a complex page is ~2-3KB
- Screenshot-based browsing sends base64-encoded page captures; a single screenshot can be 500KB+ in the context window
- Vision tokens are typically priced higher than text tokens
Tasks
- Add input/output token counting to the benchmark harness
- Re-run the existing 4-task suite with token tracking enabled
- Report total tokens per task for both approaches
- Compare cost at current API pricing (text vs vision tokens)
- Update experiments/claude_domshell_vs_cic/ with results
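As a starting point for the first and fourth tasks, here is a minimal sketch of a per-task token tally. It assumes each API response exposes a usage record with `input_tokens` and `output_tokens` fields; the `TokenTally` class, its method names, and the per-million-token prices passed to `cost_usd` are hypothetical and not part of the existing harness.

```python
from dataclasses import dataclass


@dataclass
class TokenTally:
    """Accumulates token usage across the API calls made for one task (hypothetical helper)."""
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def record(self, usage: dict) -> None:
        # Assumption: `usage` looks like {"input_tokens": int, "output_tokens": int}.
        self.input_tokens += usage["input_tokens"]
        self.output_tokens += usage["output_tokens"]
        self.calls += 1

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens

    def cost_usd(self, input_price_per_mtok: float, output_price_per_mtok: float) -> float:
        # Prices are supplied by the caller in USD per million tokens;
        # no real pricing is hard-coded here.
        return (
            self.input_tokens * input_price_per_mtok
            + self.output_tokens * output_price_per_mtok
        ) / 1_000_000


def summarize(tallies: dict[str, TokenTally]) -> dict[str, int]:
    """Maps each approach name (e.g. "domshell", "cic") to its total token count."""
    return {name: tally.total_tokens for name, tally in tallies.items()}
```

Keeping one tally per (task, approach, trial) and calling `record` after every API call would let the report show both the call count already measured and the new token totals from the same run.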
Hypothesis
Structured text (2-3KB per response) vs base64 screenshots (500KB+) should show >2x token savings on top of the 2x call-count reduction already measured.
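One way to make the hypothesis checkable is to split total tokens per task into call count times average tokens per call. The symbols below are introduced here for illustration: $T$ is total tokens per task, $n$ the average number of API calls, $\bar{t}$ the average tokens per call, and $r$ the per-call token ratio that the re-run would measure.

```latex
\frac{T_{\text{CiC}}}{T_{\text{DOMShell}}}
  = \frac{n_{\text{CiC}}}{n_{\text{DOMShell}}} \cdot \frac{\bar{t}_{\text{CiC}}}{\bar{t}_{\text{DOMShell}}}
  = \frac{8.6}{4.3} \cdot r
  = 2r
```

If "on top of" is read as $r > 2$, the hypothesis predicts more than a 4x difference in total tokens per task. This is a simplification: it treats tokens per call as an average and ignores how earlier tool results accumulate in the context over a multi-call task.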