Add token cost measurement to DOMShell vs CiC benchmark #30

## Goal
Extend the existing DOMShell vs CiC (Claude in Chrome) benchmark with token cost measurement.

## Background
The current experiment (`experiments/claude_domshell_vs_cic/`) measures **tool call counts** — DOMShell averaged 4.3 API calls per task vs 8.6 for screenshot-based browsing across 4 tasks, 8 trials.

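Token cost has not been measured yet, but the gap is expected to be wider than the 2x difference in call count:
- **DOMShell** returns structured text (AX node names, roles, properties): a full `tree` output for a complex page is ~2-3KB
- **Screenshot-based browsing** sends base64-encoded page captures: a single screenshot can be 500KB+ as a payload, though image inputs are usually tokenized by pixel dimensions rather than byte size, so the actual token count needs to be measured
- Vision tokens may be priced differently from text tokens depending on the provider; this should be checked against current pricing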
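
Before the harness is instrumented, the sizes above can be turned into a rough token estimate. The sketch below is a back-of-envelope calculation, not a measurement. It assumes about 4 characters per text token (a common rule of thumb) and about 750 pixels per image token (the approximation Anthropic documents for Claude vision inputs). The function names and the 1280x800 viewport are hypothetical.

```python
def estimate_text_tokens(num_bytes: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate for a text payload (assumes ~4 chars per token)."""
    return round(num_bytes / chars_per_token)


def estimate_image_tokens(width_px: int, height_px: int,
                          pixels_per_token: float = 750.0) -> int:
    """Rough token estimate for an image, based on pixel area.

    Assumes tokens ~= width * height / 750; verify against the provider's
    current documentation before relying on it.
    """
    return round(width_px * height_px / pixels_per_token)


# A 3KB `tree` output vs. one screenshot of an assumed 1280x800 viewport.
tree_tokens = estimate_text_tokens(3 * 1024)
screenshot_tokens = estimate_image_tokens(1280, 800)
print(f"tree: ~{tree_tokens} tokens, screenshot: ~{screenshot_tokens} tokens")
```

Under these assumptions the per-response gap depends on screenshot dimensions rather than on the base64 payload size, which is one reason to measure real usage instead of inferring it from byte counts.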

## Tasks
- [ ] Add input/output token counting to the benchmark harness
- [ ] Re-run the existing 4-task suite with token tracking enabled
- [ ] Report total tokens per task for both approaches
- [ ] Compare cost at current API pricing (text vs vision tokens)
- [ ] Update `experiments/claude_domshell_vs_cic/` with results
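
One possible shape for the token counting and cost comparison is sketched below. It assumes the harness can read input and output token counts from each API response (the Anthropic Messages API reports these as `usage.input_tokens` and `usage.output_tokens`). `TokenTracker`, `Usage`, `cost_usd`, the task name, and the prices are hypothetical placeholders, not part of the existing experiment code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0


class TokenTracker:
    """Accumulates token usage per (approach, task) pair."""

    def __init__(self) -> None:
        self._totals: dict[tuple[str, str], Usage] = defaultdict(Usage)

    def record(self, approach: str, task: str,
               input_tokens: int, output_tokens: int) -> None:
        # Call once per API response, passing the counts it reports.
        usage = self._totals[(approach, task)]
        usage.input_tokens += input_tokens
        usage.output_tokens += output_tokens
        usage.calls += 1

    def totals(self, approach: str, task: str) -> Usage:
        return self._totals[(approach, task)]


def cost_usd(usage: Usage, input_price_per_mtok: float,
             output_price_per_mtok: float) -> float:
    """Cost in USD given prices per million tokens (placeholder values)."""
    return (usage.input_tokens * input_price_per_mtok
            + usage.output_tokens * output_price_per_mtok) / 1_000_000


tracker = TokenTracker()
tracker.record("domshell", "example_task", input_tokens=1000, output_tokens=200)
tracker.record("domshell", "example_task", input_tokens=500, output_tokens=100)
print(tracker.totals("domshell", "example_task"))
```

Keying totals by approach and task keeps the per-task report and the cost comparison as simple lookups over the same data.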

## Hypothesis
Because DOMShell returns structured text (2-3KB per response) while CiC sends base64 screenshots (500KB+), token usage should show more than 2x savings on top of the 2x call-count reduction already measured.
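
To make the hypothesis checkable, the total token ratio can be split into a call-count factor and a per-call factor. The sketch below shows one reading of "on top of": the per-call ratio should exceed 2x. The function name is hypothetical and the token totals are made-up placeholders, not results. Only the call counts (4.3 and 8.6) come from the existing experiment.

```python
def decompose_savings(ds_calls: float, ds_tokens: float,
                      cic_calls: float, cic_tokens: float) -> tuple[float, float, float]:
    """Split the CiC/DOMShell token ratio into call-count and per-call factors.

    total_ratio == call_ratio * per_call_ratio
    """
    call_ratio = cic_calls / ds_calls
    total_ratio = cic_tokens / ds_tokens
    per_call_ratio = total_ratio / call_ratio
    return call_ratio, per_call_ratio, total_ratio


# Placeholder token totals for illustration only.
call_ratio, per_call_ratio, total_ratio = decompose_savings(
    ds_calls=4.3, ds_tokens=10_000, cic_calls=8.6, cic_tokens=50_000
)
print(f"calls: {call_ratio:.1f}x, per call: {per_call_ratio:.1f}x, "
      f"total: {total_ratio:.1f}x")
```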
