Add token cost measurement to DOMShell vs CiC benchmark #30

@apireno

Goal

Extend the existing DOMShell vs CiC (Claude in Chrome) benchmark with token cost measurement.

Background

The current experiment (experiments/claude_domshell_vs_cic/) measures tool call counts — DOMShell averaged 4.3 API calls per task vs 8.6 for screenshot-based browsing across 4 tasks, 8 trials.

Token cost hasn't been measured yet, but the gap is expected to be wider than the 2x difference in call count:

  • DOMShell returns structured text (AX node names, roles, properties) — a full tree output for a complex page is ~2-3KB
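  • Screenshot-based browsing sends base64-encoded page captures; a single screenshot can be 500KB+ as a payload, although the tokens billed for an image generally depend on its pixel dimensions rather than its byte size, so byte size is only a rough proxy
  • Vision tokens may be priced differently from text tokens depending on the provider and model, so the cost comparison should use measured token counts at current pricing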

Tasks

  • Add input/output token counting to the benchmark harness
  • Re-run the existing 4-task suite with token tracking enabled
  • Report total tokens per task for both approaches
  • Compare cost at current API pricing (text vs vision tokens)
  • Update experiments/claude_domshell_vs_cic/ with results
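The first three tasks could be approached with a small accumulator like the one below. This is a minimal sketch, not the existing harness: `TokenTally`, its method names, and the per-million-token price parameters are all hypothetical. It assumes each model response exposes per-call input and output token counts (the Anthropic Messages API reports these in its `usage` field as `input_tokens` and `output_tokens`).

```python
from dataclasses import dataclass, field


@dataclass
class TokenTally:
    """Hypothetical accumulator: tokens per (approach, task) pair."""

    totals: dict = field(default_factory=dict)

    def record(self, approach, task, input_tokens, output_tokens):
        # Call once per model response, passing the usage counts it reported.
        bucket = self.totals.setdefault(
            (approach, task), {"input": 0, "output": 0, "calls": 0}
        )
        bucket["input"] += input_tokens
        bucket["output"] += output_tokens
        bucket["calls"] += 1

    def total_tokens(self, approach, task):
        # Input plus output tokens; 0 if nothing was recorded for the pair.
        bucket = self.totals.get((approach, task), {"input": 0, "output": 0})
        return bucket["input"] + bucket["output"]

    def cost_usd(self, approach, task, input_price_per_mtok, output_price_per_mtok):
        # Prices are supplied by the caller (USD per million tokens);
        # no real pricing is hard-coded here.
        bucket = self.totals.get((approach, task), {"input": 0, "output": 0})
        return (
            bucket["input"] * input_price_per_mtok
            + bucket["output"] * output_price_per_mtok
        ) / 1_000_000
```

Keeping input and output counts separate matters because the two are usually priced differently, and screenshots only affect the input side.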

Hypothesis

Structured text (2-3KB per response) vs base64 screenshots (500KB+) should show >2x token savings on top of the 2x call-count reduction already measured.
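Before running the suite, the hypothesis can be framed as a back-of-envelope estimate. The sketch below is illustrative only and rests on two labeled assumptions: roughly 4 bytes of text per token (a common rule of thumb for English-like text) and roughly 750 pixels per image token (an approximation often cited for Claude image inputs). All function names are hypothetical, and the measured `usage` counts from the benchmark should replace these estimates once available.

```python
def text_tokens_from_bytes(n_bytes, bytes_per_token=4.0):
    # ASSUMPTION: ~4 bytes per token; real tokenization varies with content.
    return n_bytes / bytes_per_token


def image_tokens_from_pixels(width_px, height_px, pixels_per_token=750.0):
    # ASSUMPTION: image tokens scale with pixel area, not base64 byte size.
    return (width_px * height_px) / pixels_per_token


def combined_token_ratio(calls_a, tokens_per_call_a, calls_b, tokens_per_call_b):
    # How many times more tokens approach B consumes than approach A,
    # combining the call-count gap with the per-call token gap.
    return (calls_b * tokens_per_call_b) / (calls_a * tokens_per_call_a)


# Call counts are the averages reported in the Background section;
# the screenshot dimensions are placeholders to be replaced with real ones.
domshell_tokens = text_tokens_from_bytes(3000)
cic_tokens = image_tokens_from_pixels(1280, 800)
ratio = combined_token_ratio(4.3, domshell_tokens, 8.6, cic_tokens)
print(f"estimated token ratio (CiC / DOMShell): {ratio:.1f}x")
```

Because the two gaps multiply, any per-call token ratio above 1 pushes the combined ratio beyond the 2x already seen in call counts.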
