v0.3

@Jeomon released this 24 Apr 16:26
· 1 commit to main since this release

What's New in v0.3

New Features

  • PDF support in scrape_tool — extract content from PDF pages directly; specify individual pages with pages=[1,5,10]
  • OAuth 2.0 + PKCE authentication — built-in OAuth flow for sites that require it
  • WebMCP integration — agents can discover and call custom tools exposed by websites via the WebMCP protocol
  • Loop detection — LoopGuard detects page cycles and repeated failed retries, with prompt rules to break out automatically
  • keep_alive + disconnect() — keep the browser alive across agent runs and disconnect explicitly when done
  • within_viewport parameter on get_state — pass within_viewport=False to get all interactive elements across the entire DOM regardless of scroll position
  • Scroll position hints — browser state now includes scroll percentage and position hints for the agent
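
For readers unfamiliar with PKCE, the sketch below shows the verifier/challenge step that any OAuth 2.0 + PKCE flow performs, following the S256 method from RFC 7636. It is not taken from this project's code: the function names `make_code_verifier` and `make_code_challenge` are hypothetical, and the built-in flow may structure this differently.

```python
import base64
import hashlib
import secrets


def make_code_verifier(num_bytes: int = 32) -> str:
    """Return a random, URL-safe code verifier (hypothetical helper).

    32 random bytes encode to 43 base64url characters once padding is
    stripped, which is the minimum length RFC 7636 allows (43 to 128).
    """
    raw = secrets.token_bytes(num_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """Derive the S256 challenge: BASE64URL(SHA256(verifier)), unpadded."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# The client sends the challenge with the authorization request and later
# proves possession by sending the verifier with the token request.
verifier = make_code_verifier()
challenge = make_code_challenge(verifier)
```

The verifier stays on the client until the token exchange, so an intercepted authorization code is useless without it.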

Improvements

  • Unified semantic tree — DOMNode replaces the separate TreeNode/TreeNodeData types; the tree is now built from real DOM parent-child traversal instead of XPath reconstruction
  • Richer semantic tree output — shows id/class in CSS selector notation, and role when it differs from tag
  • Improved textual element detection — additional tags and correct inline text extraction
  • DOM capture timing — logs state_capture_ms and screenshot_capture_ms for performance visibility
  • Multiple performance optimizations across the agent loop
  • Migrated to uv package manager
  • Removed Playwright dependency — fully CDP-native via bundled src/cdp/ module
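
To make the semantic tree changes above more concrete, here is a rough sketch of how a single node type could be rendered with id/class in CSS selector notation, showing the role only when it differs from the tag. Only the name `DOMNode` comes from these notes. Its fields, the `label` and `render` helpers, and the exact output format are assumptions for illustration, not the project's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DOMNode:
    """Hypothetical node shape; the real DOMNode fields may differ."""
    tag: str
    id: str = ""
    classes: list[str] = field(default_factory=list)
    role: str = ""
    text: str = ""
    children: list[DOMNode] = field(default_factory=list)


def label(node: DOMNode) -> str:
    """Format one node as tag#id.class, adding the role only if it differs from the tag."""
    out = node.tag
    if node.id:
        out += f"#{node.id}"
    out += "".join(f".{name}" for name in node.classes)
    if node.role and node.role != node.tag:
        out += f" [role={node.role}]"
    if node.text:
        out += f' "{node.text}"'
    return out


def render(node: DOMNode, depth: int = 0) -> list[str]:
    """Walk real parent-child links depth-first, indenting two spaces per level."""
    lines = ["  " * depth + label(node)]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


tree = DOMNode("div", id="nav", classes=["menu", "top"], children=[
    DOMNode("a", role="button", text="Sign in"),
    DOMNode("button", role="button", text="Go"),
])
print("\n".join(render(tree)))
```

Because each node holds its own children, the indentation falls directly out of the traversal and no path strings need to be parsed to recover the hierarchy.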

Bug Fixes

  • Fixed PDF text extraction (switched to get_text('html') + markdownify)
  • Fixed done_tool over-condensing the final output
  • Fixed bounding boxes disappearing when page is scrolled
  • Fixed viewport element filtering to correctly account for scroll offset
  • Fixed scroll position key names in DOM viewport filtering
  • Fixed sub-frame/worker crash handling in CrashWatchdog
  • Fixed 10 s _wait_for_page timeouts by tracking navigation state
  • Fixed browser stability and agent crash handling
  • Fixed Gemini tool-calling when thought signature is absent
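
The viewport and scroll fixes listed above concern how element positions relate to the current scroll offset. The sketch below illustrates the general idea under stated assumptions: bounding boxes are in page coordinates, and the names `Box`, `in_viewport`, and `scroll_percent` are hypothetical rather than this project's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding box in page (document) coordinates, in CSS pixels."""
    x: float
    y: float
    width: float
    height: float


def in_viewport(box: Box, scroll_x: float, scroll_y: float,
                viewport_w: float, viewport_h: float) -> bool:
    """Return True if any part of the box is visible at the given scroll offset."""
    # Subtract the scroll offset to convert page coordinates to viewport coordinates.
    left = box.x - scroll_x
    top = box.y - scroll_y
    return (left < viewport_w and left + box.width > 0
            and top < viewport_h and top + box.height > 0)


def scroll_percent(scroll_y: float, viewport_h: float, page_h: float) -> int:
    """Return vertical scroll progress as a whole percentage from 0 to 100."""
    scrollable = page_h - viewport_h
    if scrollable <= 0:
        return 0  # assumption: a page that fits in the viewport reports 0
    clamped = min(max(scroll_y, 0), scrollable)
    return round(100 * clamped / scrollable)


button = Box(x=0, y=1200, width=100, height=40)
visible_at_top = in_viewport(button, 0, 0, 1280, 720)
visible_after_scroll = in_viewport(button, 0, 1000, 1280, 720)
```

Skipping the subtraction would compare page coordinates directly against the viewport size, so elements far down the page would never count as visible after scrolling.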