[New Skill]: office/browser_use, governed web automation for agents (browser-use / Playwright) #139

@rosspeili

Skill Name

office/browser_use

What should this skill do?

Give agents an installable Skillware bundle for bounded browser automation on everyday web tasks (portals, forms, dashboards, pricing pages), using the open-source browser-use library (Playwright under the hood, MIT license).

Core behavior (v1)

  • Accept a natural-language task, optional start URL, and safety knobs (max steps, timeout, allowed domains, headless).
  • When start_url is set, optionally run compliance/tos_evaluator before launching the browser.
  • Execute a capped browser-use session inside skill.py and return structured JSON: status, steps taken, result text/extract, errors, optional artifact references.
  • Ship instructions.md so models know when to use this skill vs HTTP-only tools, static scrapers, or human handoff.
  • Category office/, same productivity lane as pdf_form_filler (mundane web/office workflows, not generic “automation”).
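The optional pre-check step could be gated roughly as follows. This is a minimal sketch, not the proposed implementation: `preflight` and the injected `evaluator` callable are hypothetical names, and the only field assumed from the evaluator verdict is `is_safe_to_proceed`, taken from the example output in this proposal. The real compliance/tos_evaluator interface may differ.

```python
def preflight(params, evaluator):
    """Decide whether the browser may launch.

    `evaluator` is a hypothetical callable standing in for
    compliance/tos_evaluator; it takes a URL and returns a dict.
    Returns (may_proceed, verdict_or_None).
    """
    start_url = params.get("start_url")
    # Skip the check when there is no start URL or the caller opted out.
    if not start_url or not params.get("run_tos_check", True):
        return True, None
    verdict = evaluator(start_url)
    # Fail closed: anything other than an explicit True blocks the run.
    return verdict.get("is_safe_to_proceed") is True, verdict


# Example with a stub evaluator (no network access).
ok, verdict = preflight(
    {"task": "read pricing", "start_url": "https://example.com"},
    evaluator=lambda url: {"is_safe_to_proceed": True, "confidence_score": 0.88},
)
```

Passing the evaluator in as an argument keeps the check easy to mock in CI, which matches the "mock in CI" note under Implementation notes.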

Rationale

  • Agents often need to interact with websites, not just read static pages; prompt-only or one-off Playwright scripts are hard to audit, reuse, and govern across models.
  • browser-use is a mature open stack for LLM-controlled browsers; Skillware adds manifest, constitution, deterministic JSON contract, and multi-model tool adapters.
  • Complements compliance/tos_evaluator (policy pre-check) without duplicating it; complements office/pdf_form_filler (local documents vs live web portals).
  • Self-hosted open stack, no vendor marketplace or proprietary operator runtime required.

Dependencies

  • Python packages (skill manifest requirements, not core skillware deps): browser-use, Playwright browser runtime (playwright install chromium).
  • LLM credentials for the browser-use agent (document in manifest env_vars + docs/usage/api_keys.md): e.g. OPENAI_API_KEY, GOOGLE_API_KEY, ANTHROPIC_API_KEY.
  • Optional env for defaults: provider/model selection for browser-use (exact names TBD in implementation).
  • Heavy deps stay isolated to this skill; contributors install via skill manifest + local Playwright setup.
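A dependency and credential preflight along these lines could back the "missing deps" and "missing API key" test cases. It is a sketch under assumptions: `check_prerequisites` is a hypothetical helper, the import names `browser_use` and `playwright` are the usual module names for those packages, and treating any one of the three keys as sufficient is a guess, since provider selection is still TBD.

```python
import importlib.util
import os

# Assumed import names for the two heavy dependencies.
REQUIRED_MODULES = ("browser_use", "playwright")
# Keys named in this proposal; which one is needed depends on the provider.
API_KEY_VARS = ("OPENAI_API_KEY", "GOOGLE_API_KEY", "ANTHROPIC_API_KEY")


def check_prerequisites(modules=REQUIRED_MODULES, key_vars=API_KEY_VARS, env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    errors = []
    for name in modules:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            errors.append(f"missing dependency: {name}")
    if not any(env.get(var) for var in key_vars):
        errors.append("no LLM API key set; expected one of: " + ", ".join(key_vars))
    return errors
```

Returning a list instead of raising lets skill.py place the messages directly into the `errors` array of the JSON output.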

Constitution & safety

  • Enforce allowed_domains when provided; fail closed on navigation outside the list.
  • Cap max_steps and timeout_seconds; no unbounded agent loops from a single execute().
  • Default run_tos_check: true for public start_url; include evaluator verdict in output.
  • Never return raw cookies, session tokens, or passwords in JSON.
  • No scraping behind auth unless an explicit flag (e.g. assume_authenticated_session) is true and documented.
  • Constitution text: no credential harvesting, no circumventing login or paywalls, lawful use only, operators responsible for site terms.
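Two of the rules above lend themselves to small pure functions: the fail-closed domain allowlist and the ban on returning secrets. The sketch below is illustrative only. `is_allowed`, `redact`, and the `SENSITIVE_KEYS` set are hypothetical, and matching subdomains of an allowed domain is an assumption the proposal does not settle.

```python
from urllib.parse import urlsplit

# Hypothetical key names to scrub; a real skill would tune this list.
SENSITIVE_KEYS = {"cookie", "cookies", "password", "session_token", "token", "authorization"}


def is_allowed(url, allowed_domains):
    """True if the URL host equals, or is a subdomain of, an allowed domain."""
    if not allowed_domains:
        return True  # no allowlist provided, so nothing to enforce
    try:
        host = (urlsplit(url).hostname or "").lower()
    except ValueError:
        return False  # malformed URL: fail closed
    if not host:
        return False  # no host to compare: fail closed
    for domain in allowed_domains:
        domain = domain.lower().lstrip(".")
        if host == domain or host.endswith("." + domain):
            return True
    return False


def redact(value):
    """Recursively replace values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

Comparing on the parsed hostname, not on a substring of the URL, is what keeps a host such as `example.com.evil.net` from passing an `example.com` allowlist.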

Implementation notes

  • Path: skills/office/browser_use/ from templates/python_skill/.
  • skill.py: sync wrapper around async browser-use Agent; stable error mapping to manifest output schema.
  • Optional pre-flight: load or call compliance/tos_evaluator logic when run_tos_check is true (mock in CI).
  • test_skill.py: dry_run, missing deps, domain violation, schema compliance — no live browser in default CI (optional integration marker if needed).
  • Docs: docs/skills/browser_use.md, row in docs/skills/README.md, usage examples for at least one provider loop per skill template.
  • Issuer: real name + email per CONTRIBUTING; contributor with browser-use / Playwright experience preferred.
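The "sync wrapper around async" note could take roughly this shape. The sketch deliberately avoids calling browser-use: `_run_agent` is a stand-in coroutine, `execute` taking a `runner` argument is an assumption made for testability, and the `timeout` and `error` status strings are hypothetical (only `completed` appears in this proposal).

```python
import asyncio


async def _run_agent(task, max_steps):
    """Stand-in for the browser-use session; a real skill would await the agent here."""
    await asyncio.sleep(0)
    return {"result_text": f"stub result for: {task}", "steps_used": min(1, max_steps)}


def execute(params, runner=_run_agent):
    """Run one bounded session synchronously and map failures to stable JSON."""
    task = params["task"]
    max_steps = int(params.get("max_steps", 25))
    timeout = float(params.get("timeout_seconds", 120))
    base = {"status": "error", "task": task, "result_text": "", "steps_used": 0, "errors": []}
    try:
        # wait_for cancels the session once the time budget is spent.
        result = asyncio.run(asyncio.wait_for(runner(task, max_steps), timeout))
    except asyncio.TimeoutError:
        return {**base, "status": "timeout", "errors": [f"timed out after {timeout}s"]}
    except Exception as exc:  # map any failure to the output schema
        return {**base, "errors": [f"{type(exc).__name__}: {exc}"]}
    return {**base, "status": "completed", **result}
```

Note that `asyncio.run` raises if an event loop is already running in the calling thread, so a host that is itself async would need a different entry point.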

Acceptance criteria

  • Maintainer approval of skill proposal
  • Full bundle per Skill Package Standard (manifest.yaml, skill.py, instructions.md, test_skill.py, card.json, docs, catalog row)
  • Bounded execution enforced in code (max steps, timeout, optional domain allowlist)
  • Optional TOS pre-check documented and tested (mocked)
  • pytest for dry_run, missing API key/deps, domain violation, output schema
  • CHANGELOG.md entry under [Unreleased] when merged
  • python -m flake8 . and relevant tests pass

Out of scope (v1)

  • Proprietary operator platforms or marketplace APIs
  • browser-use Cloud / stealth / CAPTCHA hosted services (future optional backend only)
  • Recorder UI, digital-twin, or voice interfaces
  • Unbounded autonomous browsing without step/time limits
  • Replacing SkillLoader or core framework
  • Adding browser-use to core pyproject.toml dependencies

Ideal Inputs & Outputs

Input

{
  "task": "Find the latest pricing page and return the three plan names and monthly prices.",
  "start_url": "https://example.com",
  "max_steps": 25,
  "timeout_seconds": 120,
  "allowed_domains": ["example.com"],
  "headless": true,
  "run_tos_check": true,
  "dry_run": false
}
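One way to parse and bound this input before anything launches is a small dataclass. This is a sketch under assumptions: `BrowserTaskRequest`, `parse_request`, the default values, and both ceilings are hypothetical, since the proposal does not fix exact limits.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical hard ceilings; the proposal only says the values must be capped.
MAX_STEPS_CEILING = 50
TIMEOUT_CEILING_SECONDS = 300


@dataclass
class BrowserTaskRequest:
    task: str
    start_url: Optional[str] = None
    max_steps: int = 25
    timeout_seconds: int = 120
    allowed_domains: List[str] = field(default_factory=list)
    headless: bool = True
    run_tos_check: bool = True
    dry_run: bool = False

    def __post_init__(self):
        if not isinstance(self.task, str) or not self.task.strip():
            raise ValueError("task must be a non-empty string")
        # Clamp the knobs so one call can never request an unbounded run.
        self.max_steps = max(1, min(int(self.max_steps), MAX_STEPS_CEILING))
        self.timeout_seconds = max(1, min(int(self.timeout_seconds), TIMEOUT_CEILING_SECONDS))


def parse_request(payload: dict) -> BrowserTaskRequest:
    """Reject unknown fields, then build the bounded request."""
    unknown = set(payload) - set(BrowserTaskRequest.__dataclass_fields__)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return BrowserTaskRequest(**payload)
```

Clamping silently is one design choice; rejecting out-of-range values with an error would be equally defensible and is easier for callers to notice.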

Output

{
  "status": "completed",
  "task": "...",
  "result_text": "Basic $9/mo, Pro $29/mo, Enterprise contact sales",
  "steps_used": 14,
  "start_url": "https://example.com",
  "tos_check": {
    "is_safe_to_proceed": true,
    "confidence_score": 0.88
  },
  "errors": []
}
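A schema-compliance test could check this output shape with a plain function. The sketch below is hypothetical: `validate_output` and the choice of required fields are inferred from the example above, and the 0 to 1 range for `confidence_score` is an assumption.

```python
# Field names come from the example output; treating them all as required is an assumption.
REQUIRED_OUTPUT_FIELDS = {
    "status": str,
    "task": str,
    "result_text": str,
    "steps_used": int,
    "errors": list,
}


def validate_output(output):
    """Return a list of schema problems; an empty list means the output conforms."""
    problems = []
    for name, expected in REQUIRED_OUTPUT_FIELDS.items():
        if name not in output:
            problems.append(f"missing field: {name}")
        elif not isinstance(output[name], expected):
            actual = type(output[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    tos = output.get("tos_check")
    if tos is not None:  # tos_check is optional in this sketch
        if not isinstance(tos.get("is_safe_to_proceed"), bool):
            problems.append("tos_check.is_safe_to_proceed: expected bool")
        score = tos.get("confidence_score")
        if not isinstance(score, (int, float)) or not 0 <= score <= 1:
            problems.append("tos_check.confidence_score: expected number in [0, 1]")
    return problems
```

In a real bundle the manifest output schema would be the source of truth, and a check like this would be derived from it instead of hand-written.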

Targeted Models (if applicable)

Model Agnostic (All)

Metadata

Labels: discussion, enhancement, help wanted, skill request
