Commit 763a2ba

Alezander9 and bcode authored
exclude: remove user-contributed domain-skills from vendored harness (#27)
* exclude: remove user-contributed domain-skills from vendored harness

  Domain-skills (`agent-workspace/domain-skills/` post-PR-#229 + top-level `domain-skills/` from PR #247) are user-contributed site recipes maintained on browser-use/browser-harness. We exclude them from browsercode's vendored tree on quality, maintenance, and prompt-injection grounds. Browsercode (cloud-first, performance-focused) curates its own skills server-side.

  Mechanism: three places that all reference UPSTREAM.md §3 "Excluded paths" as the source of truth:

  1. `script/check-harness-diff.sh` gains an `IGNORED_PATHS_REGEX` filter applied before the noise/expected/unexpected split, so future syncs treat upstream domain-skills changes as if they don't exist.
  2. `harness-sync.md` step 5 documents the skip rule as a top-row action in the file-category table; sync agents pre-filter excluded paths instead of deciding per-file.
  3. The directories themselves are removed from the vendored tree (`git rm -r`).

  Runtime safety: `helpers.goto_url()` already guards with `if d.is_dir():`, so absence is a clean no-op (no exception, no missing key, no broken tool). Smoke-tested: `from browser_harness import run, helpers, daemon, admin, _ipc` imports cleanly; `browser-harness --version` prints 0.1.0.

  Doc trims (`SKILL.md`, `README.md`, `install.md`) are minimal surgical edits: only direct `domain-skills/` references are removed; surrounding prose is preserved so future sync diffs stay localized to specific lines. UPSTREAM.md §3 records these as "expect ongoing drift on sync" so future sync agents reconcile them hunk-by-hunk.

  Net: 82 files removed (76 agent-workspace skills + 4 shopify-admin + 2 README files), 7 files modified, 1 PR. Maintenance cost going forward: zero on the excluded paths (filtered automatically); small on the three trimmed docs.

  Refs: AGENTS.md, UPSTREAM.md §3.
* revert doc-file trims; keep verbatim, plan custom prompt later

  Per discussion in originating thread: `README.md` and `install.md` are not referenced by any browsercode prompt or TS code, so trimming them changed nothing the agent reads. `SKILL.md` IS referenced by `packages/opencode/src/tool/browser-execute.txt`, but that pointer is ours: the long-term plan is to replace it with a browsercode-owned prompt file that we evolve independently of upstream, which makes vendored `SKILL.md` inert too.

  Net: trimming these files added per-sync drift forever for zero agent-behavior benefit. Reverting to upstream verbatim eliminates the drift and keeps future syncs mechanical.

  UPSTREAM.md §3 'Modified files' table loses three rows (only `.gitignore` remains). Replaced with a paragraph explaining why domain-skills mentions are tolerated in these files: the agent never reads them, or won't once we own the prompt.

  Verified: smoke test clean, `check-harness-diff.sh` post-revert shows only the expected `.gitignore` divergence + 3 real upstream commits we haven't synced (daemon.py, run.py, test_run.py).

  Roadmap entry for the custom prompt replacement work tracked separately in maintainer memory.

---------

Co-authored-by: bcode <bcode@agents.local>
2 parents 9efebbf + ea8c20b commit 763a2ba

86 files changed

Lines changed: 58 additions & 26949 deletions

AGENTS.md

Lines changed: 3 additions & 2 deletions
@@ -143,8 +143,9 @@ src-layout reorg):
 - `src/browser_harness/*.py` (`daemon.py`, `admin.py`, `helpers.py`,
   `run.py`, `_ipc.py`) — protected. Pull verbatim. If behavior change is
   needed, upstream a PR to `browser-use/browser-harness`.
-- `interaction-skills/`, `agent-workspace/domain-skills/` — verbatim.
-  Never edit.
+- `interaction-skills/` — verbatim. Never edit.
+- `(agent-workspace/)?domain-skills/` **excluded** from vendored tree.
+  Sync agents skip these paths; see UPSTREAM.md §3 "Excluded paths".

 Sync workflow lives in `harness-sync.md`.

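The pattern `(agent-workspace/)?domain-skills/` quoted in this hunk is meant to cover both upstream locations of the directory. A minimal Python sketch of that matching follows. It assumes an anchored form of the pattern; the actual value of `IGNORED_PATHS_REGEX` in `script/check-harness-diff.sh` is not shown in this commit view, and the function names and the `example` file names are hypothetical.

```python
import re

# Assumption: anchored form of the pattern quoted in AGENTS.md. The real
# IGNORED_PATHS_REGEX in script/check-harness-diff.sh may be written differently.
EXCLUDED = re.compile(r"^(agent-workspace/)?domain-skills/")


def is_excluded(path: str) -> bool:
    """Return True if a harness-relative path lies under an excluded directory."""
    return EXCLUDED.match(path) is not None


def filter_changed_paths(paths):
    """Drop excluded paths before any further classification of a sync diff."""
    return [p for p in paths if not is_excluded(p)]


changed = [
    "agent-workspace/domain-skills/amazon/product-search.md",
    "domain-skills/shopify-admin/README.md",   # hypothetical file name
    "interaction-skills/example.md",           # hypothetical file name
    "src/browser_harness/helpers.py",
]
print(filter_changed_paths(changed))
```

Anchoring at the start of the path keeps the filter from swallowing an unrelated directory that merely contains `domain-skills/` deeper in its path.
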
UPSTREAM.md

Lines changed: 20 additions & 3 deletions
@@ -91,21 +91,38 @@ Each upstream has its own append-only table. Add a row every time you pull.

 ---

-## 3. Harness divergences
+## 3. Harness divergences and excluded paths

-Per-file record of where `packages/bcode-browser/harness/` deliberately differs from upstream. Read this *before* a sync diff so intentional differences aren't mistaken for missing features.
+Per-file record of where `packages/bcode-browser/harness/` deliberately differs from upstream, plus the list of paths excluded from the vendored tree entirely. Read this *before* a sync diff so intentional differences aren't mistaken for missing features and excluded paths aren't accidentally re-imported.

 Path-allowlist policy (decisions.md §3.7, §4.5; updated for upstream PR #229 src-layout reorg):

 - `agent-workspace/agent_helpers.py` — editable; primary BrowserCode extension surface. Divergences expected.
 - `src/browser_harness/*.py` (`daemon.py`, `admin.py`, `helpers.py`, `run.py`, `_ipc.py`) — protected. Pulled verbatim from upstream. If behavior change is needed, upstream a PR to `browser-use/browser-harness`.
-- `interaction-skills/`, `agent-workspace/domain-skills/` — verbatim from upstream. We never edit these.
+- `interaction-skills/` — verbatim from upstream. We never edit these.
+- `(agent-workspace/)?domain-skills/` **excluded.** See "Excluded paths" below.
 - Other files (`pyproject.toml`, `LICENSE`, `README.md`, etc.) — divergence allowed but discouraged.

+### Excluded paths
+
+Upstream paths the vendored tree treats as if they don't exist. Sync agents skip them; the diff checker filters them out. The runtime guard in `helpers.py` (`if d.is_dir():` in `goto_url`) means absence is a clean no-op.
+
+| Pattern | Reason |
+|---|---|
+| `(agent-workspace/)?domain-skills/**` | User-contributed site recipes. Quality, maintenance, and prompt-injection concerns. Browsercode (cloud-first, performance-focused) curates its own skills server-side; OSS users get the harness without bundled recipes. Both upstream paths covered: post-PR-#229 `agent-workspace/domain-skills/` and the legacy/PR-#247 top-level `domain-skills/`. The exclusion is enforced in three places that all reference this row: `script/check-harness-diff.sh` (`IGNORED_PATHS_REGEX`), `harness-sync.md` step 5 ("Excluded paths" row), and the absence of these directories from the vendored tree. |
+
+### Modified files
+
 | File | Section | Direction | Reason |
 |---|---|---|---|
 | `.gitignore` | venv entry | added `.venv/` | smoke-test workflow creates `.venv/` in the harness dir; we ignore it. Upstream uses CWD-level venv so doesn't need this. |

+The vendored harness's `SKILL.md`, `README.md`, and `install.md` reference `agent-workspace/domain-skills/`, but we keep them verbatim from upstream. Rationale:
+
+- `README.md` and `install.md` are not referenced by any browsercode prompt or TS code — the agent never reads them. Their content is dead weight in the extracted cache, not agent-visible.
+- `SKILL.md` is referenced by `packages/opencode/src/tool/browser-execute.txt` today, but the long-term plan (see ROADMAP) is to replace that pointer with a browsercode-owned prompt file, making vendored `SKILL.md` inert too.
+- Trimming these files would generate per-sync drift forever for zero agent-behavior benefit. Keeping them verbatim costs nothing and keeps future syncs mechanical.
+
 ---

 ## Drift checker

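The "Excluded paths" paragraph in this hunk relies on a runtime guard: `if d.is_dir():` in `goto_url` makes a missing directory a clean no-op. The sketch below illustrates that behavior in isolation. It is not the code in `helpers.py`: the function `list_domain_skills`, its return value, and the directory layout under `root` are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def list_domain_skills(root: Path) -> list:
    """Hypothetical stand-in for a lookup guarded by `if d.is_dir():`.

    Returns the names found under the domain-skills directory, or an
    empty list when the directory does not exist.
    """
    d = root / "agent-workspace" / "domain-skills"  # assumed layout
    found = []
    if d.is_dir():  # False for a missing path, so no exception is raised
        found = sorted(p.name for p in d.iterdir())
    return found


with tempfile.TemporaryDirectory() as tmp:
    # The directory was never created, mirroring the vendored tree after removal.
    print(list_domain_skills(Path(tmp)))
```

`Path.is_dir()` returns `False` for a path that does not exist rather than raising, which is why removing the directories needs no matching code change.
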
harness-sync.md

Lines changed: 5 additions & 3 deletions
@@ -28,7 +28,7 @@ git pull origin main
 Two things to read before touching anything:

 - **`UPSTREAM.md`** — the latest `To SHA` row under `### browser-use/browser-harness`. That is the last commit we synced to. It is the only source of truth for "what version is vendored."
-- **`UPSTREAM.md` §3 Harness divergences** — the table of files where we deliberately differ from upstream, with reasons. Read this *before* the diff so you know which differences are intentional and not "missing features."
+- **`UPSTREAM.md` §3 Harness divergences and excluded paths** — the table of files where we deliberately differ from upstream, plus the list of paths excluded from the vendored tree entirely. Read both *before* the diff so you know which differences are intentional and not "missing features," and which paths to skip outright.

 If the divergences table is empty (initial vendor state), every difference between us and upstream is unintentional drift; flag any in the PR.

@@ -65,14 +65,16 @@ This is where the agent earns its keep. For each file changed in `<recorded-sha>

 | File category | Action |
 |---|---|
-| Files not in our divergences table (incl. `src/browser_harness/*.py`, `agent-workspace/domain-skills/`, `interaction-skills/`, `tests/`, `pyproject.toml`, `LICENSE`, etc.) | Take upstream verbatim — `cp temp/browser-harness/<path> packages/bcode-browser/harness/<path>`. |
+| **Excluded paths** (`(agent-workspace/)?domain-skills/...`) | **Skip entirely.** Never copy in, never resurrect. See UPSTREAM.md §3 "Excluded paths". `script/check-harness-diff.sh` filters these out automatically. |
+| Files not in our divergences table (incl. `src/browser_harness/*.py`, `interaction-skills/`, `tests/`, `pyproject.toml`, `LICENSE`, etc.) | Take upstream verbatim — `cp temp/browser-harness/<path> packages/bcode-browser/harness/<path>`. |
 | Files in our divergences table | Read each upstream hunk. For each, decide: **take** (apply upstream change to our file), **skip** (our divergence wins, ignore upstream change), or **adapt** (rewrite our divergence to coexist with the upstream change). Update the divergences row if its reason or scope shifts. |
-| New upstream files | Copy in. |
+| New upstream files | Copy in (unless under an excluded path). |
 | Files we have but upstream removed | Decide: keep ours (record in divergences) or delete. |

 Path-allowlist policy stays in force during sync resolution as well as normal development:
 - `agent-workspace/agent_helpers.py` — editable, agent's primary extension surface (post PR #229).
 - `src/browser_harness/*.py` (`daemon.py`, `admin.py`, `helpers.py`, `run.py`, `_ipc.py`) — protected. Always take upstream verbatim. If upstream regresses, file an issue at `browser-use/browser-harness` and pin to the prior SHA, do not patch locally.
+- `(agent-workspace/)?domain-skills/` **excluded.** Treat as if not in the upstream tree. Quality + prompt-injection concerns; user-contributed site recipes do not ship with browsercode. The runtime guard in `helpers.py` (`if d.is_dir():`) means this is a clean no-op.

 ### 6. Smoke test

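The file-category table in step 5 of this hunk amounts to a small decision procedure in which the excluded-path check runs first. A rough Python sketch of that ordering follows. The function `sync_action`, its action labels, and the `removed_upstream` flag are hypothetical names chosen for illustration, and the regex is an assumed anchored form of the pattern in the table; the documented workflow is carried out by a sync agent, not by this code.

```python
import re

# Assumption: anchored form of the excluded-path pattern from the step 5 table.
EXCLUDED = re.compile(r"^(agent-workspace/)?domain-skills/")


def sync_action(path, divergences, removed_upstream=False):
    """Map one changed path to the action the step 5 table prescribes."""
    if EXCLUDED.match(path):
        return "skip"          # excluded: never copy in, never resurrect
    if removed_upstream:
        return "decide"        # keep ours (record in divergences) or delete
    if path in divergences:
        return "review-hunks"  # take, skip, or adapt each upstream hunk
    return "copy"              # not in divergences table, or new: take verbatim


divergences = {".gitignore"}  # the one modified file recorded in UPSTREAM.md §3
for p in ["domain-skills/example/README.md", ".gitignore", "src/browser_harness/run.py"]:
    print(p, "->", sync_action(p, divergences))
```

Checking the exclusion before every other category is what lets new upstream files under `domain-skills/` be ignored without a per-file decision.
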
packages/bcode-browser/harness/agent-workspace/domain-skills/amazon/product-search.md

Lines changed: 0 additions & 198 deletions
This file was deleted.
