Operator simplification: REST-only SCM + standardized LLM_* config#134

Draft
mountainowl wants to merge 4 commits into main from feat/rest-only-llm-config

mountainowl (Owner) commented Jun 23, 2026

Draft umbrella PR. More changes coming (self-contained Docker image is next; will be pushed onto this branch).

What this does

1. REST-only SCM — drop glab/gh + SCM MCP servers (refactor(scm))

  • Checkout no longer shells out to glab/gh. Clone/fetch/checkout go through git with the token passed per-invocation via git -c http.extraHeader=... so the token never lands in .git/config (the agent has read access to the worktree).
  • Inline posting is now REST-only on both providers (the MCP-preferred path + fallback is gone).
  • Removed the mcp-upstream dispatcher, the [mcp_servers.gitlab] block (~310 lines) and GitLab tool perms from the templates, and denied_tools_regex / GITLAB_DENIED_TOOLS_REGEX.
  • Kept bubo's own MCP server (bin/bubo mcp, [mcp_servers.bubo]) and the local-git MCP.
  • Net runtime SCM deps after this change: git + the agent CLI.
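A minimal sketch of the per-invocation token idea, assuming HTTP Basic credentials. The helper name `git_with_token` and the default username are hypothetical (`x-access-token` is a common GitHub convention; GitLab setups often use `oauth2`), and the PR does not state the exact header format it uses.

```python
import base64


def git_with_token(token: str, *git_args: str, username: str = "x-access-token") -> list[str]:
    """Build a git argv that carries the token for this one invocation only.

    Hypothetical helper: the name and the Basic-auth header shape are
    assumptions for illustration, not taken from the PR.
    """
    # HTTP Basic credentials are base64("username:token").
    basic = base64.b64encode(f"{username}:{token}".encode()).decode()
    # `git -c key=value` applies configuration to this process only,
    # so nothing is persisted to .git/config in the worktree.
    return ["git", "-c", f"http.extraHeader=Authorization: Basic {basic}", *git_args]


argv = git_with_token("example-token", "fetch", "origin")
```

The resulting list can be handed to `subprocess.run(argv, check=True)`. Because the header is built fresh for each call, an agent that can read the worktree finds no credential in the repository's config.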

2. Standardize LLM config + OpenAI-compat (feat(config))

  • Collapsed the [agents] auth/model knobs to LLM_API_KEY, LLM_MODEL, LLM_MODEL_EFFORT, plus optional LLM_BASE_URL for OpenAI-compatible endpoints.
  • Fixes a latent bug: llm_model / reasoning_effort were hardcoded in codex-config.toml (no substitution), so they silently never took effect. bubo init now templates the real values into the Codex profile (and a [model_providers.bubo] block when llm_base_url is set).
  • Backward-compatible: old keys (reasoning_effort, llm_api_key_env) are still honored with a config_key_deprecated log event.
  • action.yml reordered to write env.toml before bubo init so templating sees the real values.

3. Anonymous, opt-out usage analytics → PostHog (feat(analytics))

"Help improve Bubo." On by default; numbers only, no identifying content.

  • New [analytics] block. Opt out via enabled = false, BUBO_ANALYTICS=0, or the cross-tool DO_NOT_TRACK=1.
  • Emits anonymized events through the OTel logs API directly (no stdlib logging, so bubo's own logs — which carry repo names/paths — can never leak): session_start, review_completed (incl. lines-of-code reviewed), and a per-cycle usage_snapshot built from the existing SQLite aggregate readers.
  • Privacy enforced structurally by a default-deny allowlist (_ALLOWED_ATTRS): only counts, durations, LoC, tokens, SCM type, and model name leave the machine — never repo/project names, paths, SHAs, finding text, errors, or credentials. The resource is built without env-merge so OTEL_RESOURCE_ATTRIBUTES can't ride along. Anonymous random install id = PostHog distinct_id.
  • Best-effort throughout (bounded timeouts, all exceptions swallowed); the extra LoC API call is skipped when analytics is off.
  • ⚠️ Not yet verified live: events are shape- and safety-tested, but end-to-end delivery to PostHog (and the distinct_id → identity mapping) has not been confirmed against a real project. That's the gate before relying on the data.
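The default-deny allowlist described above might look something like the following sketch. `_ALLOWED_ATTRS` is the name used in the PR, but its contents here and the `scrub` helper are hypothetical examples.

```python
# Hypothetical contents: the PR names _ALLOWED_ATTRS but does not list its members.
_ALLOWED_ATTRS = frozenset(
    {
        "review_count",
        "duration_ms",
        "loc_reviewed",
        "tokens_in",
        "tokens_out",
        "scm_type",
        "model",
    }
)


def scrub(attrs: dict[str, object]) -> dict[str, object]:
    """Keep only attributes that are explicitly allowed; drop everything else."""
    # Default-deny: an attribute added later is dropped unless someone
    # deliberately adds its key to the allowlist.
    return {key: value for key, value in attrs.items() if key in _ALLOWED_ATTRS}


event = scrub({"loc_reviewed": 120, "model": "some-model", "repo": "acme/private"})
```

The point of the structure is that forgetting to update the list fails safe: a new field is silently withheld rather than silently shipped.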

Docs

  • Operator docs updated for all three changes (prerequisites, recipes, configuration, install-and-configure, github-action).
  • Note: we intend to replace the MkDocs site going forward; this PR keeps it in sync for now so nothing breaks in the interim.

Verification

  • ruff ✅ · mypy ✅ · pytest 538 passed (+30 new analytics tests; 4 pre-existing test_project_layout README-link failures present on main, unrelated) · mkdocs build --strict exit 0 · cz check ✅.

Still to come on this branch

  • build(docker): bundle both agent CLIs (@openai/codex, @anthropic-ai/claude-code) + node + git in the image; env-driven entrypoint.

Checkout now clones over HTTPS with the bot token supplied per git
invocation as an auth header (never written to .git/config); posting
goes straight through the REST API. Removes the glab/gh CLIs, the
upstream GitLab/GitHub MCP servers (and bin/bubo mcp-upstream), the
bubo.mcp client, the [mcp_servers.gitlab] profile block + its tool
allowlist, and denied_tools_regex.

Net SCM dependency is now git + the agent CLI. Bubo's own MCP server
and the local-git MCP are unchanged.

Collapse the LLM agent knobs to LLM_API_KEY / LLM_MODEL / LLM_MODEL_EFFORT
plus an optional LLM_BASE_URL for custom OpenAI-compatible endpoints, and
make them actually take effect: `bubo init` now templates the model, effort,
and (when a base_url is set) a [model_providers] block into the Codex profile,
and writes the model into the Claude settings. Previously model/effort were
hardcoded in the template and silently ignored.

The credential-stripping reviewer allowlist gains one deliberate, documented
exception: in base_url mode (and only then) LLM_API_KEY + LLM_BASE_URL are
passed through to the agent, since a custom endpoint reads the key from the
environment at request time.
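As a rough illustration of that exception, the sketch below filters the environment handed to the agent. The names `agent_env` and `_BASE_ALLOWLIST`, and the baseline variables, are hypothetical; only the two LLM_* variable names come from the text above.

```python
from collections.abc import Mapping

# Hypothetical baseline: variables assumed safe to pass to the agent in every mode.
_BASE_ALLOWLIST = frozenset({"PATH", "HOME", "LANG"})


def agent_env(parent_env: Mapping[str, str], base_url_mode: bool) -> dict[str, str]:
    """Return the environment the agent subprocess is allowed to see."""
    allowed = set(_BASE_ALLOWLIST)
    if base_url_mode:
        # The one deliberate exception: a custom endpoint reads the key
        # from the environment at request time.
        allowed |= {"LLM_API_KEY", "LLM_BASE_URL"}
    # Everything not explicitly allowed, including SCM tokens, is stripped.
    return {key: value for key, value in parent_env.items() if key in allowed}


env = agent_env({"PATH": "/usr/bin", "LLM_API_KEY": "k", "SCM_TOKEN": "t"}, base_url_mode=False)
```

Keeping the exception behind a single boolean makes the tradeoff visible in one place: enabling base_url mode is exactly what exposes the LLM key to the agent.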

`llm_api_key_env` is deprecated (still honored); `reasoning_effort` is read as
a fallback for the new `llm_model_effort`. The GitHub Action writes config
before templating the agent profile and supports llm-base-url / llm-model-effort.

Auto-running the agent login from LLM_API_KEY at init is deferred to a follow-up
(needs validation against the real Codex/Claude CLIs); for now the agent
authenticates via its own login.

Reflect the two code changes in the operator docs:

- Drop glab/gh and the SCM MCP servers from prerequisites, recipes, and
  configuration; the only runtime SCM deps are now git + the agent CLI.
- Document the standardized LLM surface (LLM_API_KEY, LLM_MODEL,
  LLM_MODEL_EFFORT) and the optional LLM_BASE_URL for OpenAI-compatible
  endpoints, including the env-exfiltration tradeoff of base_url mode.
- Refresh the GitHub Action inputs and recipes accordingly.

Bubo is free and open source; the only way to learn what to improve is
anonymous usage signal from real installs. Add an on-by-default
[analytics] block that ships NUMBERS ONLY to PostHog over OTLP logs.

- New [analytics] config (on by default). Opt out via `enabled = false`,
  `BUBO_ANALYTICS=0`, or the cross-tool `DO_NOT_TRACK=1` convention.
- src/bubo/analytics.py emits anonymized events directly through the OTel
  logs API (no stdlib logging, so bubo's own logs can never leak):
  session_start, review_completed (incl. lines-of-code reviewed), and a
  per-cycle usage_snapshot derived from the existing SQLite aggregate
  readers (metrics/outcomes/latency).
- Privacy is enforced structurally by a default-deny allowlist: only
  counts, durations, LoC, tokens, SCM type, and model name leave the
  machine. Never repo/project names, paths, SHAs, finding text, errors,
  or credentials. Resource is built without env-merge so OTEL_RESOURCE_*
  cannot ride along. An anonymous random install id is the PostHog
  distinct_id so distinct installs can be counted without identifying them.
- Best-effort throughout: bounded timeouts, all exceptions swallowed;
  analytics never slows or breaks a review, and the extra LoC API call is
  skipped entirely when analytics is disabled.
- Adds opentelemetry-exporter-otlp-proto-http; docs + env.example updated.
Comment thread src/bubo/analytics.py

    if _logger is None and not _logger_failed:
        built = _build_pipeline(cfg)
        if built is None:
            _logger_failed = True