Operator simplification: REST-only SCM + standardized LLM_* config by mountainowl · Pull Request #134 · mountainowl/bubo

mountainowl · 2026-06-23T21:02:44Z

Draft umbrella PR. More changes coming (self-contained Docker image is next; will be pushed onto this branch).

What this does

1. REST-only SCM — drop glab/gh + SCM MCP servers (`refactor(scm)`)

Checkout no longer shells out to glab/gh. Clone/fetch/checkout go through git, with the token passed per invocation via `git -c http.extraHeader=...`. The token therefore never lands in .git/config, which matters because the agent has read access to the worktree.
Inline posting is now REST-only on both providers (the MCP-preferred path + fallback is gone).
Removed the mcp-upstream dispatcher, the [mcp_servers.gitlab] block (~310 lines) and GitLab tool perms from the templates, and denied_tools_regex / GITLAB_DENIED_TOOLS_REGEX.
Kept bubo's own MCP server (bin/bubo mcp, [mcp_servers.bubo]) and the local-git MCP.
Net runtime SCM deps after this change: git + the agent CLI.
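
A minimal sketch of the per-invocation header approach described above, not bubo's actual code. `git_with_token` is a hypothetical helper, and the HTTP Basic scheme with an `x-access-token` username is an assumption (providers differ in the username they expect). The point it illustrates is that `-c` sets a config value for a single git invocation, so nothing is persisted to .git/config.

```python
import base64


def git_with_token(token: str, git_args: list[str],
                   username: str = "x-access-token") -> list[str]:
    """Build a git argv that carries the token only for this one invocation.

    Assumption: the server accepts HTTP Basic auth as "<username>:<token>".
    """
    basic = base64.b64encode(f"{username}:{token}".encode()).decode()
    # `-c key=value` overrides config for this process only; it is not
    # written to .git/config, so a reader of the worktree cannot recover it.
    return ["git", "-c", f"http.extraHeader=Authorization: Basic {basic}", *git_args]


argv = git_with_token("example-token", ["fetch", "origin"])
# argv can then be handed to subprocess.run(argv, check=True)
```

One tradeoff of this shape: command-line arguments are visible to other local processes while git runs, so it protects the worktree rather than the process table.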

2. Standardize LLM config + OpenAI-compat (`feat(config)`)

Collapsed the [agents] auth/model knobs to LLM_API_KEY, LLM_MODEL, LLM_MODEL_EFFORT, plus optional LLM_BASE_URL for OpenAI-compatible endpoints.
Fixes a latent bug: llm_model / reasoning_effort were hardcoded in codex-config.toml (no substitution), so they silently never took effect. bubo init now templates the real values into the Codex profile (and a [model_providers.bubo] block when llm_base_url is set).
Backward-compatible: old keys (reasoning_effort, llm_api_key_env) are still honored with a config_key_deprecated log event.
action.yml reordered to write env.toml before bubo init so templating sees the real values.
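
The backward-compatible key handling above might look roughly like the sketch below. `resolve_effort` and the event dictionary shape are hypothetical; only the key names (`llm_model_effort`, `reasoning_effort`) and the `config_key_deprecated` event name come from this PR.

```python
from typing import Optional


def resolve_effort(agents: dict[str, str], events: list[dict]) -> Optional[str]:
    """Prefer the new key; fall back to the deprecated one and record an event."""
    if "llm_model_effort" in agents:
        return agents["llm_model_effort"]
    if "reasoning_effort" in agents:
        # Old key is still honored, but the operator is told to migrate.
        events.append({
            "event": "config_key_deprecated",
            "key": "reasoning_effort",
            "replacement": "llm_model_effort",
        })
        return agents["reasoning_effort"]
    return None


events: list[dict] = []
effort = resolve_effort({"reasoning_effort": "high"}, events)
```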

3. Anonymous, opt-out usage analytics → PostHog (`feat(analytics)`)

"Help improve Bubo." On by default; numbers only, no identifying content.

New [analytics] block. Opt out via enabled = false, BUBO_ANALYTICS=0, or the cross-tool DO_NOT_TRACK=1.
Emits anonymized events through the OTel logs API directly (no stdlib logging, so bubo's own logs — which carry repo names/paths — can never leak): session_start, review_completed (incl. lines-of-code reviewed), and a per-cycle usage_snapshot built from the existing SQLite aggregate readers.
Privacy enforced structurally by a default-deny allowlist (_ALLOWED_ATTRS): only counts, durations, LoC, tokens, SCM type, and model name leave the machine — never repo/project names, paths, SHAs, finding text, errors, or credentials. The resource is built without env-merge so OTEL_RESOURCE_ATTRIBUTES can't ride along. Anonymous random install id = PostHog distinct_id.
Best-effort throughout (bounded timeouts, all exceptions swallowed); the extra LoC API call is skipped when analytics is off.
⚠️ Not yet verified live: events are shape- and safety-tested, but end-to-end delivery to PostHog (and the distinct_id → identity mapping) has not been confirmed against a real project. That's the gate before relying on the data.
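
As an illustration of the three opt-out switches listed above, a sketch under the assumption that any single opt-out wins over the on-by-default setting. `analytics_enabled` is a hypothetical name, and exact-match comparison against `"0"` and `"1"` is an assumption about how the values are parsed.

```python
from collections.abc import Mapping


def analytics_enabled(config_enabled: bool, env: Mapping[str, str]) -> bool:
    """Return False if any of the three opt-out mechanisms is set."""
    if not config_enabled:                  # [analytics] enabled = false
        return False
    if env.get("BUBO_ANALYTICS") == "0":    # tool-specific env opt-out
        return False
    if env.get("DO_NOT_TRACK") == "1":      # cross-tool convention
        return False
    return True


# On by default when nothing opts out.
default_on = analytics_enabled(True, {})
```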

Docs

Operator docs updated for all three changes (prerequisites, recipes, configuration, install-and-configure, github-action).
Note: we intend to replace the MkDocs site going forward; this PR keeps it in sync for now so nothing breaks in the interim.

Verification

ruff ✅ · mypy ✅ · pytest 538 passed (+30 new analytics tests; 4 pre-existing test_project_layout README-link failures present on main, unrelated) · mkdocs build --strict exit 0 · cz check ✅.

Still to come on this branch

build(docker): bundle both agent CLIs (@openai/codex, @anthropic-ai/claude-code) + node + git in the image; env-driven entrypoint.

Checkout now clones over HTTPS with the bot token supplied per git invocation as an auth header (never written to .git/config); posting goes straight through the REST API. Removes the glab/gh CLIs, the upstream GitLab/GitHub MCP servers (and bin/bubo mcp-upstream), the bubo.mcp client, the [mcp_servers.gitlab] profile block + its tool allowlist, and denied_tools_regex. Net SCM dependency is now git + the agent CLI. Bubo's own MCP server and the local-git MCP are unchanged.

Collapse the LLM agent knobs to LLM_API_KEY / LLM_MODEL / LLM_MODEL_EFFORT plus an optional LLM_BASE_URL for custom OpenAI-compatible endpoints, and make them actually take effect: `bubo init` now templates the model, effort, and (when a base_url is set) a [model_providers] block into the Codex profile, and writes the model into the Claude settings. Previously model/effort were hardcoded in the template and silently ignored. The credential-stripping reviewer allowlist gains one deliberate, documented exception: in base_url mode (and only then) LLM_API_KEY + LLM_BASE_URL are passed through to the agent, since a custom endpoint reads the key from the environment at request time. `llm_api_key_env` is deprecated (still honored); `reasoning_effort` is read as a fallback for the new `llm_model_effort`. The GitHub Action writes config before templating the agent profile and supports llm-base-url / llm-model-effort. Auto-running the agent login from LLM_API_KEY at init is deferred to a follow-up (needs validation against the real Codex/Claude CLIs); for now the agent authenticates via its own login.
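
The base_url exception to the credential-stripping allowlist can be pictured with the sketch below. It is illustrative only: `reviewer_env` and the contents of `BASE_ALLOWLIST` are hypothetical, and the real allowlist is not shown in this PR text. Only the two passed-through names, `LLM_API_KEY` and `LLM_BASE_URL`, come from the commit message.

```python
from collections.abc import Mapping

# Hypothetical baseline: variables the agent process always receives.
BASE_ALLOWLIST = frozenset({"PATH", "HOME", "LANG"})
# Passed through in base_url mode only, per the commit message above.
BASE_URL_EXTRAS = frozenset({"LLM_API_KEY", "LLM_BASE_URL"})


def reviewer_env(parent_env: Mapping[str, str], base_url_mode: bool) -> dict[str, str]:
    """Default-deny copy of the environment for the reviewer agent."""
    allowed = BASE_ALLOWLIST | (BASE_URL_EXTRAS if base_url_mode else frozenset())
    return {k: v for k, v in parent_env.items() if k in allowed}


parent = {"PATH": "/usr/bin", "LLM_API_KEY": "k", "LLM_BASE_URL": "https://llm.example", "SCM_TOKEN": "t"}
stripped = reviewer_env(parent, base_url_mode=False)
passthrough = reviewer_env(parent, base_url_mode=True)
```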

Reflect the two code changes in the operator docs:

- Drop glab/gh and the SCM MCP servers from prerequisites, recipes, and configuration; the only runtime SCM deps are now git + the agent CLI.
- Document the standardized LLM surface (LLM_API_KEY, LLM_MODEL, LLM_MODEL_EFFORT) and the optional LLM_BASE_URL for OpenAI-compatible endpoints, including the env-exfiltration tradeoff of base_url mode.
- Refresh the GitHub Action inputs and recipes accordingly.

Bubo is free and open source; the only way to learn what to improve is anonymous usage signal from real installs. Add an on-by-default [analytics] block that ships NUMBERS ONLY to PostHog over OTLP logs.

- New [analytics] config (on by default). Opt out via `enabled = false`, `BUBO_ANALYTICS=0`, or the cross-tool `DO_NOT_TRACK=1` convention.
- src/bubo/analytics.py emits anonymized events directly through the OTel logs API (no stdlib logging, so bubo's own logs can never leak): session_start, review_completed (incl. lines-of-code reviewed), and a per-cycle usage_snapshot derived from the existing SQLite aggregate readers (metrics/outcomes/latency).
- Privacy is enforced structurally by a default-deny allowlist: only counts, durations, LoC, tokens, SCM type, and model name leave the machine. Never repo/project names, paths, SHAs, finding text, errors, or credentials. Resource is built without env-merge so OTEL_RESOURCE_* cannot ride along. An anonymous random install id is the PostHog distinct_id so distinct installs can be counted without identifying them.
- Best-effort throughout: bounded timeouts, all exceptions swallowed; analytics never slows or breaks a review, and the extra LoC API call is skipped entirely when analytics is disabled.
- Adds opentelemetry-exporter-otlp-proto-http; docs + env.example updated.
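
A rough sketch of what a default-deny attribute allowlist does, for readers skimming the commit. The name `_ALLOWED_ATTRS` appears in the PR description, but the attribute names inside it here and the `scrub` helper are hypothetical placeholders, not the real set.

```python
# Hypothetical attribute names; the real allowlist lives in src/bubo/analytics.py.
_ALLOWED_ATTRS = frozenset({
    "findings_count", "duration_ms", "loc_reviewed",
    "tokens_total", "scm_type", "model_name",
})


def scrub(attrs: dict[str, object]) -> dict[str, object]:
    """Keep only allowlisted keys; anything unknown is dropped, not passed."""
    return {k: v for k, v in attrs.items() if k in _ALLOWED_ATTRS}


event = scrub({"loc_reviewed": 120, "scm_type": "gitlab", "repo": "group/project"})
```

Because the filter is default-deny, a newly added attribute is dropped until someone deliberately adds it to the allowlist, which is the "enforced structurally" property the commit describes.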

+    if _logger is None and not _logger_failed:
+        built = _build_pipeline(cfg)
+        if built is None:
+            _logger_failed = True


mountainowl added 4 commits June 22, 2026 19:09

github-advanced-security AI found potential problems Jun 23, 2026

Comment thread src/bubo/analytics.py

    if _logger is None and not _logger_failed:
        built = _build_pipeline(cfg)
        if built is None:
            _logger_failed = True

mountainowl wants to merge 4 commits into main from feat/rest-only-llm-config

2 participants