Skip to content

feat(agent): local LLMs (Ollama) as endpoint providers — no account, no API key (#806)#836

Open
softmarshmallow wants to merge 14 commits into
mainfrom
feat/agent-local-models
Open

feat(agent): local LLMs (Ollama) as endpoint providers — no account, no API key (#806)#836
softmarshmallow wants to merge 14 commits into
mainfrom
feat/agent-local-models

Conversation

@softmarshmallow

@softmarshmallow softmarshmallow commented Jun 12, 2026

Copy link
Copy Markdown
Member

Summary

Implements the v1 cut decided in #806's design-pass handoff: local LLMs (Ollama) as a BYOK-shaped endpoint provider — no account, no API key. Built as "open model registry + custom-endpoint provider" infrastructure; Ollama lands as a preset.

The registry seam (@grida/ai-models)

  • tool_call capability flag, explicit on every catalog entry.
  • models.text.registry: CustomModelSpec (cost optional, conservative defaults — 8k context, tool-calling assumed) + resolve(id, custom) over catalogue ∪ user-registered models.

Endpoint providers (@grida/agent)

  • One generalized OpenAI-compatible provider type {base_url, apiKey?, models[]} (protocol/endpoints.ts), with OLLAMA_ENDPOINT_PRESET (http://localhost:11434/v1, no key). Stays inside the "not a general model-provider router" anti-goal — presets, not config-declared arbitrary providers.
  • EndpointProvidersStore — plain config at ${userData}/endpoints.json, a deliberate sibling of SecretsStore (configs readable; an optional gateway key rides /secrets/* under the endpoint id, never readable back).
  • makeEndpointFactory: every tier maps to the endpoint's default model — titler/compactor tier calls land on a model the endpoint actually serves; a missing key is not an error.
  • resolveProvider: BYOK keys take precedence; configured endpoints (≥1 model) follow; explicit picks of either kind.
  • Closed-world call sites swapped to the registry:
    • ALLOWED_MODEL_IDS gate validates over catalogue ∪ registered ids (still a closed gate).
    • Compaction-limits trap fixed: a session on a registered local model resolves its real window instead of the 1M pro-tier fallback (which meant compaction never fired and long sessions died on context overflow). Tier-only endpoint sessions substitute the endpoint default model; the summarizer input cap follows the endpoint model.
    • AgentModelId / provider_id / ChatModel.provider_id opened on the wire (string & {}), runtime-gated.
  • /providers/endpoints/* CRUD routes behind the same CORS/Referer/Basic-Auth stack; AgentTransport.Client.providers.

Desktop + renderer

  • Optional providers bridge namespace (contract + preload), feature-detected so old binaries degrade.
  • Settings → Local Models card: Ollama preset slot (base URL, registered models with context window + tool-call flag), explicit expectation copy ("local models vary; ~30B+ recommended") — no parity framing.
  • Model picker lists registered models grouped by endpoint; seeding accepts registered ids (incl. late endpoint loads). Multimodal gate and context meter resolve via the registry, so a local model's real (small) window drives the meter.
  • ModelToolCallNotice: visible degraded-mode warning when the selected model is marked tool_call: false (permissive default — warn, don't block; no ToolLoopAgent restructuring in v1, per the handoff).

Security (GRIDA-SEC-004 review done)

  • All five enforcement layers verified intact; SECURITY.md gained an "Endpoint providers" paragraph documenting the config/secret split and egress posture.
  • Notable: under the srt sandbox, allowLocalBinding's local-ip rule permits loopback egress (how ollama serve is reached) while a hostile remote base_url stays blocked by the enumerated domain allowlist — a malicious config cannot turn the sidecar into an open exfil channel.

Live verification (real Ollama, no key)

endpoints.live.test.ts (gated GRIDA_LIVE_OLLAMA=1, CI-skipped) — all 5 pass against a local gemma4:31b-mlx with no BYOK secret configured:

  1. provider resolution with no secret,
  2. keyless text turn end-to-end (SSE stream, session persisted, usage recorded, background titler renames the session via the same endpoint factory),
  3. multi-turn session continuity (server-authoritative model view),
  4. manual compaction producing a content-bearing summary,
  5. a real server-side tool execution — the model calls write_file and the file lands in the workspace.

Live testing caught two calibration bugs, fixed in this PR:

  • Streaming usage was silently zero for every createOpenAICompatible stream (incl. the existing OpenRouter BYOK path): the OpenAI-compat spec only emits the usage chunk when stream_options.include_usage is set. Both factories now set includeUsage: true.
  • Thinking models starved the titler/compactor: gemma's think stream consumed the titler's 32-token cap before any text landed (finish_reason: length, empty content) — sessions stayed "New Chat". Caps raised to 512/2048 (pure ceilings; non-thinking models stop early), titler timeout raised to 60s for single-flight local servers.

Tests

  • @grida/agent: 492 passed (29 new — validator/store/routes, resolver cascade + tier mapping, run-input gates, compaction limits, public-API pins).
  • @grida/ai-models: 33 passed (9 new registry tests).
  • Editor desktop bridge tests: 9 passed (boundary pin extended with /providers/endpoints/).
  • @grida/agent + desktop + editor typecheck clean (editor's pre-existing canvas-wasm/LFS-asset errors unrelated).

User docs

New Desktop docs section: docs/editor/desktop/local-models.md — requirements with honest expectation framing, settings walkthrough, picker usage, tool-support detection, troubleshooting, and the overrides escape hatch. One screenshot (WebP), deliberately: the configured Local Models card with detection badges — captured from the real /desktop/* surface running against a live agent daemon + Ollama (via the web-daemon dev boot, which also gained the providers bridge namespace so the prod pages are exercisable in a plain browser). Docs images grow monotonically with the site; we keep only the shot that carries the feature and is least likely to churn.

Detection is authoritative; humans get an explicit override slot

No custom input over discoverable truth: the capability fields on a stored model entry are detection-owned — refreshed on every settings visit (converging to /api/ps truth once a model loads) and rendered as read-only badges. Editable inputs appear ONLY where detection has nothing (manual adds, ids-only gateways). For "the endpoint reports a wrong value," entries carry a sticky overrides slot (override → detected → registry default) that detection never writes — hand-edited in endpoints.json, which the settings card links to directly (local-LLM users are developers). Host and renderer resolve through one shared helper so every registry consumer sees effective values.

Model auto-detection

"Set up Ollama" prefills the installed models immediately, and a Detect button re-scans after new pulls (also backfilling missing fields on already-registered models, never clobbering user-set values) — no typing model ids. POST /providers/endpoints/probe is a host-side fetch (the packaged renderer's grida.co origin can't reach a local Ollama through CORS) of Ollama's /api/tags (ids + capability tags, so tool_call comes back real), enriched with the context window/api/ps first (a loaded model's context_length is the server's actual allocation; 262144 for gemma4:31b-mlx on Ollama 0.30.7), then /api/show model_info (model max) — and falling back to the generic OpenAI /models shape (LiteLLM, vLLM). Responses are parsed and reduced to {id, tool_call, contextWindow} rows — never proxied raw; reads are bounded (timeout + size cap). The context field stays editable: a server explicitly capped below a model's max (OLLAMA_CONTEXT_LENGTH) only reports the cap once the model is loaded.

Cleanup pass

A 4-angle review (reuse / simplification / efficiency / altitude) was applied at the end: the detection-merge contract moved into protocol/endpoints.ts as a tested pure mergeProbedModels (was inline UI code), the default-model rule and the provider-id namespace gate each got one shared home, the probe's size cap is now enforced on the wire and its two shapes probe concurrently, and the per-token chat render paths memoize registry resolution.

Out of scope (deliberate)

Closes #806.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Local models via Ollama and OpenAI‑compatible endpoints in Grida Desktop; register, probe, and manage endpoints and per-model overrides; endpoint models appear in the model picker, context meters, and are used when sending (provider pinned).
  • Documentation
    • Desktop guide for running local models and advanced endpoint configuration.
  • Security
    • Docs updated: endpoint config storage, secret handling, validation, probing limits, and sandbox/network rules.

softmarshmallow and others added 11 commits June 12, 2026 15:36
Adds an explicit tool_call flag to every text-model spec and a
models.text.registry namespace: CustomModelSpec (cost optional,
conservative 8k-context defaults) with resolve() over catalogue ∪
user-registered models. The seam for local/endpoint models (#806).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…806)

Generalizes the provider layer beyond the two BYOK key slots: one
endpoint-provider type {base_url, optional key, registered models[]}
with Ollama as the preset — a free user runs the agent end-to-end with
no account and no API key.

- protocol/endpoints: client-safe config contract + validator (http(s)
  URL, bounded sizes, no cost/credentials in config) + Ollama preset
- providers/endpoints: file-backed EndpointProvidersStore
  (endpoints.json, atomic write, load-time re-validation)
- makeEndpointFactory: every tier maps to the endpoint's default model
  (titler/compactor land on a served model); explicit ids pass through;
  a missing key is not an error
- resolveProvider: BYOK keys first, then configured endpoints; explicit
  picks of either kind
- run-input gate: model_id/provider_id validate over catalogue ∪
  registered ids — still a closed gate, now registry-aware
- compaction: injectable limits resolver; a tier-only endpoint session
  resolves the endpoint default model's real window instead of the
  1M pro-tier fallback (overflow killed long local sessions), and the
  summarizer input cap follows the endpoint's model
- /providers/endpoints/* CRUD routes (same auth stack) + transport
  client; secrets routes admit configured endpoint ids for keyed
  gateways (still no read-back, GRIDA-SEC-004)
- SECURITY.md: GRIDA-SEC-004 endpoint-provider note — config/secret
  split, user-owned egress, srt localhost-vs-remote bounding

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ates (#806)

Threads endpoint providers through the renderer chain:

- desktop bridge contract + preload: optional providers namespace
  (list/set/delete endpoint configs; feature-detected by the renderer)
- settings: Local Models card — Ollama preset slot with base URL,
  registered models (context window, tool-call flag), explicit
  expectation copy (no parity framing; ~30B+ recommended)
- model picker: registered models join the list grouped by endpoint;
  seeding accepts registered ids (incl. late endpoint loads)
- multimodal gate + context meter resolve via the open registry, so a
  local model's real (small) window drives the meter
- ModelToolCallNotice: visible degraded-mode warning when the selected
  model is marked tool_call: false (permissive default — warn, not block)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
)

Live verification against a real local Ollama (gemma4:31b-mlx) caught
two calibration bugs:

- OpenAI-compatible streams omit the usage chunk unless
  stream_options.include_usage is requested — every streamed run through
  createOpenAICompatible recorded zero tokens (no rollups, no context
  meter). Set includeUsage on both the OpenRouter and endpoint factories.
- The titler's 32-token output cap (and the compactor's 1024) are
  consumed entirely by the think stream on reasoning models
  (finish_reason: length, empty content) — sessions stayed 'New Chat'
  and compaction could produce an empty summary. Raised to 512/2048
  (ceilings — non-thinking models stop early) and gave the
  fire-and-forget titler a 60s ceiling so a single-flight local server
  that queues it behind the main turn doesn't starve it.

Adds endpoints.live.test.ts (GRIDA_LIVE_OLLAMA=1, CI-skipped): keyless
provider resolution, real text turn + usage + titler, multi-turn
session continuity, manual compaction with summary content, and a real
server-side write_file tool execution — all 5 pass against gemma4.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…806)

The send body carried only model_id, so with a BYOK key configured the
provider cascade resolved OpenRouter and passed a local model id
(llama3.1:8b) upstream — an unservable model. A registered-model pick
now rides provider_id (derived from the endpoint list); catalog picks
keep the cascade.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
User-facing page under a new Desktop docs section: requirements +
expectation framing, settings walkthrough, picker usage, the tools
toggle, and troubleshooting. Screenshots captured from the real
/desktop/* surfaces running against a live agent daemon and Ollama.

Also adds the providers namespace to the dev-only web daemon bridge so
the prod /desktop pages exercise endpoint config in a plain browser —
the same path used to capture these screenshots.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Typing model ids by hand was the worst part of the setup flow, and the
data is discoverable: Ollama's /api/tags reports installed models WITH
capability tags. Adds POST /providers/endpoints/probe — a host-side
fetch (the packaged renderer's grida.co origin can't reach a local
Ollama through CORS) that parses Ollama's native listing, falls back to
the generic OpenAI /models shape (LiteLLM, vLLM), and returns reduced
{id, tool_call} rows; raw bodies are never proxied, reads are bounded.

Deliberately NOT probed: the context window — Ollama reports a model's
architectural max (262k for gemma4), not the window the server actually
serves (often 8k); auto-filling the max would make compaction fire too
late and kill long sessions. That field stays user-set with the safe
default.

Settings: 'Set up Ollama' now prefills the installed models immediately;
a Detect button re-scans after new pulls. Wired through the transport,
desktop bridge, preload, and web daemon bridge; doc page + screenshots
refreshed from the live flow. Live test probes the real Ollama and finds
the test model with tool_call: true.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Modern Ollama exposes the served context via REST: /api/ps reports a
LOADED model's actual allocation (262144 for gemma4:31b-mlx on 0.30.7),
and /api/show model_info carries the model maximum. The probe now fills
contextWindow — ps (authoritative) over show (max) over unset — which
retires the earlier 'cannot be detected' caveat for current servers; the
field stays editable for explicitly capped setups, since the show
maximum can overshoot a cap until the model is first loaded.

Settings Detect also backfills missing fields on already-registered
models (never clobbering user-set values). Doc + screenshot refreshed —
the live flow detects gemma4 with ctx 262144 filled in.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
)

No custom input over discoverable truth: a hand-typed snapshot of a
value the endpoint reports only rots. The capability fields on a stored
model entry (tool_call, contextWindow) are now detection-owned — the
probe refreshes them on every settings visit (which also converges to
/api/ps truth once a model gets loaded) and the rows render them as
read-only badges. Inputs remain ONLY where detection has nothing
(manual adds, ids-only gateways) — there the human is the data source.

For the 'endpoint reports a wrong value' case, entries gain a sticky
overrides slot (override → detected → registry default). Detection
never writes it; it's hand-edited in endpoints.json — the settings card
now links straight to the file (/providers/endpoints/info + native
reveal), since local LLM users are developers. Both the host's
registeredModels() and the renderer resolve through one shared helper,
so every registry consumer sees effective values.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
#806)

One WebP (the configured card with detection badges) instead of three —
docs images grow forever, keep only what carries the feature. Content
updated for detection-owned fields: 'tools toggle' section replaced with
detection framing, troubleshooting now points at the overrides escape
hatch instead of an editable context field that no longer exists.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…elpers (#806)

/simplify pass over the branch, behavior-preserving:

- protocol/endpoints: mergeProbedModels gives the detection-owned merge
  contract an executable, tested home (was inline in the settings page);
  endpointDefaultModelId is THE default-model rule (was derived in both
  the provider factory and the runtime limits resolver, which must
  agree); parseEndpointBaseUrl shared by validator + probe; overrides
  validation no longer checks outputLimit (overrides can't carry it);
  the id pattern is module-private again.
- provider-ids: isByokProviderId hoisted (was re-written 4x) and a named
  ProviderId type replaces the ad-hoc 'ByokProviderId | (string & {})'
  unions across run options, session rows, and the desktop bridge.
- providers/endpoints: isKnownProviderId — the one provider-id namespace
  gate shared by the /secrets allowlist and the run-input provider gate.
- probe: getJson/postJson collapsed into one requestJson; the body size
  cap is now enforced on the wire (content-length + capped stream read,
  not buffer-then-measure); Ollama and generic probes fire concurrently
  so non-Ollama gateways don't wait out the Ollama timeout.
- runtime: limitsResolver no longer double-reads the store (custom specs
  derived from the one configs snapshot), and summarizerInputCap is a
  sync function of that snapshot instead of a second store roundtrip.
- editor: settings page delegates to mergeProbedModels, dead Save-button
  condition and dead badge fallbacks dropped; revealConfigFile reuses
  canRevealConfigFile; ModelToolCallNotice and the context meter memoize
  registry resolution (they render inside per-token re-render paths).

GRIDA-SEC-004 walked: allowlists keep the same closed sets, the probe
read bound is strictly stronger, no tag dropped.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 12, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
grida Ready Ready Preview, Comment Jun 12, 2026 10:34am
6 Skipped Deployments
Project Deployment Actions Updated (UTC)
code Ignored Ignored Jun 12, 2026 10:34am
docs Ignored Ignored Preview Jun 12, 2026 10:34am
legacy Ignored Ignored Jun 12, 2026 10:34am
backgrounds Skipped Skipped Jun 12, 2026 10:34am
blog Skipped Skipped Jun 12, 2026 10:34am
viewer Skipped Skipped Jun 12, 2026 10:34am

Request Review

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

Adds configurable local/OpenAI-compatible endpoint providers (store, probe, validation), expands provider resolution and runtime, exposes provider APIs via transport/desktop bridge, integrates desktop settings and model-picker, documents security constraints, and adds comprehensive tests (unit, integration, live-gated).

Changes

Endpoint Provider Support

Layer / File(s) Summary
Endpoint Provider Protocol & Types
packages/grida-ai-models/src/models.ts, packages/grida-ai-agent/src/protocol/endpoints.ts, packages/grida-ai-agent/src/protocol/provider-ids.ts, packages/grida-ai-agent/src/protocol/handshake.ts, packages/grida-ai-agent/src/index.ts
Adds tool_call to ModelSpec, introduces models.text.registry for custom specs, defines EndpointProviderConfig/EndpointModelEntry, OLLAMA preset, ProviderId type and isByokProviderId guard, and advertises providers@1 capability.
Model Probing & Endpoint Discovery
packages/grida-ai-agent/src/providers/probe.ts, packages/grida-ai-agent/src/providers/probe.test.ts
Implements bounded probeEndpointModels with Ollama and OpenAI-compatible discovery, safe JSON/size/time bounds, context-window enrichment, and tests for success/fallback/failure cases.
Endpoint Storage & Server Routes
packages/grida-ai-agent/src/providers/endpoints.ts, packages/grida-ai-agent/src/http/routes/providers.ts, packages/grida-ai-agent/src/http/routes/secrets.ts, packages/grida-ai-agent/src/http/server.ts
Adds EndpointProvidersStore persisting ${userData}/endpoints.json with atomic writes and validation, implements /providers/endpoints/* list/info/set/delete/probe routes, extends /secrets/* gating to include configured endpoint ids, and wires store into server/runtime.
Provider Resolver & Runtime Integration
packages/grida-ai-agent/src/providers/index.ts, packages/grida-ai-agent/src/providers/byok.ts, packages/grida-ai-agent/src/runtime/index.ts, packages/grida-ai-agent/src/runtime/run-input.ts, packages/grida-ai-agent/src/session/compaction.ts, packages/grida-ai-agent/src/session/rows.ts, packages/grida-ai-agent/src/session/titler.ts, packages/grida-ai-agent/src/session/compactor.ts
Expands resolver to BYOK+endpoint kinds with maybeResolveEndpoint, adds makeEndpointFactory, updates run-input and session types to accept provider/model ids from the open registry, makes compaction endpoint-aware, and adjusts compactor/titler limits/timeouts.
Transport Client & Desktop Bridge
packages/grida-ai-agent/src/transport.ts, packages/grida-desktop-bridge/src/index.ts, editor/lib/agent-chat/web-daemon-bridge.ts, editor/lib/desktop/bridge.ts, desktop/src/preload.ts, editor/lib/desktop/bridge-boundary.test.ts
Extends AgentTransport.Client with providers API, updates DesktopBridge types to accept ProviderId, adds optional providers surface, wires preload/web-daemon bridge methods, and updates bridge boundary tests.
Desktop UI for Endpoint Configuration
editor/app/desktop/settings/page.tsx
Adds LocalModelsSection with detection, save/delete, per-model overrides (contextWindow, tool_call), state machine (loading/ready/saving/error), detect/probe flow, and endpoints.json reveal.
Model Picker & Context-Aware UI
editor/scaffolds/desktop/shared/registered-models.ts, editor/scaffolds/desktop/shared/model-picker.tsx, editor/scaffolds/desktop/shared/context-meter.tsx, editor/app/desktop/welcome/page.tsx, editor/scaffolds/desktop/ai-sidebar/chat.tsx, editor/scaffolds/desktop/workbench/agent-pane.tsx, editor/lib/agent-chat/build-agent-send.ts, editor/lib/agent-chat/build-agent-send.test.ts
Adds registered_models utilities and useEndpointProviders hook, renders endpoint-registered models in picker, shows ModelToolCallNotice for non-tool-capable models, uses endpoint-aware context windows, and pins provider_id into outgoing agent send/approval resume payloads.
Documentation & Security
SECURITY.md, docs/editor/desktop/local-models.md, docs/editor/desktop/_category_.json, packages/grida-ai-agent/README.md, packages/grida-ai-models/README.md
Documents security model (GRIDA-SEC-004), endpoints.json storage semantics vs secrets for API keys, probe behavior and sandboxing, adds Desktop local-models guide, updates package READMEs and desktop docs category.
Testing & Contracts
packages/grida-ai-agent/src/__public-api__.test.ts, packages/grida-ai-agent/src/providers/endpoints.test.ts, packages/grida-ai-agent/src/providers/endpoints.live.test.ts, packages/grida-ai-agent/src/providers/index.test.ts, packages/grida-ai-agent/src/runtime/run-input.test.ts, packages/grida-ai-models/__tests__/registry.test.ts
Adds public API guard, comprehensive unit/integration tests for endpoint validation/store/routes/probe/merge semantics, runtime gating tests, and a gated live end-to-end suite exercising a real Ollama instance.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • #806: Implements the local LLM (Ollama) support, provider registry, probe, UI, and runtime changes described by the issue.

Possibly related PRs

  • gridaco/grida#792: Related desktop model-selection/context-meter flow; overlaps in context-meter changes.
  • gridaco/grida#800: Related approval-resume payload changes; both modify resume payload handling for provider pinning.

Suggested labels

ai, enhancement, documentation

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/agent-local-models

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2730188d8d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/grida-ai-agent/src/http/routes/providers.ts
Comment thread editor/lib/agent-chat/build-agent-send.ts
… approval resume (#806)

Two Codex review findings on #836, both real:

- /providers/endpoints/delete now also deletes the key stored under the
  endpoint's id. Without this the key was orphaned in auth.json — the
  /secrets/* allowlist only accepts CONFIGURED endpoint ids, so the
  leftover was undeletable, and re-creating the same endpoint id would
  silently reuse the stale credential. Both deletes are idempotent, so a
  partial failure is retry-safe. (Secrets stay write/presence/delete-only
  — GRIDA-SEC-004 unchanged.)

- buildApprovalResumeBody carries provider_id, threaded from the same
  pin the normal send path uses. A resume re-enters /agent/run; without
  the pin, a local-model session answering a supervised approval would
  cascade BYOK-first and ship a registered model id to a provider that
  cannot serve it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@vercel vercel Bot temporarily deployed to Preview – backgrounds June 12, 2026 10:08 Inactive
@vercel vercel Bot temporarily deployed to Preview – blog June 12, 2026 10:08 Inactive
@vercel vercel Bot temporarily deployed to Preview – viewer June 12, 2026 10:08 Inactive

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@editor/app/desktop/settings/page.tsx`:
- Around line 452-617: Add an API key field to the "Local Models" card so
endpoint-scoped secrets can be set/cleared: in the component that renders the
Local Models Card (the JSX block referencing draft/base_url, models,
LocalModelRow and using edit(), handleSave(), handleRemove(), detectInto()), add
a labeled input bound to draft.api_key (or draft.apiKey if your model uses
camelCase) with a masked value (type="password"), a placeholder like "endpoint
API key", and onChange calling edit({ ...draft, api_key: e.target.value }). Also
add a small "Reveal/Clear" control (toggle to show plain text and a button to
clear the value) wired to edit() so users can delete the key; ensure the
existing save/remove flows (handleSave/handleRemove) persist and clear the
api_key field appropriately and that any validation that computes saveDisabled
includes the api_key presence when relevant.

In `@editor/lib/desktop/bridge.ts`:
- Around line 325-334: The secrets helpers exported from bridge.ts
(secrets.hasKey, secrets.setKey, secrets.deleteKey) currently accept only
ByokProviderId, leaving API-keyed endpoints unsupported; update their signatures
and exports to accept the union type used for endpoint IDs (e.g., ByokProviderId
| EndpointId or the shared ProviderId type used by the provider config) and
export these updated helpers alongside the provider API so the renderer can call
secrets.hasKey/setKey/deleteKey with endpoint ids for keyed gateways; ensure
internal calls and type guards still handle both cases and update any related
type imports/exports to keep typings consistent.

In `@editor/scaffolds/desktop/shared/model-picker.tsx`:
- Around line 48-58: The picker currently emits only m.id which loses endpoint
identity when two endpoints register the same model; update DesktopModelPicker
(and its internal handler used around the options rendering at the block
referenced 74-88) to carry a provider-qualified selection instead of a bare
model id: change the prop types (value: string and
onValueChange(modelId:string)) to either a compound string like
`${providerId}:${modelId}` or a small object { providerId, modelId }, update the
option value generation where endpoint-specific rows are rendered to include the
provider id, and update the onChange/onValueChange invocation to emit the
provider-qualified value so downstream consumers (ModelToolCallNotice, session
reseeding, handoff/run wiring) can unambiguously resolve which endpoint was
selected.

In `@editor/scaffolds/desktop/shared/registered-models.ts`:
- Around line 45-51: providerIdForModel currently finds the endpoint by matching
modelId, which breaks when multiple providers expose the same model id; instead,
include the chosen provider identity with the renderer state and/or namespace
picker values so the UI sends the explicit provider id. Update the model picker
components in chat.tsx and agent-pane.tsx to store a composite value (e.g.
{providerId, modelId} or namespaced string) and adjust any consumers that call
providerIdForModel to use the stored providerId directly; remove or deprecate
providerIdForModel usages where the provider should be read from state rather
than inferred from modelId.

In `@packages/grida-ai-agent/src/http/routes/providers.ts`:
- Around line 64-70: The current catch for endpoints.set(result.config) always
returns HTTP 400, which mislabels filesystem/write failures (e.g., from
atomicWrite) as client errors; update the catch to distinguish
validation/semantic errors from persistence failures by checking the error type
or properties (e.g., instanceof ValidationError or an error.code like
ENOSPC/EACCES or a custom error thrown by endpoints.set/atomicWrite) and return
c.json({ error: err.message }, 400) for validation-related errors but return a
500 (or c.status(500).json...) with the error message for filesystem/persistence
errors; adjust the handler around endpoints.set(...) to perform this type-based
branching so callers receive correct 4xx vs 5xx responses.
- Around line 90-96: The handler for POST /providers/endpoints/probe maps all
probe() failures to 502; change the response mapping so client-side validation
errors return 400 while upstream probe failures remain 502: update the route
handler (the async callback for app.post("/providers/endpoints/probe")) to
inspect the probe() result (e.g., a returned status/code or a validation flag on
the result object) and if it indicates a malformed input/validation error
respond with c.json({ error: result.error }, 400), otherwise keep c.json({
error: result.error }, 502); if probe() currently doesn't differentiate, add a
validation indicator (e.g., result.status === 400 or result.validation === true)
in probe()/probeEndpointModels() and use that in the handler.

In `@packages/grida-ai-agent/src/protocol/endpoints.ts`:
- Around line 215-231: The parseEndpointBaseUrl function accepts and returns the
raw string without trimming, so whitespace-padded values can pass URL parsing
but be persisted with spaces; update parseEndpointBaseUrl to trim the input
first (e.g., const s = String(raw).trim()), validate the trimmed value against
MAX_BASE_URL_LEN and emptiness, construct the URL from the trimmed string, and
return base_url: trimmedValue and url; keep existing error branches but ensure
all checks use the trimmed value rather than raw.

In `@packages/grida-ai-agent/src/providers/endpoints.live.test.ts`:
- Around line 148-152: The afterEach teardown should guard against partial setup
failures by making cleanup conditional: check that host is defined before
calling host.runtime.dispose() and host.store.close(), and check that
host.runtime and host.store exist as needed; always await fs.rm(baseDir, {
recursive: true, force: true }) regardless of host to ensure filesystem cleanup.
Update the afterEach block (referencing afterEach, host, host.runtime.dispose,
host.store.close, baseDir) to perform null/undefined checks around host and its
properties so teardown doesn't throw when beforeEach fails.

In `@packages/grida-ai-agent/src/providers/endpoints.ts`:
- Around line 62-118: The class races because ensureLoaded sets loaded
prematurely and reads/writes to this.entries aren't serialized; add an async
mutex (per-instance lock) and use it to serialize load and write operations:
change ensureLoaded to acquire the lock, perform fs.readFile/JSON.parse and
validation, set this.loaded = true only after successful load, then release the
lock; have persist(), set(), delete() (and any other mutators) acquire the same
lock for their critical sections so they read/update this.entries and call
atomicWrite under the lock; keep list()/get() awaiting ensureLoaded but avoid
flipping loaded before loading completes by tying ensureLoaded to the lock (or a
loadPromise) so concurrent callers wait for the single completed load. Ensure
you reference and protect symbols: ensureLoaded, loaded, entries, persist, set,
delete (if present), file_path, atomicWrite, MAX_ENTRIES.

In `@packages/grida-ai-agent/src/providers/probe.ts`:
- Around line 66-83: The code currently awaits ollamaProbe before considering
openaiProbe, which can delay the fast OpenAI fallback; change the probe flow in
the probe function so both promises (ollamaProbe and openaiProbe created via
requestJson) are polled concurrently and the code proceeds as soon as
openaiProbe resolves if ollamaProbe is still pending — for example, await
whichever probe resolves first or check openaiProbe result when available
instead of blocking on the ollama await; keep using parseOllamaTags and
enrichContextWindows when ollamaProbe resolves successfully (use the ollama
variable only when its promise finished and ok), otherwise return the openai
result immediately when openaiProbe completes successfully.

In `@packages/grida-ai-agent/src/runtime/index.ts`:
- Around line 530-531: The current cap calculation uses Math.max(1_024, window -
4_096) which can exceed the resolved model window for tiny/custom models; change
the return so the summarizer cap is clamped to the resolved context_window by
returning the smaller of the window and the computed cap (e.g. use
Math.min(window, Math.max(1_024, window - 4_096))). Update the expression around
limits.resolve({ provider_id: providerId }).context_window and the return so the
summarizer never receives more history than the model's context_window.
- Around line 501-509: The current ResolveModelLimits logic only substitutes
endpointDefaultModelId when model.model_id is absent, but if a persisted
model_id no longer exists on the endpoint it should be treated as missing;
update the block in ResolveModelLimits so that after finding endpoint =
configs.find(e => e.id === model.provider_id) you also check whether the
endpoint actually exposes the persisted model_id (e.g. via whatever list/field
the endpoint exposes) and if it does not, set effective = { ...model, model_id:
defaultId } just like the missing case, then call resolveModelLimits(effective,
custom); use the existing symbols ResolveModelLimits, resolveModelLimits,
endpointDefaultModelId, configs, provider_id and model_id to locate and
implement this conditional.

In `@packages/grida-ai-agent/src/runtime/run-input.ts`:
- Around line 115-126: The request boundary currently accepts any known provider
id via isKnownProviderId even if the endpoint is a placeholder (models: []);
change the check that gates explicit provider selections to use the runtime's
"resolvable" rule instead of the simple existence check—call the existing
resolvability helper (e.g., isProviderResolvable or the runtime function that
verifies an endpoint can be resolved for a run) with providerId and
deps.endpoints and only allow when that returns true; update the branch that
sets explicit to providerId accordingly and adjust the test to expect rejection
for placeholder/empty-model configs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3744199a-4ae2-4b0a-91dc-1bf1e07ed358

📥 Commits

Reviewing files that changed from the base of the PR and between 0552963 and 2730188.

📒 Files selected for processing (50)
  • SECURITY.md
  • desktop/src/preload.ts
  • docs/editor/desktop/_category_.json
  • docs/editor/desktop/img/local-models-configured.webp
  • docs/editor/desktop/local-models.md
  • editor/app/desktop/settings/page.tsx
  • editor/app/desktop/welcome/page.tsx
  • editor/lib/agent-chat/build-agent-send.test.ts
  • editor/lib/agent-chat/build-agent-send.ts
  • editor/lib/agent-chat/web-daemon-bridge.ts
  • editor/lib/desktop/bridge-boundary.test.ts
  • editor/lib/desktop/bridge.ts
  • editor/scaffolds/desktop/ai-sidebar/chat.tsx
  • editor/scaffolds/desktop/shared/context-meter.tsx
  • editor/scaffolds/desktop/shared/model-picker.tsx
  • editor/scaffolds/desktop/shared/registered-models.ts
  • editor/scaffolds/desktop/workbench/agent-pane.tsx
  • packages/grida-ai-agent/README.md
  • packages/grida-ai-agent/src/__public-api__.test.ts
  • packages/grida-ai-agent/src/http/routes/handshake.ts
  • packages/grida-ai-agent/src/http/routes/providers.ts
  • packages/grida-ai-agent/src/http/routes/secrets.ts
  • packages/grida-ai-agent/src/http/server.ts
  • packages/grida-ai-agent/src/index.ts
  • packages/grida-ai-agent/src/neutral-globals.d.ts
  • packages/grida-ai-agent/src/protocol/endpoints.ts
  • packages/grida-ai-agent/src/protocol/handshake.ts
  • packages/grida-ai-agent/src/protocol/provider-ids.ts
  • packages/grida-ai-agent/src/protocol/run.ts
  • packages/grida-ai-agent/src/providers/byok.ts
  • packages/grida-ai-agent/src/providers/endpoints.live.test.ts
  • packages/grida-ai-agent/src/providers/endpoints.test.ts
  • packages/grida-ai-agent/src/providers/endpoints.ts
  • packages/grida-ai-agent/src/providers/index.test.ts
  • packages/grida-ai-agent/src/providers/index.ts
  • packages/grida-ai-agent/src/providers/probe.test.ts
  • packages/grida-ai-agent/src/providers/probe.ts
  • packages/grida-ai-agent/src/runtime/index.ts
  • packages/grida-ai-agent/src/runtime/run-input.test.ts
  • packages/grida-ai-agent/src/runtime/run-input.ts
  • packages/grida-ai-agent/src/session/compaction.test.ts
  • packages/grida-ai-agent/src/session/compaction.ts
  • packages/grida-ai-agent/src/session/compactor.ts
  • packages/grida-ai-agent/src/session/rows.ts
  • packages/grida-ai-agent/src/session/titler.ts
  • packages/grida-ai-agent/src/transport.ts
  • packages/grida-ai-models/README.md
  • packages/grida-ai-models/__tests__/registry.test.ts
  • packages/grida-ai-models/src/models.ts
  • packages/grida-desktop-bridge/src/index.ts

Comment thread editor/app/desktop/settings/page.tsx
Comment thread editor/lib/desktop/bridge.ts
Comment thread editor/scaffolds/desktop/shared/model-picker.tsx
Comment thread editor/scaffolds/desktop/shared/registered-models.ts
Comment thread packages/grida-ai-agent/src/http/routes/providers.ts Outdated
Comment thread packages/grida-ai-agent/src/providers/endpoints.ts Outdated
Comment thread packages/grida-ai-agent/src/providers/probe.ts
Comment thread packages/grida-ai-agent/src/runtime/index.ts
Comment thread packages/grida-ai-agent/src/runtime/index.ts Outdated
Comment thread packages/grida-ai-agent/src/runtime/run-input.ts
…te status codes (#806)

Nine of thirteen findings applied (four declined on the thread, with
reasons — single-endpoint v1 makes provider-qualified picker values
unconstructible; probe preference is deliberate fidelity; run gate
stays membership-only since the resolver already fails closed):

- settings: the Local Models card gains an API key slot (EndpointKeyRow)
  for keyed gateways — same write/presence/delete-only secrets surface
  as BYOK, under the endpoint's id, rendered once the endpoint is
  persisted (the allowlist accepts configured ids only). Makes the doc's
  'save the key under Settings' sentence actually true.
- bridge: secrets.hasKey/setKey/deleteKey/confirmDeleteKey widened to
  ProviderId (endpoint ids were already valid on the host; the renderer
  types just couldn't say so).
- store: one shared load promise (a concurrent first read can no longer
  observe the empty cache mid-load) + a write chain so concurrent
  set()/delete() can't persist stale snapshots over each other.
- routes: store-level persistence failures are 500s, not 400s; a
  malformed probe base_url is a 400, not a 502 'outage'.
- protocol: parseEndpointBaseUrl trims padding (it would survive URL
  parsing yet break the string-concatenated request base).
- runtime: a STALE persisted local model_id (removed from the config)
  now falls back to the endpoint default for limits — same trap as the
  missing-id case; summarizer input cap clamped to the model window.
- live test teardown guards against partial setup failures.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
editor/app/desktop/settings/page.tsx (1)

331-349: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Invalidate stale probe/save completions before writing them back.

refresh() always starts detectInto(ollama, { persist: true }), but the form stays editable and handleRemove() can run before that promise settles. A late completion still calls providers.setEndpoint(next) / setState(...) with the stale base/draft snapshot, which can recreate a just-deleted endpoint or wipe newer unsaved edits. Please gate these async writes behind a request version/cancel check (or merge into the latest draft) and freeze editing while one is in flight.

Possible direction
-import { useCallback, useEffect, useState } from "react";
+import { useCallback, useEffect, useRef, useState } from "react";
...
+  const draftVersionRef = useRef(0);
+
   const edit = useCallback((next: EndpointProviderConfig) => {
+    draftVersionRef.current += 1;
     setState({ kind: "ready", draft: next, dirty: true });
   }, []);
...
   const detectInto = useCallback(
     async (base: EndpointProviderConfig, opts: { persist: boolean }) => {
+      const version = draftVersionRef.current;
       setProbing(true);
       setProbeNote(null);
       try {
         const result = await providers.probeEndpoint(base.base_url);
         const merged = mergeProbedModels(base.models, result.models);
...
         const next = { ...base, models: merged.models };
+        if (draftVersionRef.current !== version) return;
         if (opts.persist) {
           await providers.setEndpoint(next);
+          if (draftVersionRef.current !== version) return;
           setState({ kind: "ready", draft: next, dirty: false });
         } else {
           setState({ kind: "ready", draft: next, dirty: true });
         }
...

Also invalidate the version before handleSave() / handleRemove() kick off, and disable the editable inputs while probing or state.kind === "saving".

Also applies to: 364-377, 393-401, 419-435, 488-569

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@editor/app/desktop/settings/page.tsx` around lines 331 - 349, detectInto is
writing back a stale snapshot when concurrent edits/removals occur; gate its
async writes by tracking a request version or cancellation token and/or merging
into the latest draft before calling providers.setEndpoint or setState.
Specifically, when refresh() calls detectInto(base,...), capture an incrementing
requestId (or AbortSignal) and check it right before performing
providers.setEndpoint(next) and setState(...) so late completions are ignored;
likewise invalidate/advance that requestId when handleSave() or handleRemove()
start to prevent stale writes, and disable editing inputs while probing or
state.kind === "saving" (and/or while a probe request is in flight) so the UI
cannot produce conflicting edits.
♻️ Duplicate comments (1)
packages/grida-ai-agent/src/runtime/index.ts (1)

503-514: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope stale-model detection to the selected endpoint.

Line 511 checks custom.some(...) across all endpoint models. If two endpoints share a model id, a stale model_id for this provider can still look “known,” skipping default substitution and resolving limits from the wrong window.

🛠️ Proposed fix
-        const known =
-          !!model.model_id && custom.some((m) => m.id === model.model_id);
-        if (defaultId && (!model.model_id || !known)) {
+        const knownOnEndpoint =
+          !!model.model_id &&
+          !!endpoint?.models.some((m) => m.id === model.model_id);
+        if (defaultId && (!model.model_id || !knownOnEndpoint)) {
           effective = { ...model, model_id: defaultId };
         }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/grida-ai-agent/src/runtime/index.ts` around lines 503 - 514, The
stale-model detection currently uses custom.some(...) across all endpoints;
narrow it to the selected endpoint by replacing the global check with a
membership test against the chosen endpoint's model list (use the endpoint's
models/model list property) when computing known (e.g., change known =
!!model.model_id && custom.some(...) to check endpoint.models.some(m => m.id ===
model.model_id) or the equivalent property), so that a model id is considered
"known" only if it exists on the resolved endpoint before skipping default
substitution (affecting the computation that sets effective).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@editor/app/desktop/settings/page.tsx`:
- Around line 331-349: detectInto is writing back a stale snapshot when
concurrent edits/removals occur; gate its async writes by tracking a request
version or cancellation token and/or merging into the latest draft before
calling providers.setEndpoint or setState. Specifically, when refresh() calls
detectInto(base,...), capture an incrementing requestId (or AbortSignal) and
check it right before performing providers.setEndpoint(next) and setState(...)
so late completions are ignored; likewise invalidate/advance that requestId when
handleSave() or handleRemove() start to prevent stale writes, and disable
editing inputs while probing or state.kind === "saving" (and/or while a probe
request is in flight) so the UI cannot produce conflicting edits.

---

Duplicate comments:
In `@packages/grida-ai-agent/src/runtime/index.ts`:
- Around line 503-514: The stale-model detection currently uses custom.some(...)
across all endpoints; narrow it to the selected endpoint by replacing the global
check with a membership test against the chosen endpoint's model list (use the
endpoint's models/model list property) when computing known (e.g., change known
= !!model.model_id && custom.some(...) to check endpoint.models.some(m => m.id
=== model.model_id) or the equivalent property), so that a model id is
considered "known" only if it exists on the resolved endpoint before skipping
default substitution (affecting the computation that sets effective).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 10b27523-7f2b-43a4-8280-f12fd651c200

📥 Commits

Reviewing files that changed from the base of the PR and between 5f7b70a and 7a46ea8.

📒 Files selected for processing (9)
  • docs/editor/desktop/local-models.md
  • editor/app/desktop/settings/page.tsx
  • editor/lib/desktop/bridge.ts
  • packages/grida-ai-agent/src/http/routes/providers.ts
  • packages/grida-ai-agent/src/protocol/endpoints.ts
  • packages/grida-ai-agent/src/providers/endpoints.live.test.ts
  • packages/grida-ai-agent/src/providers/endpoints.test.ts
  • packages/grida-ai-agent/src/providers/endpoints.ts
  • packages/grida-ai-agent/src/runtime/index.ts
✅ Files skipped from review due to trivial changes (1)
  • docs/editor/desktop/local-models.md
🚧 Files skipped from review as they are similar to previous changes (6)
  • packages/grida-ai-agent/src/providers/endpoints.test.ts
  • editor/lib/desktop/bridge.ts
  • packages/grida-ai-agent/src/providers/endpoints.ts
  • packages/grida-ai-agent/src/http/routes/providers.ts
  • packages/grida-ai-agent/src/providers/endpoints.live.test.ts
  • packages/grida-ai-agent/src/protocol/endpoints.ts

…gainst stale async write-backs (#806)

Two findings from CodeRabbit's incremental pass:

- runtime: the stale-model substitution now asks THIS endpoint whether
  it serves the persisted model_id — a different endpoint sharing the
  same id must not vouch for it (the global registry check would skip
  substitution and resolve limits from the wrong endpoint's window).

- settings: detection runs async off a snapshot while the form stays
  editable; a late completion could call setEndpoint with that snapshot
  and resurrect a just-deleted endpoint or wipe newer unsaved edits.
  An op-version ref now invalidates in-flight completions on every
  user action (edit/save/remove/re-setup), and save's read-back yields
  to edits made while it was in flight.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@vercel vercel Bot temporarily deployed to Preview – viewer June 12, 2026 10:31 Inactive
@vercel vercel Bot temporarily deployed to Preview – backgrounds June 12, 2026 10:31 Inactive
@vercel vercel Bot temporarily deployed to Preview – blog June 12, 2026 10:31 Inactive
@softmarshmallow

Copy link
Copy Markdown
Member Author

@coderabbitai Both findings from the incremental pass are addressed in 2375916:

  • runtime stale-model scopingknown is now knownOnEndpoint: membership is checked against the selected endpoint's own models, so another endpoint serving the same id can't vouch for a stale one.
  • settings stale write-backs — an op-version ref invalidates in-flight detection completions on every user action (edit / save / remove / re-setup); a late completion drops its write instead of resurrecting a deleted endpoint or wiping newer edits, and save's read-back yields to edits made while it was in flight. I kept the form editable during the auto-probe (it runs on every settings open; freezing the card for up to 4s each visit would be worse UX than version-gating).

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

@softmarshmallow Both fixes look well-reasoned:

  • knownOnEndpoint scoping — tying membership to the specific endpoint's models list is exactly the right boundary; cross-endpoint vouching was the subtle hazard and this eliminates it cleanly.
  • Op-version write-back invalidation — version-gating in-flight completions while keeping the form live is the correct trade-off. Freezing the settings card for a probe that runs on every open would be a real UX regression, and the version ref gives you the same safety guarantee without the cost.

Thanks for the detailed explanation of the write-back approach!

🐇✨

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
editor/app/desktop/settings/page.tsx (1)

337-360: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The persisted probe can still overwrite or resurrect the endpoint after a concurrent user action.

opVersion only drops the local completion after providers.setEndpoint(next) returns. Once line 358 sends that host write, a later edit or delete cannot cancel it, so the stale probe can still land on disk if it finishes last. In practice that means a remove can still be undone by an older detectInto(..., { persist: true }), and an edit can leave endpoints.json rolled back to the probed snapshot even though the UI kept the newer draft. This needs a versioned/CAS host write, or the auto-refresh path needs to stop persisting while the form stays editable.

Also applies to: 440-458

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@editor/app/desktop/settings/page.tsx` around lines 337 - 360, The persisted
probe can still overwrite concurrent user edits because detectInto only checks
opVersion locally and then performs an unconditional
providers.setEndpoint(next); fix by making the write conditional: capture the
current opVersion (opVersion.current) as expectedVersion, re-check opVersion
before calling providers.setEndpoint, and change providers.setEndpoint to accept
an expectedVersion/CAS token so the host performs a conditional
(compare-and-set) write and returns success/failure; if the CAS fails, do not
call setState/overwrite the draft and instead refresh/notify. Apply the same
pattern to the other probe persistence path (the block around the 440-458 range)
so both probes use conditional writes tied to opVersion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@editor/app/desktop/settings/page.tsx`:
- Around line 337-360: The persisted probe can still overwrite concurrent user
edits because detectInto only checks opVersion locally and then performs an
unconditional providers.setEndpoint(next); fix by making the write conditional:
capture the current opVersion (opVersion.current) as expectedVersion, re-check
opVersion before calling providers.setEndpoint, and change providers.setEndpoint
to accept an expectedVersion/CAS token so the host performs a conditional
(compare-and-set) write and returns success/failure; if the CAS fails, do not
call setState/overwrite the draft and instead refresh/notify. Apply the same
pattern to the other probe persistence path (the block around the 440-458 range)
so both probes use conditional writes tied to opVersion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a6595fa9-dbc6-466b-83f8-c809cae7e0ce

📥 Commits

Reviewing files that changed from the base of the PR and between 7a46ea8 and 2375916.

📒 Files selected for processing (2)
  • editor/app/desktop/settings/page.tsx
  • packages/grida-ai-agent/src/runtime/index.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/grida-ai-agent/src/runtime/index.ts

@softmarshmallow

Copy link
Copy Markdown
Member Author

@coderabbitai Declining the CAS escalation — walking the actual interleavings, the uncancellable window is too small to buy a wire-contract change:

  1. The long phase is the probe (up to ~4s), and the version gate sits after it, before the host write. The gate check and the setEndpoint dispatch are in the same synchronous block — a click handler cannot interleave between them. So any edit/save/remove that happens during the probe drops the persistence entirely. The only in-flight window left is the host write itself: a local atomic file write over loopback, single-digit milliseconds.

  2. Remove-resurrection needs an implausible schedule. For a stale set to land after the delete, the delete must be processed first — but the delete is dispatched only after a native confirm dialog (human-scale seconds), while the set POST arrives in milliseconds, and EndpointProvidersStore's write chain serializes mutations in arrival order. Reordering would require multi-second loopback latency on a single-user desktop app.

  3. The edit case, even when hit, isn't data loss. Disk gets a valid config (the loaded base + fresher detections — exactly what the next settings open would write anyway); the UI keeps the newer dirty draft thanks to the gate; Save then overwrites with the user's draft. Unsaved edits were never durable to begin with.

A CAS/expected-version token would touch the protocol, the store, the routes, and both bridge producers — real contract surface purchased against a race that requires seconds of loopback delay. The settings page is the config's only writer in this product; if that ever changes (multi-window settings, host-side background refresh), a conditional write becomes the right tool and the store's write chain is the natural place to anchor it.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

Tip

For best results, initiate chat on the files or code changes.

@softmarshmallow That analysis is airtight. The three-part breakdown covers the full concern space:

  • Synchronous gate+dispatch closes the most dangerous window entirely — nothing can interleave between the version check and the setEndpoint call.
  • Write-chain serialization + confirm-dialog latency makes the remove-resurrection ordering essentially unreachable in practice.
  • Benign edit collision — disk gets a fresher-detection valid config, UI preserves the dirty draft, and Save wins. No durable data is lost.

The CAS cost (protocol, store, routes, both bridge producers) is clearly disproportionate to a race that requires seconds of single-user loopback delay. Agreed that the store's write chain is the right anchor point if the single-writer assumption ever changes (multi-window settings, background refresh). No further objections.

🐇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[desktop][agent] Support local LLMs (Ollama) — BYOK-style provider, no signup, for free users

1 participant