feat(agent): local LLMs (Ollama) as endpoint providers — no account, no API key (#806)#836
feat(agent): local LLMs (Ollama) as endpoint providers — no account, no API key (#806)#836softmarshmallow wants to merge 14 commits into
Conversation
Adds an explicit tool_call flag to every text-model spec and a models.text.registry namespace: CustomModelSpec (cost optional, conservative 8k-context defaults) with resolve() over catalogue ∪ user-registered models. The seam for local/endpoint models (#806). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…806) Generalizes the provider layer beyond the two BYOK key slots: one endpoint-provider type {base_url, optional key, registered models[]} with Ollama as the preset — a free user runs the agent end-to-end with no account and no API key. - protocol/endpoints: client-safe config contract + validator (http(s) URL, bounded sizes, no cost/credentials in config) + Ollama preset - providers/endpoints: file-backed EndpointProvidersStore (endpoints.json, atomic write, load-time re-validation) - makeEndpointFactory: every tier maps to the endpoint's default model (titler/compactor land on a served model); explicit ids pass through; a missing key is not an error - resolveProvider: BYOK keys first, then configured endpoints; explicit picks of either kind - run-input gate: model_id/provider_id validate over catalogue ∪ registered ids — still a closed gate, now registry-aware - compaction: injectable limits resolver; a tier-only endpoint session resolves the endpoint default model's real window instead of the 1M pro-tier fallback (overflow killed long local sessions), and the summarizer input cap follows the endpoint's model - /providers/endpoints/* CRUD routes (same auth stack) + transport client; secrets routes admit configured endpoint ids for keyed gateways (still no read-back, GRIDA-SEC-004) - SECURITY.md: GRIDA-SEC-004 endpoint-provider note — config/secret split, user-owned egress, srt localhost-vs-remote bounding Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ates (#806) Threads endpoint providers through the renderer chain: - desktop bridge contract + preload: optional providers namespace (list/set/delete endpoint configs; feature-detected by the renderer) - settings: Local Models card — Ollama preset slot with base URL, registered models (context window, tool-call flag), explicit expectation copy (no parity framing; ~30B+ recommended) - model picker: registered models join the list grouped by endpoint; seeding accepts registered ids (incl. late endpoint loads) - multimodal gate + context meter resolve via the open registry, so a local model's real (small) window drives the meter - ModelToolCallNotice: visible degraded-mode warning when the selected model is marked tool_call: false (permissive default — warn, not block) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
) Live verification against a real local Ollama (gemma4:31b-mlx) caught two calibration bugs: - OpenAI-compatible streams omit the usage chunk unless stream_options.include_usage is requested — every streamed run through createOpenAICompatible recorded zero tokens (no rollups, no context meter). Set includeUsage on both the OpenRouter and endpoint factories. - The titler's 32-token output cap (and the compactor's 1024) are consumed entirely by the think stream on reasoning models (finish_reason: length, empty content) — sessions stayed 'New Chat' and compaction could produce an empty summary. Raised to 512/2048 (ceilings — non-thinking models stop early) and gave the fire-and-forget titler a 60s ceiling so a single-flight local server that queues it behind the main turn doesn't starve it. Adds endpoints.live.test.ts (GRIDA_LIVE_OLLAMA=1, CI-skipped): keyless provider resolution, real text turn + usage + titler, multi-turn session continuity, manual compaction with summary content, and a real server-side write_file tool execution — all 5 pass against gemma4. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…806) The send body carried only model_id, so with a BYOK key configured the provider cascade resolved OpenRouter and passed a local model id (llama3.1:8b) upstream — an unservable model. A registered-model pick now rides provider_id (derived from the endpoint list); catalog picks keep the cascade. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
User-facing page under a new Desktop docs section: requirements + expectation framing, settings walkthrough, picker usage, the tools toggle, and troubleshooting. Screenshots captured from the real /desktop/* surfaces running against a live agent daemon and Ollama. Also adds the providers namespace to the dev-only web daemon bridge so the prod /desktop pages exercise endpoint config in a plain browser — the same path used to capture these screenshots. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Typing model ids by hand was the worst part of the setup flow, and the
data is discoverable: Ollama's /api/tags reports installed models WITH
capability tags. Adds POST /providers/endpoints/probe — a host-side
fetch (the packaged renderer's grida.co origin can't reach a local
Ollama through CORS) that parses Ollama's native listing, falls back to
the generic OpenAI /models shape (LiteLLM, vLLM), and returns reduced
{id, tool_call} rows; raw bodies are never proxied, reads are bounded.
Deliberately NOT probed: the context window — Ollama reports a model's
architectural max (262k for gemma4), not the window the server actually
serves (often 8k); auto-filling the max would make compaction fire too
late and kill long sessions. That field stays user-set with the safe
default.
Settings: 'Set up Ollama' now prefills the installed models immediately;
a Detect button re-scans after new pulls. Wired through the transport,
desktop bridge, preload, and web daemon bridge; doc page + screenshots
refreshed from the live flow. Live test probes the real Ollama and finds
the test model with tool_call: true.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Modern Ollama exposes the served context via REST: /api/ps reports a LOADED model's actual allocation (262144 for gemma4:31b-mlx on 0.30.7), and /api/show model_info carries the model maximum. The probe now fills contextWindow — ps (authoritative) over show (max) over unset — which retires the earlier 'cannot be detected' caveat for current servers; the field stays editable for explicitly capped setups, since the show maximum can overshoot a cap until the model is first loaded. Settings Detect also backfills missing fields on already-registered models (never clobbering user-set values). Doc + screenshot refreshed — the live flow detects gemma4 with ctx 262144 filled in. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
) No custom input over discoverable truth: a hand-typed snapshot of a value the endpoint reports only rots. The capability fields on a stored model entry (tool_call, contextWindow) are now detection-owned — the probe refreshes them on every settings visit (which also converges to /api/ps truth once a model gets loaded) and the rows render them as read-only badges. Inputs remain ONLY where detection has nothing (manual adds, ids-only gateways) — there the human is the data source. For the 'endpoint reports a wrong value' case, entries gain a sticky overrides slot (override → detected → registry default). Detection never writes it; it's hand-edited in endpoints.json — the settings card now links straight to the file (/providers/endpoints/info + native reveal), since local LLM users are developers. Both the host's registeredModels() and the renderer resolve through one shared helper, so every registry consumer sees effective values. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
#806) One WebP (the configured card with detection badges) instead of three — docs images grow forever, keep only what carries the feature. Content updated for detection-owned fields: 'tools toggle' section replaced with detection framing, troubleshooting now points at the overrides escape hatch instead of an editable context field that no longer exists. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…elpers (#806) /simplify pass over the branch, behavior-preserving: - protocol/endpoints: mergeProbedModels gives the detection-owned merge contract an executable, tested home (was inline in the settings page); endpointDefaultModelId is THE default-model rule (was derived in both the provider factory and the runtime limits resolver, which must agree); parseEndpointBaseUrl shared by validator + probe; overrides validation no longer checks outputLimit (overrides can't carry it); the id pattern is module-private again. - provider-ids: isByokProviderId hoisted (was re-written 4x) and a named ProviderId type replaces the ad-hoc 'ByokProviderId | (string & {})' unions across run options, session rows, and the desktop bridge. - providers/endpoints: isKnownProviderId — the one provider-id namespace gate shared by the /secrets allowlist and the run-input provider gate. - probe: getJson/postJson collapsed into one requestJson; the body size cap is now enforced on the wire (content-length + capped stream read, not buffer-then-measure); Ollama and generic probes fire concurrently so non-Ollama gateways don't wait out the Ollama timeout. - runtime: limitsResolver no longer double-reads the store (custom specs derived from the one configs snapshot), and summarizerInputCap is a sync function of that snapshot instead of a second store roundtrip. - editor: settings page delegates to mergeProbedModels, dead Save-button condition and dead badge fallbacks dropped; revealConfigFile reuses canRevealConfigFile; ModelToolCallNotice and the context meter memoize registry resolution (they render inside per-token re-render paths). GRIDA-SEC-004 walked: allowlists keep the same closed sets, the probe read bound is strictly stronger, no tag dropped. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
WalkthroughAdds configurable local/OpenAI-compatible endpoint providers (store, probe, validation), expands provider resolution and runtime, exposes provider APIs via transport/desktop bridge, integrates desktop settings and model-picker, documents security constraints, and adds comprehensive tests (unit, integration, live-gated). ChangesEndpoint Provider Support
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested labels
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2730188d8d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
… approval resume (#806) Two Codex review findings on #836, both real: - /providers/endpoints/delete now also deletes the key stored under the endpoint's id. Without this the key was orphaned in auth.json — the /secrets/* allowlist only accepts CONFIGURED endpoint ids, so the leftover was undeletable, and re-creating the same endpoint id would silently reuse the stale credential. Both deletes are idempotent, so a partial failure is retry-safe. (Secrets stay write/presence/delete-only — GRIDA-SEC-004 unchanged.) - buildApprovalResumeBody carries provider_id, threaded from the same pin the normal send path uses. A resume re-enters /agent/run; without the pin, a local-model session answering a supervised approval would cascade BYOK-first and ship a registered model id to a provider that cannot serve it. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 13
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@editor/app/desktop/settings/page.tsx`:
- Around line 452-617: Add an API key field to the "Local Models" card so
endpoint-scoped secrets can be set/cleared: in the component that renders the
Local Models Card (the JSX block referencing draft/base_url, models,
LocalModelRow and using edit(), handleSave(), handleRemove(), detectInto()), add
a labeled input bound to draft.api_key (or draft.apiKey if your model uses
camelCase) with a masked value (type="password"), a placeholder like "endpoint
API key", and onChange calling edit({ ...draft, api_key: e.target.value }). Also
add a small "Reveal/Clear" control (toggle to show plain text and a button to
clear the value) wired to edit() so users can delete the key; ensure the
existing save/remove flows (handleSave/handleRemove) persist and clear the
api_key field appropriately and that any validation that computes saveDisabled
includes the api_key presence when relevant.
In `@editor/lib/desktop/bridge.ts`:
- Around line 325-334: The secrets helpers exported from bridge.ts
(secrets.hasKey, secrets.setKey, secrets.deleteKey) currently accept only
ByokProviderId, leaving API-keyed endpoints unsupported; update their signatures
and exports to accept the union type used for endpoint IDs (e.g., ByokProviderId
| EndpointId or the shared ProviderId type used by the provider config) and
export these updated helpers alongside the provider API so the renderer can call
secrets.hasKey/setKey/deleteKey with endpoint ids for keyed gateways; ensure
internal calls and type guards still handle both cases and update any related
type imports/exports to keep typings consistent.
In `@editor/scaffolds/desktop/shared/model-picker.tsx`:
- Around line 48-58: The picker currently emits only m.id which loses endpoint
identity when two endpoints register the same model; update DesktopModelPicker
(and its internal handler used around the options rendering at the block
referenced 74-88) to carry a provider-qualified selection instead of a bare
model id: change the prop types (value: string and
onValueChange(modelId:string)) to either a compound string like
`${providerId}:${modelId}` or a small object { providerId, modelId }, update the
option value generation where endpoint-specific rows are rendered to include the
provider id, and update the onChange/onValueChange invocation to emit the
provider-qualified value so downstream consumers (ModelToolCallNotice, session
reseeding, handoff/run wiring) can unambiguously resolve which endpoint was
selected.
In `@editor/scaffolds/desktop/shared/registered-models.ts`:
- Around line 45-51: providerIdForModel currently finds the endpoint by matching
modelId, which breaks when multiple providers expose the same model id; instead,
include the chosen provider identity with the renderer state and/or namespace
picker values so the UI sends the explicit provider id. Update the model picker
components in chat.tsx and agent-pane.tsx to store a composite value (e.g.
{providerId, modelId} or namespaced string) and adjust any consumers that call
providerIdForModel to use the stored providerId directly; remove or deprecate
providerIdForModel usages where the provider should be read from state rather
than inferred from modelId.
In `@packages/grida-ai-agent/src/http/routes/providers.ts`:
- Around line 64-70: The current catch for endpoints.set(result.config) always
returns HTTP 400, which mislabels filesystem/write failures (e.g., from
atomicWrite) as client errors; update the catch to distinguish
validation/semantic errors from persistence failures by checking the error type
or properties (e.g., instanceof ValidationError or an error.code like
ENOSPC/EACCES or a custom error thrown by endpoints.set/atomicWrite) and return
c.json({ error: err.message }, 400) for validation-related errors but return a
500 (or c.status(500).json...) with the error message for filesystem/persistence
errors; adjust the handler around endpoints.set(...) to perform this type-based
branching so callers receive correct 4xx vs 5xx responses.
- Around line 90-96: The handler for POST /providers/endpoints/probe maps all
probe() failures to 502; change the response mapping so client-side validation
errors return 400 while upstream probe failures remain 502: update the route
handler (the async callback for app.post("/providers/endpoints/probe")) to
inspect the probe() result (e.g., a returned status/code or a validation flag on
the result object) and if it indicates a malformed input/validation error
respond with c.json({ error: result.error }, 400), otherwise keep c.json({
error: result.error }, 502); if probe() currently doesn't differentiate, add a
validation indicator (e.g., result.status === 400 or result.validation === true)
in probe()/probeEndpointModels() and use that in the handler.
In `@packages/grida-ai-agent/src/protocol/endpoints.ts`:
- Around line 215-231: The parseEndpointBaseUrl function accepts and returns the
raw string without trimming, so whitespace-padded values can pass URL parsing
but be persisted with spaces; update parseEndpointBaseUrl to trim the input
first (e.g., const s = String(raw).trim()), validate the trimmed value against
MAX_BASE_URL_LEN and emptiness, construct the URL from the trimmed string, and
return base_url: trimmedValue and url; keep existing error branches but ensure
all checks use the trimmed value rather than raw.
In `@packages/grida-ai-agent/src/providers/endpoints.live.test.ts`:
- Around line 148-152: The afterEach teardown should guard against partial setup
failures by making cleanup conditional: check that host is defined before
calling host.runtime.dispose() and host.store.close(), and check that
host.runtime and host.store exist as needed; always await fs.rm(baseDir, {
recursive: true, force: true }) regardless of host to ensure filesystem cleanup.
Update the afterEach block (referencing afterEach, host, host.runtime.dispose,
host.store.close, baseDir) to perform null/undefined checks around host and its
properties so teardown doesn't throw when beforeEach fails.
In `@packages/grida-ai-agent/src/providers/endpoints.ts`:
- Around line 62-118: The class races because ensureLoaded sets loaded
prematurely and reads/writes to this.entries aren't serialized; add an async
mutex (per-instance lock) and use it to serialize load and write operations:
change ensureLoaded to acquire the lock, perform fs.readFile/JSON.parse and
validation, set this.loaded = true only after successful load, then release the
lock; have persist(), set(), delete() (and any other mutators) acquire the same
lock for their critical sections so they read/update this.entries and call
atomicWrite under the lock; keep list()/get() awaiting ensureLoaded but avoid
flipping loaded before loading completes by tying ensureLoaded to the lock (or a
loadPromise) so concurrent callers wait for the single completed load. Ensure
you reference and protect symbols: ensureLoaded, loaded, entries, persist, set,
delete (if present), file_path, atomicWrite, MAX_ENTRIES.
In `@packages/grida-ai-agent/src/providers/probe.ts`:
- Around line 66-83: The code currently awaits ollamaProbe before considering
openaiProbe, which can delay the fast OpenAI fallback; change the probe flow in
the probe function so both promises (ollamaProbe and openaiProbe created via
requestJson) are polled concurrently and the code proceeds as soon as
openaiProbe resolves if ollamaProbe is still pending — for example, await
whichever probe resolves first or check openaiProbe result when available
instead of blocking on the ollama await; keep using parseOllamaTags and
enrichContextWindows when ollamaProbe resolves successfully (use the ollama
variable only when its promise finished and ok), otherwise return the openai
result immediately when openaiProbe completes successfully.
In `@packages/grida-ai-agent/src/runtime/index.ts`:
- Around line 530-531: The current cap calculation uses Math.max(1_024, window -
4_096) which can exceed the resolved model window for tiny/custom models; change
the return so the summarizer cap is clamped to the resolved context_window by
returning the smaller of the window and the computed cap (e.g. use
Math.min(window, Math.max(1_024, window - 4_096))). Update the expression around
limits.resolve({ provider_id: providerId }).context_window and the return so the
summarizer never receives more history than the model's context_window.
- Around line 501-509: The current ResolveModelLimits logic only substitutes
endpointDefaultModelId when model.model_id is absent, but if a persisted
model_id no longer exists on the endpoint it should be treated as missing;
update the block in ResolveModelLimits so that after finding endpoint =
configs.find(e => e.id === model.provider_id) you also check whether the
endpoint actually exposes the persisted model_id (e.g. via whatever list/field
the endpoint exposes) and if it does not, set effective = { ...model, model_id:
defaultId } just like the missing case, then call resolveModelLimits(effective,
custom); use the existing symbols ResolveModelLimits, resolveModelLimits,
endpointDefaultModelId, configs, provider_id and model_id to locate and
implement this conditional.
In `@packages/grida-ai-agent/src/runtime/run-input.ts`:
- Around line 115-126: The request boundary currently accepts any known provider
id via isKnownProviderId even if the endpoint is a placeholder (models: []);
change the check that gates explicit provider selections to use the runtime's
"resolvable" rule instead of the simple existence check—call the existing
resolvability helper (e.g., isProviderResolvable or the runtime function that
verifies an endpoint can be resolved for a run) with providerId and
deps.endpoints and only allow when that returns true; update the branch that
sets explicit to providerId accordingly and adjust the test to expect rejection
for placeholder/empty-model configs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 3744199a-4ae2-4b0a-91dc-1bf1e07ed358
📒 Files selected for processing (50)
SECURITY.mddesktop/src/preload.tsdocs/editor/desktop/_category_.jsondocs/editor/desktop/img/local-models-configured.webpdocs/editor/desktop/local-models.mdeditor/app/desktop/settings/page.tsxeditor/app/desktop/welcome/page.tsxeditor/lib/agent-chat/build-agent-send.test.tseditor/lib/agent-chat/build-agent-send.tseditor/lib/agent-chat/web-daemon-bridge.tseditor/lib/desktop/bridge-boundary.test.tseditor/lib/desktop/bridge.tseditor/scaffolds/desktop/ai-sidebar/chat.tsxeditor/scaffolds/desktop/shared/context-meter.tsxeditor/scaffolds/desktop/shared/model-picker.tsxeditor/scaffolds/desktop/shared/registered-models.tseditor/scaffolds/desktop/workbench/agent-pane.tsxpackages/grida-ai-agent/README.mdpackages/grida-ai-agent/src/__public-api__.test.tspackages/grida-ai-agent/src/http/routes/handshake.tspackages/grida-ai-agent/src/http/routes/providers.tspackages/grida-ai-agent/src/http/routes/secrets.tspackages/grida-ai-agent/src/http/server.tspackages/grida-ai-agent/src/index.tspackages/grida-ai-agent/src/neutral-globals.d.tspackages/grida-ai-agent/src/protocol/endpoints.tspackages/grida-ai-agent/src/protocol/handshake.tspackages/grida-ai-agent/src/protocol/provider-ids.tspackages/grida-ai-agent/src/protocol/run.tspackages/grida-ai-agent/src/providers/byok.tspackages/grida-ai-agent/src/providers/endpoints.live.test.tspackages/grida-ai-agent/src/providers/endpoints.test.tspackages/grida-ai-agent/src/providers/endpoints.tspackages/grida-ai-agent/src/providers/index.test.tspackages/grida-ai-agent/src/providers/index.tspackages/grida-ai-agent/src/providers/probe.test.tspackages/grida-ai-agent/src/providers/probe.tspackages/grida-ai-agent/src/runtime/index.tspackages/grida-ai-agent/src/runtime/run-input.test.tspackages/grida-ai-agent/src/runtime/run-input.tspackages/grida-ai-agent/src/session/compaction.test.tspackages/grida-ai-agent/src/session/compaction.tspackages/grida-ai-agent/src/session/compactor.tspackages/grida-ai-agent/src/session/rows.tspackages/grida-ai-agent/src/session/titler.tspackages/grida-ai-agent/src/transport.tspackages/grida-ai-models/README.mdpackages/grida-ai-models/__tests__/registry.test.tspackages/grida-ai-models/src/models.tspackages/grida-desktop-bridge/src/index.ts
…te status codes (#806) Nine of thirteen findings applied (four declined on the thread, with reasons — single-endpoint v1 makes provider-qualified picker values unconstructible; probe preference is deliberate fidelity; run gate stays membership-only since the resolver already fails closed): - settings: the Local Models card gains an API key slot (EndpointKeyRow) for keyed gateways — same write/presence/delete-only secrets surface as BYOK, under the endpoint's id, rendered once the endpoint is persisted (the allowlist accepts configured ids only). Makes the doc's 'save the key under Settings' sentence actually true. - bridge: secrets.hasKey/setKey/deleteKey/confirmDeleteKey widened to ProviderId (endpoint ids were already valid on the host; the renderer types just couldn't say so). - store: one shared load promise (a concurrent first read can no longer observe the empty cache mid-load) + a write chain so concurrent set()/delete() can't persist stale snapshots over each other. - routes: store-level persistence failures are 500s, not 400s; a malformed probe base_url is a 400, not a 502 'outage'. - protocol: parseEndpointBaseUrl trims padding (it would survive URL parsing yet break the string-concatenated request base). - runtime: a STALE persisted local model_id (removed from the config) now falls back to the endpoint default for limits — same trap as the missing-id case; summarizer input cap clamped to the model window. - live test teardown guards against partial setup failures. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
editor/app/desktop/settings/page.tsx (1)
331-349:⚠️ Potential issue | 🟠 Major | ⚡ Quick winInvalidate stale probe/save completions before writing them back.
refresh()always startsdetectInto(ollama, { persist: true }), but the form stays editable andhandleRemove()can run before that promise settles. A late completion still callsproviders.setEndpoint(next)/setState(...)with the stalebase/draftsnapshot, which can recreate a just-deleted endpoint or wipe newer unsaved edits. Please gate these async writes behind a request version/cancel check (or merge into the latest draft) and freeze editing while one is in flight.Possible direction
-import { useCallback, useEffect, useState } from "react"; +import { useCallback, useEffect, useRef, useState } from "react"; ... + const draftVersionRef = useRef(0); + const edit = useCallback((next: EndpointProviderConfig) => { + draftVersionRef.current += 1; setState({ kind: "ready", draft: next, dirty: true }); }, []); ... const detectInto = useCallback( async (base: EndpointProviderConfig, opts: { persist: boolean }) => { + const version = draftVersionRef.current; setProbing(true); setProbeNote(null); try { const result = await providers.probeEndpoint(base.base_url); const merged = mergeProbedModels(base.models, result.models); ... const next = { ...base, models: merged.models }; + if (draftVersionRef.current !== version) return; if (opts.persist) { await providers.setEndpoint(next); + if (draftVersionRef.current !== version) return; setState({ kind: "ready", draft: next, dirty: false }); } else { setState({ kind: "ready", draft: next, dirty: true }); } ...Also invalidate the version before
handleSave()/handleRemove()kick off, and disable the editable inputs whileprobingorstate.kind === "saving".Also applies to: 364-377, 393-401, 419-435, 488-569
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@editor/app/desktop/settings/page.tsx` around lines 331 - 349, detectInto is writing back a stale snapshot when concurrent edits/removals occur; gate its async writes by tracking a request version or cancellation token and/or merging into the latest draft before calling providers.setEndpoint or setState. Specifically, when refresh() calls detectInto(base,...), capture an incrementing requestId (or AbortSignal) and check it right before performing providers.setEndpoint(next) and setState(...) so late completions are ignored; likewise invalidate/advance that requestId when handleSave() or handleRemove() start to prevent stale writes, and disable editing inputs while probing or state.kind === "saving" (and/or while a probe request is in flight) so the UI cannot produce conflicting edits.
♻️ Duplicate comments (1)
packages/grida-ai-agent/src/runtime/index.ts (1)
503-514:⚠️ Potential issue | 🟠 Major | ⚡ Quick winScope stale-model detection to the selected endpoint.
Line 511 checks
custom.some(...)across all endpoint models. If two endpoints share a model id, a stalemodel_idfor this provider can still look “known,” skipping default substitution and resolving limits from the wrong window.🛠️ Proposed fix
- const known = - !!model.model_id && custom.some((m) => m.id === model.model_id); - if (defaultId && (!model.model_id || !known)) { + const knownOnEndpoint = + !!model.model_id && + !!endpoint?.models.some((m) => m.id === model.model_id); + if (defaultId && (!model.model_id || !knownOnEndpoint)) { effective = { ...model, model_id: defaultId }; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/grida-ai-agent/src/runtime/index.ts` around lines 503 - 514, The stale-model detection currently uses custom.some(...) across all endpoints; narrow it to the selected endpoint by replacing the global check with a membership test against the chosen endpoint's model list (use the endpoint's models/model list property) when computing known (e.g., change known = !!model.model_id && custom.some(...) to check endpoint.models.some(m => m.id === model.model_id) or the equivalent property), so that a model id is considered "known" only if it exists on the resolved endpoint before skipping default substitution (affecting the computation that sets effective).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@editor/app/desktop/settings/page.tsx`:
- Around line 331-349: detectInto is writing back a stale snapshot when
concurrent edits/removals occur; gate its async writes by tracking a request
version or cancellation token and/or merging into the latest draft before
calling providers.setEndpoint or setState. Specifically, when refresh() calls
detectInto(base,...), capture an incrementing requestId (or AbortSignal) and
check it right before performing providers.setEndpoint(next) and setState(...)
so late completions are ignored; likewise invalidate/advance that requestId when
handleSave() or handleRemove() start to prevent stale writes, and disable
editing inputs while probing or state.kind === "saving" (and/or while a probe
request is in flight) so the UI cannot produce conflicting edits.
---
Duplicate comments:
In `@packages/grida-ai-agent/src/runtime/index.ts`:
- Around line 503-514: The stale-model detection currently uses custom.some(...)
across all endpoints; narrow it to the selected endpoint by replacing the global
check with a membership test against the chosen endpoint's model list (use the
endpoint's models/model list property) when computing known (e.g., change known
= !!model.model_id && custom.some(...) to check endpoint.models.some(m => m.id
=== model.model_id) or the equivalent property), so that a model id is
considered "known" only if it exists on the resolved endpoint before skipping
default substitution (affecting the computation that sets effective).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 10b27523-7f2b-43a4-8280-f12fd651c200
📒 Files selected for processing (9)
docs/editor/desktop/local-models.mdeditor/app/desktop/settings/page.tsxeditor/lib/desktop/bridge.tspackages/grida-ai-agent/src/http/routes/providers.tspackages/grida-ai-agent/src/protocol/endpoints.tspackages/grida-ai-agent/src/providers/endpoints.live.test.tspackages/grida-ai-agent/src/providers/endpoints.test.tspackages/grida-ai-agent/src/providers/endpoints.tspackages/grida-ai-agent/src/runtime/index.ts
✅ Files skipped from review due to trivial changes (1)
- docs/editor/desktop/local-models.md
🚧 Files skipped from review as they are similar to previous changes (6)
- packages/grida-ai-agent/src/providers/endpoints.test.ts
- editor/lib/desktop/bridge.ts
- packages/grida-ai-agent/src/providers/endpoints.ts
- packages/grida-ai-agent/src/http/routes/providers.ts
- packages/grida-ai-agent/src/providers/endpoints.live.test.ts
- packages/grida-ai-agent/src/protocol/endpoints.ts
…gainst stale async write-backs (#806) Two findings from CodeRabbit's incremental pass: - runtime: the stale-model substitution now asks THIS endpoint whether it serves the persisted model_id — a different endpoint sharing the same id must not vouch for it (the global registry check would skip substitution and resolve limits from the wrong endpoint's window). - settings: detection runs async off a snapshot while the form stays editable; a late completion could call setEndpoint with that snapshot and resurrect a just-deleted endpoint or wipe newer unsaved edits. An op-version ref now invalidates in-flight completions on every user action (edit/save/remove/re-setup), and save's read-back yields to edits made while it was in flight. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
@coderabbitai Both findings from the incremental pass are addressed in 2375916:
|
|
Tip For best results, initiate chat on the files or code changes.
Thanks for the detailed explanation of the write-back approach! 🐇✨ |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
editor/app/desktop/settings/page.tsx (1)
337-360:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftThe persisted probe can still overwrite or resurrect the endpoint after a concurrent user action.
opVersiononly drops the local completion afterproviders.setEndpoint(next)returns. Once line 358 sends that host write, a later edit or delete cannot cancel it, so the stale probe can still land on disk if it finishes last. In practice that means a remove can still be undone by an olderdetectInto(..., { persist: true }), and an edit can leaveendpoints.jsonrolled back to the probed snapshot even though the UI kept the newer draft. This needs a versioned/CAS host write, or the auto-refresh path needs to stop persisting while the form stays editable.Also applies to: 440-458
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@editor/app/desktop/settings/page.tsx` around lines 337 - 360, The persisted probe can still overwrite concurrent user edits because detectInto only checks opVersion locally and then performs an unconditional providers.setEndpoint(next); fix by making the write conditional: capture the current opVersion (opVersion.current) as expectedVersion, re-check opVersion before calling providers.setEndpoint, and change providers.setEndpoint to accept an expectedVersion/CAS token so the host performs a conditional (compare-and-set) write and returns success/failure; if the CAS fails, do not call setState/overwrite the draft and instead refresh/notify. Apply the same pattern to the other probe persistence path (the block around the 440-458 range) so both probes use conditional writes tied to opVersion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@editor/app/desktop/settings/page.tsx`:
- Around line 337-360: The persisted probe can still overwrite concurrent user
edits because detectInto only checks opVersion locally and then performs an
unconditional providers.setEndpoint(next); fix by making the write conditional:
capture the current opVersion (opVersion.current) as expectedVersion, re-check
opVersion before calling providers.setEndpoint, and change providers.setEndpoint
to accept an expectedVersion/CAS token so the host performs a conditional
(compare-and-set) write and returns success/failure; if the CAS fails, do not
call setState/overwrite the draft and instead refresh/notify. Apply the same
pattern to the other probe persistence path (the block around the 440-458 range)
so both probes use conditional writes tied to opVersion.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a6595fa9-dbc6-466b-83f8-c809cae7e0ce
📒 Files selected for processing (2)
editor/app/desktop/settings/page.tsxpackages/grida-ai-agent/src/runtime/index.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/grida-ai-agent/src/runtime/index.ts
|
@coderabbitai Declining the CAS escalation — walking the actual interleavings, the uncancellable window is too small to buy a wire-contract change:
A CAS/expected-version token would touch the protocol, the store, the routes, and both bridge producers — real contract surface purchased against a race that requires seconds of loopback delay. The settings page is the config's only writer in this product; if that ever changes (multi-window settings, host-side background refresh), a conditional write becomes the right tool and the store's write chain is the natural place to anchor it. |
|
Tip For best results, initiate chat on the files or code changes.
The CAS cost (protocol, store, routes, both bridge producers) is clearly disproportionate to a race that requires seconds of single-user loopback delay. Agreed that the store's write chain is the right anchor point if the single-writer assumption ever changes (multi-window settings, background refresh). No further objections. 🐇 |
Summary
Implements the v1 cut decided in #806's design-pass handoff: local LLMs (Ollama) as a BYOK-shaped endpoint provider — no account, no API key. Built as "open model registry + custom-endpoint provider" infrastructure; Ollama lands as a preset.
The registry seam (
@grida/ai-models)tool_callcapability flag, explicit on every catalog entry.models.text.registry:CustomModelSpec(cost optional, conservative defaults — 8k context, tool-calling assumed) +resolve(id, custom)over catalogue ∪ user-registered models.Endpoint providers (
@grida/agent){base_url, apiKey?, models[]}(protocol/endpoints.ts), withOLLAMA_ENDPOINT_PRESET(http://localhost:11434/v1, no key). Stays inside the "not a general model-provider router" anti-goal — presets, not config-declared arbitrary providers.EndpointProvidersStore— plain config at${userData}/endpoints.json, a deliberate sibling ofSecretsStore(configs readable; an optional gateway key rides/secrets/*under the endpoint id, never readable back).makeEndpointFactory: every tier maps to the endpoint's default model — titler/compactor tier calls land on a model the endpoint actually serves; a missing key is not an error.resolveProvider: BYOK keys take precedence; configured endpoints (≥1 model) follow; explicit picks of either kind.ALLOWED_MODEL_IDSgate validates over catalogue ∪ registered ids (still a closed gate).AgentModelId/provider_id/ChatModel.provider_idopened on the wire (string & {}), runtime-gated./providers/endpoints/*CRUD routes behind the same CORS/Referer/Basic-Auth stack;AgentTransport.Client.providers.Desktop + renderer
providersbridge namespace (contract + preload), feature-detected so old binaries degrade.ModelToolCallNotice: visible degraded-mode warning when the selected model is markedtool_call: false(permissive default — warn, don't block; noToolLoopAgentrestructuring in v1, per the handoff).Security (GRIDA-SEC-004 review done)
allowLocalBinding's local-ip rule permits loopback egress (howollama serveis reached) while a hostile remotebase_urlstays blocked by the enumerated domain allowlist — a malicious config cannot turn the sidecar into an open exfil channel.Live verification (real Ollama, no key)
endpoints.live.test.ts(gatedGRIDA_LIVE_OLLAMA=1, CI-skipped) — all 5 pass against a localgemma4:31b-mlxwith no BYOK secret configured:write_fileand the file lands in the workspace.Live testing caught two calibration bugs, fixed in this PR:
createOpenAICompatiblestream (incl. the existing OpenRouter BYOK path): the OpenAI-compat spec only emits the usage chunk whenstream_options.include_usageis set. Both factories now setincludeUsage: true.finish_reason: length, empty content) — sessions stayed "New Chat". Caps raised to 512/2048 (pure ceilings; non-thinking models stop early), titler timeout raised to 60s for single-flight local servers.Tests
@grida/agent: 492 passed (29 new — validator/store/routes, resolver cascade + tier mapping, run-input gates, compaction limits, public-API pins).@grida/ai-models: 33 passed (9 new registry tests)./providers/endpoints/).@grida/agent+ desktop + editor typecheck clean (editor's pre-existing canvas-wasm/LFS-asset errors unrelated).User docs
New Desktop docs section:
docs/editor/desktop/local-models.md— requirements with honest expectation framing, settings walkthrough, picker usage, tool-support detection, troubleshooting, and theoverridesescape hatch. One screenshot (WebP), deliberately: the configured Local Models card with detection badges — captured from the real/desktop/*surface running against a live agent daemon + Ollama (via the web-daemon dev boot, which also gained theprovidersbridge namespace so the prod pages are exercisable in a plain browser). Docs images grow monotonically with the site; we keep only the shot that carries the feature and is least likely to churn.Detection is authoritative; humans get an explicit override slot
No custom input over discoverable truth: the capability fields on a stored model entry are detection-owned — refreshed on every settings visit (converging to
/api/pstruth once a model loads) and rendered as read-only badges. Editable inputs appear ONLY where detection has nothing (manual adds, ids-only gateways). For "the endpoint reports a wrong value," entries carry a stickyoverridesslot (override → detected → registry default) that detection never writes — hand-edited inendpoints.json, which the settings card links to directly (local-LLM users are developers). Host and renderer resolve through one shared helper so every registry consumer sees effective values.Model auto-detection
"Set up Ollama" prefills the installed models immediately, and a Detect button re-scans after new pulls (also backfilling missing fields on already-registered models, never clobbering user-set values) — no typing model ids.
POST /providers/endpoints/probeis a host-side fetch (the packaged renderer's grida.co origin can't reach a local Ollama through CORS) of Ollama's/api/tags(ids + capability tags, sotool_callcomes back real), enriched with the context window —/api/psfirst (a loaded model'scontext_lengthis the server's actual allocation; 262144 for gemma4:31b-mlx on Ollama 0.30.7), then/api/showmodel_info(model max) — and falling back to the generic OpenAI/modelsshape (LiteLLM, vLLM). Responses are parsed and reduced to{id, tool_call, contextWindow}rows — never proxied raw; reads are bounded (timeout + size cap). The context field stays editable: a server explicitly capped below a model's max (OLLAMA_CONTEXT_LENGTH) only reports the cap once the model is loaded.Cleanup pass
A 4-angle review (reuse / simplification / efficiency / altitude) was applied at the end: the detection-merge contract moved into
protocol/endpoints.tsas a tested puremergeProbedModels(was inline UI code), the default-model rule and the provider-id namespace gate each got one shared home, the probe's size cap is now enforced on the wire and its two shapes probe concurrently, and the per-token chat render paths memoize registry resolution.Out of scope (deliberate)
Closes #806.
🤖 Generated with Claude Code
Summary by CodeRabbit