fix: prevent workers from getting stuck in timeout-retry loops by TheDarkSkyXD · Pull Request #477 · spacedriveapp/spacebot

TheDarkSkyXD · 2026-03-23T04:13:34Z

Summary

Add 10-minute per-request timeout on Anthropic API calls — the global reqwest::Client timeout of 120s was too short for large completions with extended thinking, causing requests to time out before the API could finish generating. The per-request override matches the scale of the streaming path (30min).
Reduce worker-level transient retries from 5 to 3 — 5 retries with exponential backoff created ~30-minute stalls before workers gave up. 3 retries (9 total attempts including model-level retries) is sufficient to survive temporary outages without blocking indefinitely.

Context

Workers spawning large LLM requests (e.g. writing a full PRD document) were getting stuck in a timeout→retry cascade:

120s client timeout fires before Anthropic can finish a large response
Model-level retry: 3 attempts × 120s = ~6 min per cycle
Worker-level retry: 5 cycles with backoff = ~30 min total stall
Worker finally fails, parent respawns it → cycle repeats

Test plan

Verify spacebot builds cleanly (cargo build -p spacebot)
Spawn a worker with a large task (e.g. "write a detailed PRD") and confirm it completes instead of timing out
Confirm short tasks still work normally with no regression
Verify that genuine API outages still fail within a reasonable time (~3 min instead of ~30 min)

🤖 Generated with Claude Code

coderabbitai · 2026-03-23T04:13:58Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9185895c-d593-45ed-b66d-0e3642ccd513

📥 Commits

Reviewing files that changed from the base of the PR and between 248967e and 5a5d5b8.

📒 Files selected for processing (2)

src/agent/worker.rs
src/llm/anthropic/params.rs

🚧 Files skipped from review as they are similar to previous changes (1)

src/llm/anthropic/params.rs

Walkthrough

Worker-level transient retry cap reduced from 5 to 3 attempts. Anthropic API /v1/messages endpoint now enforces a per-request HTTP timeout of 10 minutes, overriding client-level timeout configuration.

Changes

Cohort / File(s)	Summary
Worker Retry Configuration `src/agent/worker.rs`	Reduced `MAX_TRANSIENT_RETRIES` constant from 5 to 3; updated accompanying comment to clarify worker retry behavior with model-level attempts.
Anthropic API Timeout `src/llm/anthropic/params.rs`	Added explicit per-request HTTP timeout of 10 minutes (`Duration::from_secs(10 * 60)`) via `.timeout()` on `RequestBuilder` for `/v1/messages` POST call in `build_anthropic_request` function.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested reviewers

jamiepine

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'fix: prevent workers from getting stuck in timeout-retry loops' directly aligns with the main changes: adding a per-request timeout override and reducing retry counts to prevent prolonged stalls.
Description check	✅ Passed	The description clearly explains both changes (per-request timeout and reduced retries), provides context on the problem being solved, and includes a test plan.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (5)

src/llm/anthropic/params.rs (1)

100-106: LGTM — per-request timeout correctly overrides the client-level default.

The 10-minute timeout is appropriate for large completions with extended thinking, and the comment clearly explains the rationale. This directly addresses the timeout→retry cascade described in the PR.

Consider extracting the timeout value to a named constant for self-documentation:
const ANTHROPIC_REQUEST_TIMEOUT: std::time::Duration = std::time::Duration::from_secs(10 * 60);
This is optional; the inline expression is clear enough given the comment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/anthropic/params.rs` around lines 100 - 106, Extract the inline
duration into a named constant for clarity and reuse: define a const like
ANTHROPIC_REQUEST_TIMEOUT (type std::time::Duration) and replace the inline
std::time::Duration::from_secs(10 * 60) in the builder.timeout(...) call (the
http_client.post(&url).timeout(...) chain) with that constant; keep the existing
comment and header usage around the builder variable unchanged.

interface/src/components/ModelSelect.tsx (3)

102-105: Consider hoisting providerRank outside the component.

Since providerOrder is a module-level constant, providerRank doesn't depend on any component state and can be defined at module scope for clarity.

♻️ Optional: move to module scope

+ const providerRank = (p: string) => {
+   const index = providerOrder.indexOf(p);
+   return index === -1 ? Number.MAX_SAFE_INTEGER : index;
+ };
+
  export function ModelSelect({
    ...
- const providerRank = (p: string) => {
-   const index = providerOrder.indexOf(p);
-   return index === -1 ? Number.MAX_SAFE_INTEGER : index;
- };

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@interface/src/components/ModelSelect.tsx` around lines 102 - 105, Hoist the
providerRank helper out of the React component by moving its definition to
module scope (using the existing module-level providerOrder constant) so it no
longer gets recreated on each render; locate providerRank in ModelSelect.tsx,
cut the function from inside the component body and paste it above the component
definition (keeping the same implementation that returns Number.MAX_SAFE_INTEGER
when index === -1), and ensure any references to providerRank inside the
component remain unchanged.

313-316: flatList.indexOf(model) inside render loop causes O(n²) complexity.

Each model lookup traverses the entire flatList. With hundreds of models this can cause noticeable lag, especially during mouse hover events that trigger setHighlightIndex.

♻️ Proposed fix: build an index map once

  const flatList = useMemo(() => {
    const items: ModelInfo[] = [];
    for (const p of sortedProviders) {
      for (const m of grouped[p]) {
        items.push(m);
      }
    }
    return items;
  }, [sortedProviders, grouped]);

+ const flatIndexMap = useMemo(() => {
+   const map = new Map<string, number>();
+   flatList.forEach((m, i) => map.set(m.id, i));
+   return map;
+ }, [flatList]);

Then in the render:

- const flatIdx = flatList.indexOf(model);
+ const flatIdx = flatIndexMap.get(model.id) ?? -1;

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@interface/src/components/ModelSelect.tsx` around lines 313 - 316, The render
loop is doing flatList.indexOf(model) causing O(n²) behavior; create a lookup
map once (e.g., build a flatIndexMap keyed by model id or a stable identifier
from flatList) and use that map inside the grouped[prov].map to compute flatIdx
instead of indexOf; update references to flatIdx, highlightIndex,
setHighlightIndex, model.id and value to use the map lookup and ensure the map
is rebuilt only when flatList changes (memoize or compute outside the render
loop).

198-214: Minor UX inconsistency: ArrowUp doesn't open the dropdown when closed.

ArrowDown opens the dropdown (line 200-203), but ArrowUp only cycles through items. This may be intentional, but for consistency you could consider also opening on ArrowUp.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@interface/src/components/ModelSelect.tsx` around lines 198 - 214, The ArrowUp
handler should mirror ArrowDown by opening the dropdown when it's closed; inside
the ArrowUp branch (where e.key === "ArrowUp") add the same check used by
ArrowDown: if (!open) { setOpen(true); setFilter(value); } and then adjust the
setHighlightIndex logic to handle the newly opened state (e.g., initialize to
flatList.length - 1 so the last item is highlighted) while preserving the
existing decrement/wrap behavior; update references to setOpen, setFilter,
setHighlightIndex, flatList, and value in the ArrowUp block accordingly.

scripts/bundle-sidecar.ps1 (1)

33-33: Replace Write-Host with Write-Information for better log stream compatibility.

Write-Host outputs directly to the console and cannot be captured by redirection operators; Write-Information participates in PowerShell output streams and allows better control via -InformationAction.

♻️ Proposed refactor

-Write-Host "Building spacebot ($BuildMode) for $TargetTriple..."
+Write-Information "Building spacebot ($BuildMode) for $TargetTriple..." -InformationAction Continue
@@
-Write-Host "Copied $SrcBin -> $DestBin"
-Write-Host "Sidecar binary ready."
+Write-Information "Copied $SrcBin -> $DestBin" -InformationAction Continue
+Write-Information "Sidecar binary ready." -InformationAction Continue

Also applies to: 51-52

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@scripts/bundle-sidecar.ps1` at line 33, Replace the direct console writes
using Write-Host with Write-Information so messages participate in PowerShell's
information stream; update the call shown (the literal line containing
Write-Host "Building spacebot ($BuildMode) for $TargetTriple...") and the other
occurrences around lines 51-52 to use Write-Information "Building spacebot
($BuildMode) for $TargetTriple..." -InformationAction Continue (preserving the
original message text and any variable interpolation) so logs can be
captured/controlled by InformationAction and redirection.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/bundle-sidecar.ps1`:
- Around line 35-40: The cargo build invocations (the native calls in the
branches comparing $TargetTriple and $HostTriple) can fail silently in
PowerShell 5.1; after each cargo build call (both the platform-specific branch
and the else branch around the lines invoking cargo build with `@CargoFlags` and
--target $TargetTriple), check $LASTEXITCODE (or the ExitCode from
Start-Process) and if non-zero immediately fail (e.g., throw or exit 1) with a
clear message including the exit code so the script doesn’t continue to
Copy-Item with a missing/stale spacebot.exe.

In `@src/agent/compactor.rs`:
- Around line 226-247: Branch compaction uses drain(..remove_count) without the
ToolResult-boundary protection found in compactor.rs (the while loop that
advances remove_count when encountering User messages containing
UserContent::ToolResult), which can orphan ToolResult blocks; refactor by
extracting that boundary logic into a shared function (e.g.,
calculate_safe_remove_count(total: usize, fraction: f32, hist: &History) ->
usize) and call it from both compactor.rs (replacing the existing remove_count
calculation and while loop) and Branch::compact_history() (replacing the direct
drain(..remove_count)), ensuring the helper checks hist.get(index) for
Message::User with content items matching UserContent::ToolResult and advances
the count accordingly.

In `@src/config/types.rs`:
- Line 845: Default-enabling the evaluate_enabled flag exposes JS execution by
default; change the default value of the evaluate_enabled field back to false in
the type's Default/constructor so browser-driven JS evaluation is opt-in, update
any related comments/docs to note it must be explicitly enabled, and run/update
tests that assert default settings; specifically modify the evaluate_enabled
default and ensure the browser evaluation code path (the JS/evaluate flow)
continues to check evaluate_enabled before executing.

In `@src/tools/browser.rs`:
- Around line 2057-2067: The current code in src/tools/browser.rs always treats
any error from browser.close().await (in the block handling Some(mut browser))
as non-fatal; change it to only suppress known terminal/disconnect errors (e.g.
oneshot canceled / WebSocket connection closed) by matching the error
kind/string/inner type from the result of browser.close().await and logging
those as warnings, but for any other unexpected error propagate it (return Err
or propagate the error out of the function) so the caller does not receive a
false "Browser closed" success; keep the existing tracing::warn for the known
cases and ensure the code path that produces the success response ("Browser
closed") is only reached when close either succeeded or matched a known terminal
disconnect.

---

Nitpick comments:
In `@interface/src/components/ModelSelect.tsx`:
- Around line 102-105: Hoist the providerRank helper out of the React component
by moving its definition to module scope (using the existing module-level
providerOrder constant) so it no longer gets recreated on each render; locate
providerRank in ModelSelect.tsx, cut the function from inside the component body
and paste it above the component definition (keeping the same implementation
that returns Number.MAX_SAFE_INTEGER when index === -1), and ensure any
references to providerRank inside the component remain unchanged.
- Around line 313-316: The render loop is doing flatList.indexOf(model) causing
O(n²) behavior; create a lookup map once (e.g., build a flatIndexMap keyed by
model id or a stable identifier from flatList) and use that map inside the
grouped[prov].map to compute flatIdx instead of indexOf; update references to
flatIdx, highlightIndex, setHighlightIndex, model.id and value to use the map
lookup and ensure the map is rebuilt only when flatList changes (memoize or
compute outside the render loop).
- Around line 198-214: The ArrowUp handler should mirror ArrowDown by opening
the dropdown when it's closed; inside the ArrowUp branch (where e.key ===
"ArrowUp") add the same check used by ArrowDown: if (!open) { setOpen(true);
setFilter(value); } and then adjust the setHighlightIndex logic to handle the
newly opened state (e.g., initialize to flatList.length - 1 so the last item is
highlighted) while preserving the existing decrement/wrap behavior; update
references to setOpen, setFilter, setHighlightIndex, flatList, and value in the
ArrowUp block accordingly.

In `@scripts/bundle-sidecar.ps1`:
- Line 33: Replace the direct console writes using Write-Host with
Write-Information so messages participate in PowerShell's information stream;
update the call shown (the literal line containing Write-Host "Building spacebot
($BuildMode) for $TargetTriple...") and the other occurrences around lines 51-52
to use Write-Information "Building spacebot ($BuildMode) for $TargetTriple..."
-InformationAction Continue (preserving the original message text and any
variable interpolation) so logs can be captured/controlled by InformationAction
and redirection.

In `@src/llm/anthropic/params.rs`:
- Around line 100-106: Extract the inline duration into a named constant for
clarity and reuse: define a const like ANTHROPIC_REQUEST_TIMEOUT (type
std::time::Duration) and replace the inline std::time::Duration::from_secs(10 *
60) in the builder.timeout(...) call (the http_client.post(&url).timeout(...)
chain) with that constant; keep the existing comment and header usage around the
builder variable unchanged.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9c545778-e40b-4860-aea5-141a799abdfa

📥 Commits

Reviewing files that changed from the base of the PR and between 6f81f39 and 248967e.

⛔ Files ignored due to path filters (50)

desktop/src-tauri/gen/schemas/windows-schema.json is excluded by !**/gen/**, !**/*.json, !**/gen/**
desktop/src-tauri/icons/64x64.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/Square107x107Logo.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/Square142x142Logo.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/Square150x150Logo.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/Square284x284Logo.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/Square30x30Logo.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/Square310x310Logo.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/Square44x44Logo.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/Square71x71Logo.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/Square89x89Logo.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/StoreLogo.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-anydpi-v26/ic_launcher.xml is excluded by !**/*.xml
desktop/src-tauri/icons/android/mipmap-hdpi/ic_launcher.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-hdpi/ic_launcher_foreground.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-hdpi/ic_launcher_round.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-mdpi/ic_launcher.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-mdpi/ic_launcher_foreground.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-mdpi/ic_launcher_round.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-xhdpi/ic_launcher.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-xhdpi/ic_launcher_foreground.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-xhdpi/ic_launcher_round.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-xxhdpi/ic_launcher.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-xxhdpi/ic_launcher_foreground.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-xxhdpi/ic_launcher_round.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-xxxhdpi/ic_launcher.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-xxxhdpi/ic_launcher_foreground.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/mipmap-xxxhdpi/ic_launcher_round.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/android/values/ic_launcher_background.xml is excluded by !**/*.xml
desktop/src-tauri/icons/icon.ico is excluded by !**/*.ico, !**/*.ico
desktop/src-tauri/icons/icon.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-20x20@1x.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-20x20@2x-1.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-20x20@2x.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-20x20@3x.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-29x29@1x.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-29x29@2x-1.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-29x29@2x.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-29x29@3x.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-40x40@1x.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-40x40@2x-1.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-40x40@2x.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-40x40@3x.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-512@2x.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-60x60@2x.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-60x60@3x.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-76x76@1x.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-76x76@2x.png is excluded by !**/*.png, !**/*.png
desktop/src-tauri/icons/ios/AppIcon-83.5x83.5@2x.png is excluded by !**/*.png, !**/*.png
desktop/tauri.windows.conf.json is excluded by !**/*.json

📒 Files selected for processing (8)

interface/src/components/ModelSelect.tsx
scripts/bundle-sidecar.ps1
src/agent/compactor.rs
src/agent/worker.rs
src/api/models.rs
src/config/types.rs
src/llm/anthropic/params.rs
src/tools/browser.rs

coderabbitai · 2026-03-23T04:21:11Z

scripts/bundle-sidecar.ps1

+if ($TargetTriple -ne $HostTriple) {
+    cargo build @CargoFlags --target $TargetTriple --manifest-path "$RepoRoot\Cargo.toml"
+    $SrcBin = Join-Path $RepoRoot "target\$TargetTriple\$BuildMode\spacebot.exe"
+} else {
+    cargo build @CargoFlags --manifest-path "$RepoRoot\Cargo.toml"
+    $SrcBin = Join-Path $RepoRoot "target\$BuildMode\spacebot.exe"


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify current script lacks explicit native-exit guards and source existence check. rg -n -C2 'cargo build|LASTEXITCODE|PSNativeCommandUseErrorActionPreference|Test-Path \$SrcBin|Copy-Item \$SrcBin' scripts/bundle-sidecar.ps1

Repository: spacedriveapp/spacebot

Length of output: 636

🏁 Script executed:

head -n 35 scripts/bundle-sidecar.ps1

Repository: spacedriveapp/spacebot

Length of output: 1059

🏁 Script executed:

#!/bin/bash # Verify PowerShell behavior: does $ErrorActionPreference = "Stop" catch native command non-zero exit? # Key: native commands in PS 5.1 and earlier do NOT inherently throw on non-zero exit with ErrorActionPreference alone rg -n '\$ErrorActionPreference' scripts/bundle-sidecar.ps1

Repository: spacedriveapp/spacebot

Length of output: 100

🏁 Script executed:

# Check for PowerShell version requirements or constraints rg -i 'powershell|pwsh|version|requires' scripts/bundle-sidecar.ps1 | head -20

Repository: spacedriveapp/spacebot

Length of output: 48

🏁 Script executed:

# Check for any CI/build config that specifies PowerShell version fd -e yml -e yaml -e json | xargs rg -l 'powershell|pwsh' | head -5

Repository: spacedriveapp/spacebot

Length of output: 97

🏁 Script executed:

# Check if tauri config references this script or specifies environment cat desktop/tauri.windows.conf.json | jq '.build // .bundle // empty' 2>/dev/null || cat desktop/tauri.windows.conf.json | head -40

Repository: spacedriveapp/spacebot

Length of output: 407

🏁 Script executed:

# Search for any other safety checks in build/CI pipeline fd -e yml -e yaml . .github/ .gitlab/ azure-pipelines* 2>/dev/null | head -3

Repository: spacedriveapp/spacebot

Length of output: 155

Fail fast when cargo build returns non-zero exit code.

The script invokes cargo build as a native command via Windows PowerShell 5.1 (called as powershell in Tauri config). In PowerShell 5.1, $ErrorActionPreference = "Stop" does not catch non-zero exits from native commands, so a failed build will silently continue to Copy-Item, risking a stale or missing binary being shipped.

🔧 Proposed fix

$ErrorActionPreference = "Stop" +$PSNativeCommandUseErrorActionPreference = $true @@ if ($TargetTriple -ne $HostTriple) { cargo build `@CargoFlags` --target $TargetTriple --manifest-path "$RepoRoot\Cargo.toml" + if ($LASTEXITCODE -ne 0) { throw "cargo build failed with exit code $LASTEXITCODE" } $SrcBin = Join-Path $RepoRoot "target\$TargetTriple\$BuildMode\spacebot.exe" } else { cargo build `@CargoFlags` --manifest-path "$RepoRoot\Cargo.toml" + if ($LASTEXITCODE -ne 0) { throw "cargo build failed with exit code $LASTEXITCODE" } $SrcBin = Join-Path $RepoRoot "target\$BuildMode\spacebot.exe" } @@ +if (-not (Test-Path $SrcBin)) { + throw "Built binary not found: $SrcBin" +} Copy-Item $SrcBin $DestBin -Force

Also applies to: 50–50

🧰 Tools

🪛 PSScriptAnalyzer (1.24.0)

[warning] Missing BOM encoding for non-ASCII encoded file 'bundle-sidecar.ps1'

(PSUseBOMForUnicodeEncodedFile)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/bundle-sidecar.ps1` around lines 35 - 40, The cargo build invocations (the native calls in the branches comparing $TargetTriple and $HostTriple) can fail silently in PowerShell 5.1; after each cargo build call (both the platform-specific branch and the else branch around the lines invoking cargo build with `@CargoFlags` and --target $TargetTriple), check $LASTEXITCODE (or the ExitCode from Start-Process) and if non-zero immediately fail (e.g., throw or exit 1) with a clear message including the exit code so the script doesn’t continue to Copy-Item with a missing/stale spacebot.exe.

coderabbitai · 2026-03-23T04:21:11Z

src/agent/compactor.rs

+        let mut remove_count = ((total as f32 * fraction) as usize)
            .max(1)
            .min(total.saturating_sub(2));
        if remove_count == 0 {
            return Ok(0);
        }
+
+        // Advance past User messages containing ToolResult blocks so we never
+        // orphan tool_results whose matching tool_use was in a removed message.
+        while remove_count < total.saturating_sub(2) {
+            let has_tool_result = matches!(
+                hist.get(remove_count),
+                Some(Message::User { content })
+                    if content.iter().any(|item|
+                        matches!(item, UserContent::ToolResult(_)))
+            );
+            if has_tool_result {
+                remove_count += 1;
+            } else {
+                break;
+            }
+        }


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Check if branch.rs compact_history handles ToolResult boundaries # Expected: Either branch.rs has similar skip logic, or branches never contain ToolResult messages # Search for compact_history in branch.rs and show surrounding context rg -n -A 20 'fn compact_history' src/agent/branch.rs # Check if branches ever receive ToolResult content rg -n 'ToolResult' src/agent/branch.rs

Repository: spacedriveapp/spacebot

Length of output: 740

Branch compaction should apply the same ToolResult boundary logic as compactor — extract to shared utility.

Branch.rs compact_history() (line 287-303) uses .drain(..remove_count) without the ToolResult-boundary skip logic from compactor.rs (lines 226-247). While branches currently don't receive ToolResult messages, the inconsistency is a latent bug: if branch conversations ever include tool results (e.g., from enhanced curation flows), the drain could orphan ToolResult blocks and trigger Anthropic API rejection.

Recommendation: Either add the defensive skip logic to branch compaction, or extract the boundary logic into a shared utility function (e.g., calculate_safe_remove_count()) to prevent future divergence.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/agent/compactor.rs` around lines 226 - 247, Branch compaction uses drain(..remove_count) without the ToolResult-boundary protection found in compactor.rs (the while loop that advances remove_count when encountering User messages containing UserContent::ToolResult), which can orphan ToolResult blocks; refactor by extracting that boundary logic into a shared function (e.g., calculate_safe_remove_count(total: usize, fraction: f32, hist: &History) -> usize) and call it from both compactor.rs (replacing the existing remove_count calculation and while loop) and Branch::compact_history() (replacing the direct drain(..remove_count)), ensuring the helper checks hist.get(index) for Message::User with content items matching UserContent::ToolResult and advances the count accordingly.

coderabbitai · 2026-03-23T04:21:11Z

src/config/types.rs

            enabled: true,
            headless: true,
-            evaluate_enabled: false,
+            evaluate_enabled: true,


⚠️ Potential issue | 🟠 Major

Default-enabling browser_evaluate weakens the security boundary.

Line 845 flips evaluate_enabled to true, which makes arbitrary JS execution available by default (see src/tools/browser.rs, Line 1753). This changes a high-risk capability from opt-in to opt-out and increases prompt-injection blast radius.

🔧 Suggested fix

- evaluate_enabled: true, + evaluate_enabled: false,

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

evaluate_enabled: true,

evaluate_enabled: false,

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/config/types.rs` at line 845, Default-enabling the evaluate_enabled flag exposes JS execution by default; change the default value of the evaluate_enabled field back to false in the type's Default/constructor so browser-driven JS evaluation is opt-in, update any related comments/docs to note it must be explicitly enabled, and run/update tests that assert default settings; specifically modify the evaluate_enabled default and ensure the browser evaluation code path (the JS/evaluate flow) continues to check evaluate_enabled before executing.

coderabbitai · 2026-03-23T04:21:11Z

src/tools/browser.rs

+                if let Some(mut browser) = browser {
+                    if let Err(error) = browser.close().await {
+                        // The browser connection may already be dead (e.g. "oneshot canceled"
+                        // from a dropped WebSocket). That's fine — the browser is effectively
+                        // closed. Log it but don't fail the tool call.
+                        tracing::warn!(
+                            policy = "close_browser",
+                            %error,
+                            "browser.close() failed, treating as already closed"
+                        );
+                    }


⚠️ Potential issue | 🟠 Major

Don’t treat every browser.close() failure as success.

Lines 2058-2067 currently suppress all close failures and still return "Browser closed". That can hide real shutdown failures and leave a running/orphaned browser while reporting success.

Please only suppress known terminal/disconnect errors and propagate unexpected close failures.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/tools/browser.rs` around lines 2057 - 2067, The current code in src/tools/browser.rs always treats any error from browser.close().await (in the block handling Some(mut browser)) as non-fatal; change it to only suppress known terminal/disconnect errors (e.g. oneshot canceled / WebSocket connection closed) by matching the error kind/string/inner type from the result of browser.close().await and logging those as warnings, but for any other unexpected error propagate it (return Err or propagate the error out of the function) so the caller does not receive a false "Browser closed" success; keep the existing tracing::warn for the known cases and ensure the code path that produces the success response ("Browser closed") is only reached when close either succeeded or matched a known terminal disconnect.

Workers spawning large LLM requests (e.g. writing a full PRD) were getting stuck for 30+ minutes because: 1. The global reqwest client timeout of 120s was too short for large Anthropic completions with extended thinking — requests timed out before the API could finish generating. 2. The retry cascade (3 model-level retries × 5 worker-level retries) meant each timeout failure took ~30 minutes before the worker finally gave up, only to be respawned with the same result. Changes: - Add 10-minute per-request timeout on Anthropic API calls, overriding the 120s global client timeout. Matches the scale of the streaming path (30min). - Reduce MAX_TRANSIENT_RETRIES from 5 to 3 (still 9 total attempts with model-level retries) to fail faster on sustained API issues. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot reviewed Mar 23, 2026

View reviewed changes

TheDarkSkyXD force-pushed the fix/worker-timeout-retry-loop branch from 248967e to 5a5d5b8 Compare March 26, 2026 00:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: prevent workers from getting stuck in timeout-retry loops#477

fix: prevent workers from getting stuck in timeout-retry loops#477
TheDarkSkyXD wants to merge 1 commit intospacedriveapp:mainfrom
TheDarkSkyXD:fix/worker-timeout-retry-loop

TheDarkSkyXD commented Mar 23, 2026

Uh oh!

coderabbitai bot commented Mar 23, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 23, 2026

Uh oh!

coderabbitai bot Mar 23, 2026

Uh oh!

coderabbitai bot Mar 23, 2026

Uh oh!

coderabbitai bot Mar 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

TheDarkSkyXD commented Mar 23, 2026

Summary

Context

Test plan

Uh oh!

coderabbitai bot commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai bot commented Mar 23, 2026 •

edited

Loading