Add EFT-3295 Tentacle script abandon design doc#1226
Conversation
Design proposing new AbandonScript verb on IScriptServiceV2 to release the script-isolation mutex when Cancel can't propagate. Contract locked with parallel server-side session via the EFT-3295 Linear thread. Implementation choice (Option 1 async vs Option 2 surgical ManualResetEventSlim swap) preserved for external review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds question 3 to "Open questions for external reviewer" asking whether the Tentacle-side feature flag is pulling weight given the server-side toggle already governs whether abandon dispatches at all. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Section 4.3 now lists ScriptBuilder, TestExecuteShellScriptCommandBuilder, TentacleServiceDecoratorBuilder, and Wait.For by name and points at ClientScriptExecutionIsolationMutex.cs as the exemplar. File-signal pattern (.CreateFile / .WaitForFileToExist) replaces sleep-based timing in stuck-script tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decision: replace ExecuteCommand with ExecuteCommandAsync outright. All ~20 callers across production code, Kubernetes integration tests, and Tentacle integration test scaffolding migrate to await. Section 2 rewritten to commit to the async approach; the sync ManualResetEventSlim variant moves to the rejected-alternatives list. Section 4 timing and thread-leak tests consolidated. Open questions trimmed to two (workspace cleanup, Tentacle-side flag). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| - `source/Octopus.Tentacle.Tests.Integration/Util/SilentProcessRunnerFixture.cs` | ||
| - `source/Octopus.Tentacle.Tests.Integration/Support/TentacleFetchers/LinuxTentacleFetcher.cs` | ||
|
|
||
| **What happens to stdout/stderr after abandon.** The OS process is still running and may still write to its stdout/stderr pipes. The `OutputDataReceived` handler we registered is still wired. On the abandon catch, we let the Process object's handles release through normal GC — the OS process eventually blocks on its next stdout write when the pipe buffer fills (~64KB), which is acceptable per the ticket's "we are not interfering" framing. Tentacle's resources are bounded. |
There was a problem hiding this comment.
Claude is this up to date? If we trigger the abandon token the very next thing looks like it call methods that stops process writing to its pipes
There was a problem hiding this comment.
Caught. Spec was lying. Fixed in the latest commit. The outer using (var process = new Process()) disposes the Process on method unwind, which closes our pipe handles. OS process may get EPIPE on its next stdout/stderr write. Consistent with the ticket because we're closing our own handles, not killing the process. Updated paragraph explains it honestly.
|
|
||
| **Normal-path timing regression check.** Run a 100-iteration benchmark of normal short-script execution (`Write-Host "x"`); compare median wall-clock time vs. a baseline build without the changes. **Verify:** median delta within margin of error. The async swap should not measurably slow normal script execution. | ||
|
|
||
| ### 4.4 Feature flag verification |
There was a problem hiding this comment.
@LukeButters we can't really effectively feature flag this on tentacle can we? Our only option would be to create a script service v2 which we said isn't worth the overhead of supporting that version forever in the wild right?
There was a problem hiding this comment.
Agreed. Tentacle-side flag dropped in the latest commit. Capability advertisement is binary on build version. Server-side AbandonTentacleScriptOnCancellationTimeoutFeatureToggle is the only off-switch. @LukeButters if you disagree, this is the load-bearing comment to push back on.
|
|
||
| ## Section 4 — Automated test strategy | ||
|
|
||
| ### 4.1 `SilentProcessRunner` unit tests (both options) |
There was a problem hiding this comment.
We have selected an option now. So check the whole spec for optionality like this where its not needed. Also make things a bit more succinct
There was a problem hiding this comment.
Cleaned up. Async is locked, "(both options)" headers and Option 1/2 comparison language removed throughout. Section 4.4 (Tentacle-flag tests) deleted, Section 5 M5 rewritten around the server-side flag, ~30 lines trimmed in the latest commit.
PR comments addressed: - Tentacle-side feature flag dropped entirely. Can't be cleanly toggled at runtime without versioning the contract. Server-side toggle is the only off-switch. - stdout/stderr-after-abandon paragraph rewritten honestly: the outer using (var process = new Process()) disposes Process on method unwind, closing our pipe handles. The OS process may get EPIPE on next write; consistent with the ticket's "we're not killing it" framing because we're closing our handles, not the process. - "(both options)" header noise removed from Section 4.1/4.2 and Section 3. Section 4.4 (Tentacle-side flag tests) deleted. - Coordination section reflects the single server-side flag, plus the V1+V2 execution-pipeline scope and polling-loop trigger updates the server-side session posted on Linear (no Tentacle-side impact). - M5 manual scenario rewritten around the server-side flag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| - **Manual `Thread` instead of `Task`.** Same leak problem, just trades threadpool for kernel thread handles + stack memory. | ||
| - **`Thread.Abort` / `Thread.Interrupt` / `TerminateThread` P/Invoke.** No safe managed mechanism to release a thread parked in unmanaged code. `TerminateThread` doesn't unwind stack or release locks; can corrupt Tentacle's own state. | ||
| - **Out-of-process script worker.** Cleanly isolates the stuck-process problem from Tentacle, but is a massive refactor far outside EFT-3295's scope. Worth a separate proposal someday. | ||
| - **Sync cancellable wait via `ManualResetEventSlim.Wait()`** (the earlier "Option 2" we held open for external input). Replaces only the blocking primitive inside `SilentProcessRunner`, leaves everything else synchronous. Smaller diff, but preserves a parked thread per running script in the normal case (same cost as today) and doesn't move the codebase toward async. Rejected in favour of the async approach below — direction matters, not just diff size. |
There was a problem hiding this comment.
And tentacle is very well tested so we believe we can make this change safely
There was a problem hiding this comment.
Good point. Worked it into the rejected bullet: "Rejected in favour of the async approach below. Tentacle's existing test coverage gives us confidence the wider async migration is safe to ship, so the smaller-diff defensiveness isn't compelling."
|
|
||
| **Why a new verb (not a "force" flag on Cancel).** Different semantics: Cancel = "try to stop the OS process gracefully". Abandon = "give up tracking; release the mutex; the OS process may still be running". Two verbs map cleanly to ProcessExecution's two-step escalation (cancel first, abandon if cancel doesn't propagate). | ||
|
|
||
| ## Section 2 — Mutex release mechanics (locked: async) |
There was a problem hiding this comment.
(locked: async) still refers to earlier optionality. You can remove all of this and just leave the considered section as the only mention of optionality
There was a problem hiding this comment.
Done. Dropped (locked: async) from Section 2 header and (locked) from Section 1 header for consistency. Status line at the top simplified too. The rejected-alternatives section is now the only place that mentions past optionality.
CompleteScript reads stateStore exit code. If AbandonedExitCode, the delete is wrapped in try/catch and leaks the directory with a Warn log. Any other exit code follows existing behaviour (exceptions propagate), so bugs that leak handles on normal completion paths can't hide under a blanket try/catch. No janitor; OS-level host state is the customer's responsibility per the ticket. Closes the last open question. Spec status: ready for implementation planning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
catch, note test coverage as confidence basis - Section headers no longer flag past optionality. Rejected alternatives section is where the option history lives. - Abandon catch now calls SafelyCancelRead before returning, mirroring the non-abandon cleanup path so async readers don't keep firing during method unwind. - Sync-MRES rejected bullet explicitly cites Tentacle's existing test coverage as why the smaller-diff defensiveness isn't compelling. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| - `TentacleServiceDecoratorBuilder.RecordMethodUsages(...)` decorates the script service so the test can assert how many times each method was called. Use this to verify capability negotiation and call counts for the new `AbandonScript` verb. | ||
| - `Wait.For(condition, timeout, onFail, ct)` is the event-driven polling helper. Always preferred over `Task.Delay` in test bodies. | ||
|
|
||
| **Pattern to follow:** mirror `ClientScriptExecutionIsolationMutex.cs`. Stuck-script tests should use `ScriptBuilder.WaitForFileToExist(...)` as the "kernel-blocked" simulant rather than `sleep 600`. The file-wait is event-driven and the test can release it on demand by creating the file. For the unkillable variant, combine the file-wait pattern with the `Tentacle.Debug.DisableProcessKill` flag described in the manual test setup so `Hitman` becomes a no-op for the test's duration. |
There was a problem hiding this comment.
@LukeButters I can't think of a way to simulate being blocked at the kernel level than actually creating an alternate path in hitman that doesn't do the killing... lmk if you can think of anything better. I dont love this
Sixteen-task plan covering contract additions, async migration of SilentProcessRunner.ExecuteCommand (~20 callers), RunningScript and ScriptServiceV2 abandon wiring, targeted CompleteScript best-effort policy, capability advertisement, and end-to-end integration tests that use the existing ClientScriptExecutionIsolationMutex pattern with TentacleDebugDisableProcessKill for the stuck-process simulant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Question note Jim left in the code while reading; subagent implementing Task 11/12 should either answer this or remove it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…isable flag - Rename ExecuteCommand → ExecuteCommandAsync (both overloads), return Task<int> - Add abandon CancellationToken parameter; await process.WaitForExitAsync(abandon) - On OperationCanceledException from abandon (process still running): log honest message, SafelyCancelRead x2, set running=false, return AbandonedExitCode (-48) - Add net48 polyfill WaitForExitAsyncNetFramework via TaskCompletionSource+Exited event - Wire TentacleDebugDisableProcessKill env var into Hitman.TryKillProcessAndChildrenRecursively Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ons async Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add `abandonToken` field and constructor parameter to `RunningScript` (primary constructor); secondary constructor defaults to `CancellationToken.None` for backwards-compatible callers (ScriptService V1, existing fixture SetUp). - Replace synchronous `RunScript` with async `RunScriptAsync` that awaits `SilentProcessRunner.ExecuteCommandAsync` and forwards `abandonToken`. - Update `Execute()` and `RunPowershellScriptWithMonitoring` to await the new method. - Update `ScriptServiceV2.LaunchShell` to pass `CancellationToken.None` as placeholder until Task 11 wires the real abandon token. - Add `Execute_WhenAbandonTokenFires_ReturnsAbandonedExitCode` to `RunningScriptFixture` using a pidfile pattern for deterministic process-start detection before firing the abandon token. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…test now passes Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add abandonTokenSource to RunningScriptWrapper; expose AbandonToken getter and Abandon() method; dispose both CTS in Dispose(). - Replace NotImplementedException stub with real AbandonScriptAsync: fires runningScript.Abandon() then delegates to GetResponse, which returns UnknownScriptExitCode for unknown tickets and the real exit code for already-completed scripts. - Update LaunchShell signature to accept abandonToken and thread it through to new RunningScript(...) instead of CancellationToken.None. - Remove Jim's exploratory comment; replace with a brief, accurate explanation of why the state-check guard exists. - Add three service-layer tests covering unknown ticket, running script abandon (fires token → AbandonedExitCode), and already-completed script (real exit code preserved). - Fix pre-existing compile errors in KubernetesDirectoryInformationProviderFixture and LinuxTestUserPrincipal left by the Task 8 async migration (ExecuteCommand → ExecuteCommandAsync). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CompleteScriptAsync now reads the state store to detect abandon before deleting the workspace. On the abandoned path it swallows IOException with a Warn log; on normal-completion it propagates unchanged so bugs that leak file handles still surface. Two new fixture tests cover both branches. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Plumbing gap surfaced during Task 14 — the Halibut client async contract needs the new method to mirror IScriptServiceV2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds ClientScriptExecutionAbandon which starts a FullIsolation script, calls AbandonScriptAsync via raw IAsyncClientScriptServiceV2 client (TentacleDebugDisableProcessKill=1 so the process survives), then asserts a second FullIsolation script on the same mutex name can start and complete — proving the mutex is released on abandon. Supporting changes: - ClientAndTentacle.CreateScriptServiceV2Client() helper for raw V2 RPC access - FuncDecoratingScriptServiceV2 and NoOPClientScriptService now implement AbandonScriptAsync (required by IAsyncClientScriptServiceV2 interface) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds AbandonScript_MultiLevelDeepHang_StillReleasesMutex which chains outer shell → child shell → grandchild polling a release file. With TentacleDebugDisableProcessKill=1 the stuck process is not killed, but abandon still returns AbandonedExitCode and releases the isolation mutex so a second FullIsolation script with the same mutex name can start. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PowerShellPrerequisite (Octopus.Manager.Tentacle WPF installer) called the deleted sync SilentProcessRunnerExtended.ExecuteCommand. Migrated to ExecuteCommandAsync via GetAwaiter().GetResult() at the sync boundary; the installer runs on a ThreadPool worker with no synchronisation context so the bridge is deadlock-safe. SetupHelpers.GetTentacleImageAndTag was assigning a Task<string?> to a string after Task 8's async migration of DockerImageLoader. Made the helper async and awaited at both call sites (KubernetesClusterOneTimeSetUp, KubernetesClientCompatibilityTests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two regressions in our async migration that CI surfaced: 1. process.Close() in DoOurBestToCleanUp races with the Exited event firing. With the new WaitForExitAsync path that depends on Exited to complete the TaskCompletionSource, closing the handle immediately after Kill can win the race and the await hangs forever — exactly the symptom of ScriptServiceFixture.ShouldCancelPing timing out at 10 minutes. Removed the Close() call. The Process gets disposed by the outer `using (var process = new Process())` when ExecuteCommandAsync returns; that's the safe time. Luke's original Close() was needed to unblock the OLD sync WaitForExit() from pipe-drain hangs when a grandchild held redirected pipes. With WaitForExitAsync we don't depend on pipe drain anymore, and SafelyWaitForAllOutput's existing 5-second timeout bounds the grandchild-pipe case. 2. Two CapabilitiesServiceV2Test assertions check the exact count of advertised capabilities. Now that AbandonScriptV2 is advertised on Latest tentacles, the expected count is +1 for the version == null (Latest) variants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ning
The spec promises that AbandonScriptAsync returns ScriptStatusResponseV2(Complete, AbandonedExitCode) immediately. Before this fix
it fired the abandon token and returned the runningScript's current
state synchronously — which is still Running because Execute() hasn't
yet processed the abandon. Callers asserting on the response saw
stale (Running, 0) values.
Added a short bounded wait (up to 5s, polling every 25ms) for State
to transition to Complete. In practice this completes in
single-digit ms because the abandon path is just propagating an
exception through async unwind.
The single-level abandon test was previously passing only because it
never asserted on the response. Added the same Complete/AbandonedExitCode
assertion there so we don't regress this contract silently again.
The multi-level test had a separate cosmetic issue: the inner bash
command didn't quote the path, so on macOS where temp paths contain
spaces ("Application Support") the [ -f ] test produced a "too many
arguments" error and the loop exited with code 0. Added double-quote
escaping around the path inside the nested bash command.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code reviewNo issues found. Checked for bugs and CLAUDE.md compliance. I'm Claude, leaving this review on Jim's behalf. 🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
| try | ||
| { | ||
| SilentProcessRunnerExtended.ExecuteCommand( | ||
| // Sync boundary: prerequisite check runs from the WPF installer |
There was a problem hiding this comment.
What does this comment mean its illegible
jimmyp
left a comment
There was a problem hiding this comment.
Need to address these comments
| try | ||
| { | ||
| SilentProcessRunnerExtended.ExecuteCommand( | ||
| // Sync boundary: prerequisite check runs from the WPF installer |
| var stdOut = new List<string>(); | ||
| var stdErr = new List<string>(); | ||
| var exitCode = silentProcessRunner.ExecuteCommand("du", $"-s -B 1 {directoryPath}", "/", stdOut.Add, stdErr.Add); | ||
| // Sync boundary: called from IMemoryCache.GetOrCreate factory which is synchronous. |
There was a problem hiding this comment.
Brainstorm with me what we should do in each instance of these
|
|
||
| //non-kubernetes agent tentacles only support the standard script services | ||
| return new CapabilitiesResponseV2(new List<string> { nameof(IScriptService), nameof(IFileTransferService), nameof(IScriptServiceV2) }); | ||
| return new CapabilitiesResponseV2(new List<string> { nameof(IScriptService), nameof(IFileTransferService), nameof(IScriptServiceV2), "AbandonScriptV2" }); |
There was a problem hiding this comment.
Use the nameof operator on the method we added
| return bashScript.ToString(); | ||
| } | ||
|
|
||
| public ScriptBuilder AppendRaw(string bash, string windows) |
There was a problem hiding this comment.
Why did we have to create this why couldnt we use ther existing primitives
| exitCode = workspace.ShouldMonitorPowerShellStartup() | ||
| ? await RunPowershellScriptWithMonitoring(shellPath, writer, runningScriptToken) | ||
| : RunScript(shellPath, writer, runningScriptToken); | ||
| : await RunScriptAsync(shellPath, writer, runningScriptToken, abandonToken); |
There was a problem hiding this comment.
Where we catch the canceled exception below, do we want to differentiate abandoned scripts and log appropriately?
| return new LegacyTentacleClientBuilder(halibutRuntime, ServiceEndPoint); | ||
| } | ||
|
|
||
| public IAsyncClientScriptServiceV2 CreateScriptServiceV2Client() |
| var exitCode = await task; | ||
| sw.Stop(); | ||
|
|
||
| sw.Elapsed.Should().BeLessThan(TimeSpan.FromSeconds(2), "abandon should return promptly"); |
There was a problem hiding this comment.
Why do we want to assert on this?
| } | ||
|
|
||
| // Latest tentacles also advertise AbandonScriptV2 (added in EFT-3295). Older versioned builds do not. | ||
| if (version == null) |
|
|
||
| // Load-bearing: second FullIsolation script with the same mutex name must now be able to start, | ||
| // proving the mutex was released by the abandon | ||
| var secondStartFile = Path.Combine(clientTentacle.TemporaryDirectory.DirectoryPath, "second-start"); |
| var startFile = Path.Combine(clientTentacle.TemporaryDirectory.DirectoryPath, "ml-start"); | ||
| var releaseFile = Path.Combine(clientTentacle.TemporaryDirectory.DirectoryPath, "ml-release"); | ||
|
|
||
| // Multi-level chain: outer shell (bootstrap) → child shell → grandchild polls for the release file. |
There was a problem hiding this comment.
Why do we need to test this directly, does it matter if its multiple levels deep where the stuck thing occurs
GetDriveBytesUsingDu was blocking on ExecuteCommandAsync via .GetAwaiter().GetResult() because GetPathUsedBytes was called from a sync IMemoryCache.GetOrCreate factory. The chain is now fully async: GetDriveBytesUsingDuAsync → GetPathUsedBytesAsync (GetOrCreateAsync) → GetStorageInformationAsync → awaited by CreateScriptContainer EnsureDiskHasEnoughFreeSpace still blocks (it overrides a sync IOctopusFileSystem method called from config writes and script workspace readiness checks), but the sync boundary is now at the override where the constraint is obvious, not buried inside the du runner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AbstractCommand gets a StartAsync() virtual hook (defaults to calling
Start() so all existing sync commands need no changes). ServiceCommand
overrides StartAsync() instead of Start(), allowing the whole chain
below it to be properly async:
ServiceCommand.StartAsync()
-> IServiceConfigurator.Configure*Async()
-> LinuxServiceConfigurator / WindowsServiceConfigurator (async)
-> SystemCtlHelper.RunServiceCommandAsync()
-> await ExecuteCommandAsync()
Six .GetAwaiter().GetResult() calls removed (2 in SystemCtlHelper,
3 in LinuxServiceConfigurator, 1 in WindowsServiceConfigurator).
Replaced by one sync boundary in AbstractCommand's dispatcher where
the CLI host (OctopusProgram) requires a sync Action<ICommandRuntime>.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds StartAsync(string[], ICommandRuntime, OptionSet) to ICommand and ICommandHost.RunAsync(Func<ICommandRuntime, Task>, Action) to all four host implementations. OctopusProgram.Run() delegates to RunAsync(); Program.Main becomes async Task<int>. The single remaining sync boundary is AbstractCommand.Start(string[]...) which calls StartAsync(...).GetAwaiter().GetResult() — kept for backwards compatibility with any callers that hold an ICommand and call Start() synchronously, but OctopusProgram no longer goes through it. HelpCommand migrated to override StartAsync(string[]...) instead of Start(string[]...) for consistency. WindowsServiceHost.RunAsync uses Task.Run for ServiceBase.Run because that Windows API must block a thread by design. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ICommand.void Start(string[],...) and ICommandHost.void Run(Action,...) are now deleted — no production or test code calls them. All callers have been migrated to StartAsync / RunAsync. NoninteractiveHost.ManualResetEvent removed (only served the sync Run path). Three test fixtures (ServerCommsCommandTest, CheckServicesCommand Fixture, WatchdogCommandFixture) migrated to async Task tests using StartAsync and Assert.ThrowsAsync. Two Kubernetes integration test calls also updated to await StartAsync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ICommandLineRunner.Execute → ExecuteAsync (Task<bool>). CommandLineRunner
now awaits ExecuteCommandAsync directly — no .GetAwaiter().GetResult().
Callers updated:
- ReviewAndRunScriptTabViewModel: already async Task<bool>; Rollback
becomes async Task RollbackAsync
- RunProcessDialog: ThreadPool.QueueUserWorkItem → Task.Run(async)
- SetupTentacleWizardModelBuilder test: NSubstitute mock updated
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five commands (PollCommand, RegisterMachineCommandBase, ShowConfiguration, DeregisterMachine, DeregisterWorker) used a pre-existing pattern where protected Start() wrapped a private async Task StartAsync(). Now that AbstractCommand.StartAsync(string[],...) calls the protected virtual StartAsync() directly, they override that instead — no GetAwaiter needed. Also remove OctopusProgram.Run() (dead code — Main now calls RunAsync()). Remaining GetAwaiter().GetResult() sites are all pre-existing forced sync boundaries: IWritableKeyValueStore, IMachineKeyEncryptor (Kubernetes config), IOctopusFileSystem override, and WindowsServiceHost (ServiceBase.Run must block). All have been documented in prior commits or were pre-existing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Background
EFT-3295. Tentacle script can hang inside endpoint protection (Philips' CrowdStrike + Rapid7 contention). Cancel doesn't propagate, the script-isolation mutex stays held, subsequent deployments queue forever. Customer's only recovery today is RDP-in-and-kill or reboot.
Results
Adds a design doc at
docs/superpowers/specs/2026-05-21-tentacle-script-abandon-design.mdproposing a newAbandonScriptverb onIScriptServiceV2. Tentacle releases the mutex, logs honestly that the script may still be running, frees the .NET thread, does NOT kill the OS process.Implementation approach: async. Replace
SilentProcessRunner.ExecuteCommandoutright withExecuteCommandAsync(no additive overload). All ~20 callers across production code, Kubernetes integration tests, and Tentacle integration test scaffolding migrate to await.Server-side contract is locked with the parallel session via the EFT-3295 Linear thread. Capability check up front. Single feature flag, server-side:
AbandonTentacleScriptOnCancellationTimeoutFeatureToggle. Tentacle's capability advertisement is binary on build version.Before
Status quo. Tentacle hangs indefinitely on
process.WaitForExit()when Cancel can't kill the process. Mutex stays held. Deployments queue forever until reboot.After
Server escalates to
AbandonScriptafter 2 min. Tentacle'sawait process.WaitForExitAsync(abandon)returns. The existingusingblock disposes naturally, mutex released. Tentacle writes an honest line into the script log and accepts new work. OS process is left alone per the ticket.How to review this PR
Doc-only PR. One open question remaining:
Earlier review questions resolved in the latest commit:
Tentacle-side scope only. Server-side has its own design doc in OctopusDeploy.
Quality ✔️
Design review only.
🤖 Generated with Claude Code
[JIM_BOT.EXE v2.13]