diff --git a/docs/superpowers/specs/2026-05-21-tentacle-script-abandon-design.md b/docs/superpowers/specs/2026-05-21-tentacle-script-abandon-design.md new file mode 100644 index 000000000..694e63254 --- /dev/null +++ b/docs/superpowers/specs/2026-05-21-tentacle-script-abandon-design.md @@ -0,0 +1,400 @@ +# Tentacle script abandon — design + +**Status:** Draft, ready for implementation planning. Contract aligned with the parallel server-side session. +**Ticket:** [EFT-3295](https://linear.app/octopus/issue/EFT-3295/tentacle-script-abandonment-to-release-the-mutex) +**ADR:** [ADR-042 — Defer server-task Abandoned state](https://github.com/OctopusDeploy/adr/pull/226) +**Parallel work:** Server-side (ProcessExecution layer) is being designed in a separate session and will consume the contract proposed here. + +--- + +## Problem + +When a Tentacle script is hung in a way that resists `Process.Kill` (Philips' case: PowerShell stuck inside CrowdStrike + Rapid7 fighting over the same process; kernel-level uninterruptible wait), today's flow ends with: + +- `ScriptIsolationMutex` stays held → subsequent deployments to that Tentacle queue forever. +- The .NET threadpool thread inside `RunningScript.Execute()` stays parked on `process.WaitForExit()` (synchronous). +- The customer's only recovery is RDP-in-and-kill or reboot. Not acceptable for Philips. + +Server-side will detect that cancellation hasn't propagated within its own timeout and will tell Tentacle to **abandon** the script. Tentacle releases the mutex, logs honestly, accepts new work. The runaway OS process is **not** killed — explicitly out of scope per the ticket. + +## Scope + +In scope: +- `IScriptServiceV2` only (Listening + Polling Tentacles). +- New Halibut RPC verb `AbandonScript`, new exit code `AbandonedExitCode = -48`. +- Gated by server-side feature flag (`AbandonTentacleScriptOnCancellationTimeoutFeatureToggle`) for the first release. No Tentacle-side flag — capability advertisement is binary on build version. + +Out of scope: +- SSH targets (different lock model; ticket explicitly defers). +- Kubernetes agent (`IKubernetesScriptServiceV1`): different mechanism, separate stuck-pod work already in flight (`KubernetesPendingPodWatchDog`). Server's capability negotiation handles "don't try abandon on Kubernetes targets" cleanly. +- Old `IScriptService` (V1): no signal that any active Tentacle still negotiates V1. +- Killing the runaway OS process. +- Server-task Abandoned UI state — deferred by ADR-042; task continues to surface as Cancelled. + +## Section 1 — Contract surface + +Add a method to existing `IScriptServiceV2`. Do NOT introduce V3 — the convention here is method-addition + capability negotiation. + +```csharp +// source/Octopus.Tentacle.Contracts/ScriptServiceV2/IScriptServiceV2.cs +public interface IScriptServiceV2 +{ + ScriptStatusResponseV2 StartScript(StartScriptCommandV2 command); + ScriptStatusResponseV2 GetStatus(ScriptStatusRequestV2 request); + ScriptStatusResponseV2 CancelScript(CancelScriptCommandV2 command); + ScriptStatusResponseV2 AbandonScript(AbandonScriptCommandV2 command); // NEW + void CompleteScript(CompleteScriptCommandV2 command); +} + +// NEW: source/Octopus.Tentacle.Contracts/ScriptServiceV2/AbandonScriptCommandV2.cs +public class AbandonScriptCommandV2 +{ + public AbandonScriptCommandV2(ScriptTicket ticket, long lastLogSequence) { /* … */ } + public ScriptTicket Ticket { get; } + public long LastLogSequence { get; } +} +``` + +**Capability advertisement.** Tentacle's `CapabilitiesServiceV2` advertises `AbandonScriptV2` once the build supports it. Binary on build version, no Tentacle-side toggle. Server's existing `BackwardsCompatibleAsyncCapabilitiesV2Decorator` handles "Tentacle doesn't advertise it → don't call it" for older Tentacles. Server-side `AbandonTentacleScriptOnCancellationTimeoutFeatureToggle` is the only feature-flag off-switch. + +**Why a new verb (not a "force" flag on Cancel).** Different semantics: Cancel = "try to stop the OS process gracefully". Abandon = "give up tracking; release the mutex; the OS process may still be running". Two verbs map cleanly to ProcessExecution's two-step escalation (cancel first, abandon if cancel doesn't propagate). + +## Section 2 — Mutex release mechanics + +**The core constraint.** `RunningScript.Execute()` acquires `ScriptIsolationMutex` inside a `using` block that wraps a synchronous call to `SilentProcessRunner.ExecuteCommand`. `ExecuteCommand` blocks on `process.WaitForExit()` (line 143). When `WaitForExit` never returns: +1. The mutex is welded shut (the `using`'s Dispose never runs). +2. The threadpool thread inside `Task.Run(() => Execute())` is parked forever. + +Both problems need to be solved. The mutex problem is the ticket's primary deliverable; the parked-thread problem is required so Tentacle doesn't accumulate thread leaks each time the abandon path fires. + +**Rejected alternatives** (documented for the reviewer's benefit): + +- **Orphan the Task + release mutex via external Dispose.** Releases mutex but leaks a threadpool worker per abandon. Tentacle eventually starves the threadpool. +- **Manual `Thread` instead of `Task`.** Same leak problem, just trades threadpool for kernel thread handles + stack memory. +- **`Thread.Abort` / `Thread.Interrupt` / `TerminateThread` P/Invoke.** No safe managed mechanism to release a thread parked in unmanaged code. `TerminateThread` doesn't unwind stack or release locks; can corrupt Tentacle's own state. +- **Out-of-process script worker.** Cleanly isolates the stuck-process problem from Tentacle, but is a massive refactor far outside EFT-3295's scope. Worth a separate proposal someday. +- **Sync cancellable wait via `ManualResetEventSlim.Wait()`.** Replaces only the blocking primitive inside `SilentProcessRunner`, leaves everything else synchronous. Smaller diff, but preserves a parked thread per running script in the normal case (same cost as today) and doesn't move the codebase toward async. Rejected in favour of the async approach below. Tentacle's existing test coverage gives us confidence the wider async migration is safe to ship, so the smaller-diff defensiveness isn't compelling. + +### The chosen approach: async cancellable wait + +Replace the sync `process.WaitForExit()` with `await process.WaitForExitAsync(abandon)`. **Replace `ExecuteCommand` outright; do NOT ship an additive overload.** Every caller migrates to await. + +**Verified behaviour** (.NET source, `Process.cs:1523-1594`): `WaitForExitAsync` uses a `TaskCompletionSource` driven by either the process's `Exited` event or `cancellationToken.UnsafeRegister(... TrySetCanceled ...)`. When the token fires, the awaiter completes with `OperationCanceledException` independently of whether the OS process has exited. The `WaitUntilOutputEOF` follow-up is bypassed on cancellation. **No thread is parked during the wait.** + +**Two tokens, one passed to the wait.** `cancel` keeps its existing job (`cancel.Register` fires `DoOurBestToCleanUp` → `Hitman.Kill`). `abandon` is the new signal whose only job is "stop waiting, do not touch the process". Only `abandon` is passed into `WaitForExitAsync`; do NOT link `cancel` in. When `cancel` fires and the kill works, the process exits and the wait returns naturally via the `Exited` event. When `cancel` fires and the kill DOESN'T work (Philips), the wait keeps going until `abandon` fires from the server's 2-minute escalation. Linking `cancel` into the wait token would race the kill against the wait-cancellation and lose the natural-exit code on the happy path. + +```csharp +using (cancel.Register(() => DoOurBestToCleanUp(process, error))) +{ + process.BeginOutputReadLine(); + process.BeginErrorReadLine(); + + try + { + await process.WaitForExitAsync(abandon).ConfigureAwait(false); + } + catch (OperationCanceledException) when (abandon.IsCancellationRequested && !process.HasExited) + { + info("Tentacle has abandoned this script. The underlying script process may still be running on this host."); + SafelyCancelRead(process.CancelErrorRead, debug); + SafelyCancelRead(process.CancelOutputRead, debug); + return ScriptExitCodes.AbandonedExitCode; + } + + // process exited (naturally or via cancel-triggered kill) — existing cleanup path + SafelyCancelRead(process.CancelErrorRead, debug); + SafelyCancelRead(process.CancelOutputRead, debug); + return SafelyGetExitCode(process); +} +``` + +**Diff shape — `ExecuteCommand` becomes `ExecuteCommandAsync`, all callers migrate.** Search across the repo found ~20 call sites. Every one updates. + +Production code: +- `source/Octopus.Tentacle.Core/Util/CommandLine/SilentProcessRunner.cs` — the method itself. Rename, return `Task`, swap `WaitForExit()` for `await WaitForExitAsync(abandon)`. Two-token signature. +- `source/Octopus.Tentacle/Util/ISilentProcessRunner.cs` — interface and the in-process wrapper become async. +- `source/Octopus.Tentacle/Util/CommandLineRunner.cs` — caller migration. +- `source/Octopus.Tentacle.Core/Services/Scripts/RunningScript.cs` — `RunScript` → `RunScriptAsync`; ctor takes `abandonToken` alongside `runningScriptToken`; `Execute()` awaits the new path. +- `source/Octopus.Tentacle.Core/Services/Scripts/ScriptServiceV2.cs` — `LaunchShell` passes `abandonToken` from the wrapper. `RunningScriptWrapper` gains `abandonTokenSource`. New `AbandonScriptAsync` method. +- `source/Octopus.Tentacle.Contracts/ScriptServiceV2/` — new `AbandonScriptCommandV2.cs`, interface method on `IScriptServiceV2.cs` (per Section 1). +- `source/Octopus.Tentacle.Contracts/ScriptExitCodes.cs` — add `AbandonedExitCode = -48`. +- Capabilities advertisement (`AbandonScriptV2`). + +Kubernetes integration test scaffolding (all caller-migration, no logic change): +- `source/Octopus.Tentacle.Kubernetes.Tests.Integration/Tooling/KubeCtlTool.cs` +- `source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/DockerImageLoader.cs` (2 call sites) +- `source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/KubernetesAgentInstaller.cs` (3 call sites) +- `source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/KubernetesClusterInstaller.cs` (4 call sites) +- `source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/Tooling/HelmDownloader.cs` +- `source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/Tooling/ToolDownloader.cs` + +Tentacle integration test scaffolding (caller migration): +- `source/Octopus.Tentacle.Tests.Integration/PowerShellStartupDetectionTests.cs` (3 call sites) +- `source/Octopus.Tentacle.Tests.Integration/Util/SilentProcessRunnerFixture.cs` +- `source/Octopus.Tentacle.Tests.Integration/Support/TentacleFetchers/LinuxTentacleFetcher.cs` + +**What happens to stdout/stderr after abandon.** Returning `AbandonedExitCode` unwinds the method. The outer `using (var process = new Process())` disposes the Process, which closes our end of the redirected pipes. The OS process may get EPIPE on its next stdout/stderr write. This is consistent with the ticket: we're closing our own handles, not killing the runaway process. The script's runtime keeps doing whatever it's doing; many scripts ignore broken-pipe errors, and scripts that fail on them already had nowhere to log anyway. The alternative — leaving the Process and its pipes pinned in memory indefinitely — is the resource-accumulation problem we already rejected. + +**Async correctness watch-outs for the implementation plan:** +- Every new async method gets `.ConfigureAwait(false)`. +- No `.Result` / `.Wait()` calls on the new path; if a caller can't easily be made async, surface it for separate handling rather than block-on-async. +- Verify no deadlock under the Tentacle's synchronisation context (none, but worth confirming). + +## Section 3 — State, exit code, log wording + +- **Exit code:** `ScriptExitCodes.AbandonedExitCode = -48`. Distinct from `CanceledExitCode (-43)`. Server-side telemetry can tell abandoned from cancelled even though task UI surfaces both as "Cancelled" per ADR-042. +- **State on GetStatus after abandon:** `(ProcessState.Complete, AbandonedExitCode, latestLogs)`. Same shape as Cancel returns today. +- **Honest log line:** `"Tentacle has abandoned this script. The underlying script process may still be running on this host."` Written once, into the workspace script log, near the end of the abandon path. +- **Workspace cleanup on subsequent `CompleteScript`:** targeted best-effort. `CompleteScript` reads the stateStore and checks the persisted exit code. If `AbandonedExitCode`, wrap `workspace.Delete` in try/catch, log a `Warn` to systemLog naming the leaked directory, return success. For any other exit code, `workspace.Delete` is called as today and exceptions propagate. This way the relaxed-deletion policy applies only to the rare abandon case; bugs that leak handles on normal-completion paths can't hide under a blanket try/catch. No janitor — the ticket already says OS-level state on the host is the customer's problem. +- **Idempotency — actual-status return (NOT silent no-op):** + - Abandon called twice on the same already-abandoned ticket → returns the cached `(Complete, AbandonedExitCode, logs)` response. + - Abandon called on a ticket that completed naturally before the abandon arrived (race case the server-side session flagged) → returns `(Complete, realExitCode, logs)` with the **real exit code**, distinct from `AbandonedExitCode`. The server uses this distinction to log *"Script had already completed before abandon was needed"* instead of *"Tentacle abandoned the script"*. Silent no-op would hide this signal. + - Abandon called on an unknown ticket (never started, or already cleaned up via `CompleteScript`) → returns `(Complete, UnknownScriptExitCode, [])`, matching Cancel's behaviour for the same case. +- **Race with natural completion:** the wrapper's existing `StartScriptMutex` (or a new dedicated lock) serialises abandon entry. If state is already Complete, abandon returns the cached status per the rules above. + +## Section 4 — Automated test strategy + +### 4.1 `SilentProcessRunner` unit tests + +Style: matches existing `SilentProcessRunnerFixture.cs`. Use short-lived helper scripts/exes as process subjects. + +| Test | Trigger | Verify | +|---|---|---| +| Normal exit | Run a process that exits 0 | Returns 0; no abandon log line captured by the `info` callback spy. | +| Cancel kills process | Long-running process; fire cancel token | Within 1s: process is killed (`process.HasExited == true`), return value is the kill-induced exit code (Linux: 137; Windows: process-defined). No abandon log line. | +| Abandon while running | Long-running process; fire abandon token | Within ~100ms: returns `AbandonedExitCode`, `info` callback received exactly one call containing "Tentacle has abandoned this script". Then assert `process.HasExited == false` and clean up by killing externally. | +| Abandon AFTER natural exit (race) | Process that exits in ~50ms; fire abandon token at the moment exit fires | Return value is the process's real exit code, not `AbandonedExitCode`. No abandon log line. Verifies the `if (abandon.IsCancellationRequested && !process.HasExited)` guard. | +| Both tokens fire | Long-running process; fire cancel; while cancel.Register is mocked to no-op, fire abandon | `info` callback gets abandon log line; return value is `AbandonedExitCode`. Verifies the unkillable-cancel + abandon escalation path that the integration tests then exercise end-to-end. | + +**Async-specific timing assertion:** `WaitForExitAsync(token)` returns within ~50ms of cancellation. **Test verification:** wrap the await in `Stopwatch.StartNew()`; assert elapsed < 100ms. Proves async wait is independent of process exit. + +**Thread-leak regression test:** start 50 stuck processes via `ExecuteCommandAsync` (all `await`ed in parallel), fire abandon on all; capture `Process.GetCurrentProcess().Threads.Count` before and 1s after; assert delta ≤ 5 (allow for threadpool jitter). The async path should produce zero parked threads at steady state. + +### 4.2 `ScriptServiceV2` service-layer tests + +Style: matches existing service-layer fixtures using in-memory script shells and stub workspace factories. + +| Test | Trigger | Verify | +|---|---|---| +| **Mutex release (load-bearing)** | Start `FullIsolation` script; abandon it; immediately start second `FullIsolation` script | Second `StartScript` returns with `State == Running` within 1s. Reading `ScriptIsolationMutex.TaskLock.Report()` between abandon and second-start shows the lock free in that window. | +| Abandon before StartScript | Call AbandonScript with a ticket never seen | Returns `(Complete, UnknownScriptExitCode)`. Matches existing Cancel behaviour for unknown ticket. | +| Abandon after CompleteScript | Start → Complete → Abandon | Returns `(Complete, UnknownScriptExitCode)` (wrapper already removed; stateStore gone). | +| Abandon then Cancel | Abandon, then Cancel same ticket | Cancel returns the cached abandoned response unchanged. Asserts via response equality. | +| **Cancel then Abandon (real flow)** | Long-running script; cancel; cancel.Register no-op'd to simulate unkillable; abandon | Final GetStatus returns `(Complete, AbandonedExitCode, logs)`. Log content includes the honest line. Subsequent same-ticket StartScript returns the cached state. | +| Abandon during StartScript launch | Concurrent: StartScript holding `StartScriptMutex`, AbandonScript called | Abandon serialises behind StartScript via the existing wrapper mutex. Final state is consistent (no half-abandoned wrapper). | +| Capability advertisement | Tentacle build with the abandon feature; query `CapabilitiesServiceV2.GetCapabilities()` | Response includes `AbandonScriptV2`. Tentacle builds without the feature do not advertise it. | + +### 4.3 Integration tests (real shells, real processes) + +Style: matches `Octopus.Tentacle.Tests.Integration/ClientScriptExecutionIsolationMutex.cs` (the closest existing analogue — real Tentacle, real script, mutex semantics under test). + +**Timing flakiness: use the existing builders, not raw shell + `Thread.Sleep`.** The integration test suite has stable patterns for this exact class of test: + +- `ScriptBuilder` (`Octopus.Tentacle.CommonTestUtils/Builders/ScriptBuilder.cs`) composes cross-platform script bodies. Use `.CreateFile(path)` to signal "script reached this line" and `.WaitForFileToExist(path)` to block the script on an event, not a sleep race. This is how `ClientScriptExecutionIsolationMutex` reliably exercises long-running scripts without `Thread.Sleep` timing assumptions. +- `TestExecuteShellScriptCommandBuilder` (`Octopus.Tentacle.Tests.Integration/Util/Builders/`) composes the script command: `.SetScriptBody(ScriptBuilder)`, `.WithIsolationLevel(...)`, `.WithIsolationMutexName(...)`, `.Build()`. +- `TentacleConfigurationTestCase.CreateBuilder()` and `ClientAndTentacleBuilder` set up real Tentacle + Halibut for the test. Same as existing tests. +- `TentacleServiceDecoratorBuilder.RecordMethodUsages(...)` decorates the script service so the test can assert how many times each method was called. Use this to verify capability negotiation and call counts for the new `AbandonScript` verb. +- `Wait.For(condition, timeout, onFail, ct)` is the event-driven polling helper. Always preferred over `Task.Delay` in test bodies. + +**Pattern to follow:** mirror `ClientScriptExecutionIsolationMutex.cs`. Stuck-script tests should use `ScriptBuilder.WaitForFileToExist(...)` as the "kernel-blocked" simulant rather than `sleep 600`. The file-wait is event-driven and the test can release it on demand by creating the file. For the unkillable variant, combine the file-wait pattern with the `Tentacle.Debug.DisableProcessKill` flag described in the manual test setup so `Hitman` becomes a no-op for the test's duration. + +| Test | Trigger | Verify | +|---|---|---| +| PowerShell + abandon (kill works) | Real PowerShell, `Start-Sleep -Seconds 600`, fire Cancel, normal kill path | Final response is `(Complete, CanceledExitCode)` via the existing path. **Negative check:** abandon log line is NOT present. Confirms we haven't regressed Cancel by accidentally hitting the abandon path. | +| PowerShell + abandon (kill mocked off) | Real PowerShell, sleep; `Hitman` mocked to no-op; fire Cancel; wait; fire AbandonScript | Within 2s of abandon: response is `(Complete, AbandonedExitCode, [...honest log line...])`; mutex is free (verified by starting a second `FullIsolation` script that Acquires within 1s); the real PowerShell process is still alive on the test host (verified via `Process.GetProcessById` outside the test). Test cleanup: kill the leftover PowerShell. | +| **Multi-level-deep hang (ticket-mandated)** | bootstrap → Calamari-shim → user script, with `Hitman` no-op flag set | All verifications from the previous row pass end-to-end through the multi-level launch chain. Confirms abandon works when the stuck process is not the immediate child of Tentacle. | +| Windows workspace cleanup with open handles | Run the abandon path; leave the simulated zombie holding the workspace log file open; call CompleteScript | CompleteScript returns without exception. Tentacle systemLog contains a `Warn` naming the leaked workspace directory. Workspace dir on disk still exists (assert via `Directory.Exists`). No exception bubbles up to the calling test (which simulates Server). | +| Polling Tentacle variant | Configure test fixture as Polling | All verifications from the kill-mocked-off row pass against a Polling Tentacle. | + +**End-to-end async thread audit.** Capture `Process.GetCurrentProcess().Threads.Count` 5s into a stuck-script scenario; assert no thread parked attributable to the script pipeline (use named threads or stack-walk via ETW if precise attribution needed). Most reliable proxy: total thread count not higher than baseline + epsilon. + +**Normal-path timing regression check.** Run a 100-iteration benchmark of normal short-script execution (`Write-Host "x"`); compare median wall-clock time vs. a baseline build without the changes. **Verify:** median delta within margin of error. The async swap should not measurably slow normal script execution. + +## Section 5 — Manual testing plan + +Manual scenarios on a real test Tentacle. All scenarios assume the parallel server-side build is deployed. + +### Setup + +- Test Octopus Server with EFT-3295 server-side build. +- Windows Tentacle (primary) + Linux Tentacle (smoke). +- Debug Tentacle build with `Tentacle.Debug.DisableProcessKill=true` making `Hitman.TryKillProcessAndChildrenRecursively` a no-op — simulant for "kill doesn't work" without engineering real kernel-level waits. +- Server-side feature flag `AbandonTentacleScriptOnCancellationTimeoutFeatureToggle` (default ON, configured on the test Octopus Server). + +### Where to find things (reference for verification steps below) + +- **Tentacle systemLog (Windows):** `C:\Octopus\Logs\OctopusTentacle.txt` (or whatever the test instance is configured with — confirm via `Tentacle show-configuration`). +- **Tentacle systemLog (Linux):** `/etc/octopus//Logs/OctopusTentacle.txt`. +- **Tentacle workspace root:** `/Work/`. Each script gets a subdirectory named after its `ScriptTicket`. Inside: `bootstrapRunner.log`, `Output.log`, `script.ps1`/`Bootstrap.sh`, the state store file. +- **Script log in UI:** Octopus Server → the task → expand the deployment step. The script log is what the customer sees and is what gets the honest abandon line. +- **Thread count (Windows):** PowerShell `(Get-Process Tentacle).Threads.Count`, or use Process Explorer's Threads tab. Capture before each scenario for a baseline. +- **Thread count (Linux):** `ps -o nlwp= -p $(pgrep -f Tentacle)` returns the LWP (thread) count for the Tentacle process. +- **Capability advertisement:** Tentacle systemLog at startup contains `Negotiated capabilities: [...]` lines and per-connection capability exchanges. Or: temporarily enable Halibut verbose tracing on the server side and inspect the `CapabilitiesResponseV2` payload from this Tentacle. +- **Mutex state in Tentacle log:** grep for `acquiring isolation mutex` / `Lock acquired` / `Releasing lock` lines with the relevant task ID. + +### M1 — Regression smoke (flag ON, normal script) + +Deploy `Write-Host "hello"; Start-Sleep 5; Write-Host "done"`. + +**Verify (all must pass):** +1. Octopus UI task status → **Success** (green tick). +2. Script log in UI shows `hello` and `done`; no abandon line. +3. Tentacle systemLog: `grep "abandon" OctopusTentacle.txt` → zero matches for this task ID. +4. Tentacle systemLog shows the normal acquire/release pair: `grep "" OctopusTentacle.txt | grep -E "Lock acquired|Releasing lock"` → both lines present in order. +5. Thread count (sampled 5s after task completes) → within ±2 of pre-test baseline. + +### M2 — Cancel still works (flag ON, killable script) + +`DisableProcessKill=false`. Deploy `Start-Sleep -Seconds 300`. Wait ~10s. Click **Cancel** in Octopus UI. + +**Verify:** +1. UI task status transitions to **Cancelled** within 30s. +2. Tentacle systemLog: `grep "Hitman\|Releasing lock" OctopusTentacle.txt | tail -20` shows the kill attempt followed by mutex release for this task ID. +3. PowerShell process is gone: `Get-Process powershell -ErrorAction SilentlyContinue` returns nothing for the powershell instance that was running the script. (Match by PID captured from Tentacle log at script start.) +4. `grep "abandon" OctopusTentacle.txt` → zero matches for this task ID. Cancel path was used, not abandon. +5. Deploy a second project to the same Tentacle → starts immediately (mutex was released by the normal Cancel path). + +### M3 — The Philips scenario (flag ON, unkillable script) + +`Tentacle.Debug.DisableProcessKill=true`. Restart Tentacle. Capture thread-count baseline. Deploy `Start-Sleep -Seconds 600`. Note the script's PowerShell PID from the Tentacle log (`grep "Starting powershell" OctopusTentacle.txt | tail -1`). Click **Cancel** after ~10s. Wait for server-side abandon timeout (1–5 min per parallel session config). + +**Verify (all must pass; this is the load-bearing scenario):** + +1. **Server side called Abandon.** Server log (`OctopusServer.txt`) shows an `AbandonScript` call for this task's ticket, timestamped after the Cancel attempt + the server's abandon timeout. If the parallel session hasn't named the call yet, grep for "abandon" in server log. +2. **Honest log line in the customer-visible task log.** Open the task in Octopus UI → expand the deployment step → confirm the line `Tentacle has abandoned this script. The underlying script process may still be running on this host.` is present in the script log section. +3. **Tentacle systemLog records the abandon path.** `grep -A2 "abandon" OctopusTentacle.txt | tail -30` shows: AbandonScript invocation received, abandon token cancelled, mutex released for this task ID, wrapper removed. +4. **Mutex released — load-bearing check.** Immediately deploy a second project (any trivial script, `Write-Host "ok"`) to the same Tentacle. **Pass:** second deployment starts within 5s. **Fail:** queues indefinitely with "Waiting for the script in task..." message. +5. **Task UI status = Cancelled** (not a new "Abandoned" state — per ADR-042). +6. **Thread count returned to baseline.** Sample 10s after the abandon. **Pass:** within ±2 of baseline. **Fail:** count grew by 1 or more and stays grown. +7. **The PowerShell process is still alive on the host.** `Get-Process -Id ` returns the process. This is the ticket's "we do not kill the runaway" — verify we didn't accidentally start killing it. Kill it manually at end of test for cleanup. +8. **Exit code in the task log = -48 (AbandonedExitCode)** (or whatever surfaces in the Server-side detail view). Distinguishes from `-43` (CanceledExitCode). + +### M4 — Repeated abandon (thread-leak check under repetition) + +Capture baseline thread count and Tentacle process working-set memory. Run M3 ten times back-to-back (script the loop so each iteration: deploy → cancel → wait for abandon → next). + +**Verify:** +1. Sample thread count after each iteration. **Pass:** count stays within ±5 of baseline across all ten runs. **Fail:** monotonic growth — indicates the chosen option's thread-release mechanism is broken. +2. Sample Tentacle working-set memory after each iteration. **Pass:** stays within ~50MB of baseline (some growth from log buffers etc. is expected). **Fail:** grows by more than ~10MB per iteration — indicates Process objects or zombie tasks are being retained. +3. After all ten runs, deploy a normal project. **Pass:** runs normally, no perf degradation. +4. Kill all leftover `powershell.exe` / `sleep` processes manually at end of test. + +Async should produce zero thread cost per abandon; any growth across runs means the implementation diverged from the design. + +### M5 — Server-side flag off (Tentacle behaves as today) + +Set the server-side `AbandonTentacleScriptOnCancellationTimeoutFeatureToggle` to OFF in the test Octopus Server. Restart Server. Leave Tentacle untouched. + +**Verify:** +1. **Server doesn't dispatch Abandon.** Repeat the M3 setup. Wait past the would-be 2-minute escalation point. Server log: `grep "AbandonScript" OctopusServer.txt` → zero matches for this task ID. +2. **Tentacle still advertises the capability.** Optional sanity check via Halibut verbose tracing: `CapabilitiesResponseV2` from this Tentacle still contains `AbandonScriptV2`. The flag lives on the Server, not on Tentacle. +3. **Tentacle stays wedged.** Subsequent deployment to this Tentacle queues with "Waiting for the script in task...". Confirms today's behaviour is preserved when Server has the feature off. +4. Recovery: restart Tentacle (the existing workaround). Verify subsequent deployments work again. + +### M6 — Workspace cleanup with open handles (Windows-specific) + +Run M3 to completion. Note the script's `ScriptTicket` from the Tentacle log. + +**Verify:** +1. **Workspace dir still exists.** `dir \Work\` returns a directory listing with log files present. The zombie process (or our retained Process object, depending on option chosen) holds open file handles preventing deletion. +2. **systemLog records the failure.** `grep -i "workspace\|delete" OctopusTentacle.txt | grep ` shows a `Warn`-level entry naming the directory that could not be deleted, with the underlying I/O exception message. +3. **No propagated exception to Server.** `CompleteScript` returns normally; Server log shows successful completion of the task. **Pass:** no error response from Tentacle, no retry storm in server log. +4. **Tentacle continues to function.** Deploy a third project (not to the wedged workspace). **Pass:** runs normally. +5. **Manual cleanup of leaked workspace works after the zombie process is killed.** Kill the PowerShell process manually; `rmdir /s /q ` should now succeed. Confirms the leak is bounded (would be reclaimed if we ever added a janitor). + +### M7 — Polling Tentacle variant + +Register a Polling Tentacle against the test server. Repeat M3 setup and execution. + +**Verify:** +1. All M3 verification points pass with no Polling-specific differences. The Halibut RPC path is the same — only connection initiation direction differs. +2. **Polling-specific check:** during the abandon, Tentacle's polling loop continues. `grep "Polling" OctopusTentacle.txt | tail -20` shows polling activity through the abandon and after. **Pass:** polling not blocked by the abandon flow. +3. After abandon, the Polling Tentacle picks up the next deployment from the server. **Pass:** new deployment dispatched and runs (mutex released). + +### M8 — Linux smoke + +On a Linux Tentacle, deploy a Bash script: `sleep 600`. Repeat M2 (kill works) and M3 (kill mocked off). + +**Verify:** +1. **M2 on Linux:** `ps -p ` shows the bash/sleep process gone after Cancel. Tentacle systemLog shows `Hitman` kill path used. Same outcomes as Windows M2. +2. **M3 on Linux:** all M3 verification points pass. Thread count via `ps -o nlwp= -p $(pgrep -f Tentacle)`. Workspace location: `/etc/octopus//Work//`. +3. **Linux file-handle behaviour differs:** unlike Windows, Linux generally allows deletion of files held open by other processes (the inode survives until the last handle closes). For M6's workspace-cleanup analogue on Linux, the workspace deletion is more likely to succeed even with the zombie process running. Note in test result. +4. Confirms the implementation isn't accidentally Windows-only and behaves sensibly on Linux's different file-handle semantics. + +### M9 — Server escalation ordering + +Server escalation is hardcoded at **2 minutes** post-Cancel for the first release (`AbandonTentacleScriptOnCancellationTimeoutFeatureToggle`'s timeout constant). Not configurable in production; ask the server-side session for a debug-build override constant if you want to run this faster in your test environment. + +**Verify the killable case (no escalation expected):** +1. Run M2 (killable script + cancel). Wait at least 3 minutes. +2. Server log: `grep "AbandonScript" OctopusServer.txt | grep ` → **zero matches.** Cancel succeeded inside the 2-minute window; server correctly did not escalate. +3. Tentacle log: zero abandon entries for this task ID. + +**Verify the unkillable case (escalation expected):** +4. Run M3 (kill mocked off + cancel). Wait through the 2-minute timeout (use a stopwatch). +5. Server log: `grep "AbandonScript" OctopusServer.txt | grep ` → **exactly one match,** timestamped approximately 2 minutes after the Cancel. +6. Tentacle log: one abandon entry for this task ID. + +**Verify the actual-status race case** (server-side session's idempotency concern): +7. Set up M3, but let the script complete naturally just before the 2-minute timer fires (use a script that runs ~110 seconds). +8. Server fires AbandonScript anyway because the completion event hasn't reached it yet. +9. Tentacle returns `(Complete, realExitCode, logs)` — NOT `AbandonedExitCode`. +10. Server task log entry: *Script had already completed before abandon was needed.* Confirms the "abandon was unnecessary" signal works end-to-end. + +**Bug indicators to flag back to the server session:** +- Server calls AbandonScript on every Cancel (even killable cases) → server's escalation predicate is wrong. +- Server retries AbandonScript multiple times for the same ticket → idempotency on the server side broken. +- Server calls AbandonScript before the 2-minute window → timer is wrong. +- Server calls AbandonScript even with the Tentacle capability missing → capability gating broken; should not have scheduled. + +### Sign-off criteria + +To turn the feature flag on by default in a future release: M1–M5 pass on Windows; M3 + M4 pass on Linux; M7 passes on Polling; M9 confirms server escalation policy; M6 confirms workspace leak is bounded and logged. + +## Risks and rollout + +- **Feature flag off by default** for the first release. Customer-by-customer opt-in. +- **Sequence:** after EFT V1 cleanup closes (target end May 2026), before Task Cap 320, targeting Philips' July self-host release. +- **Telemetry:** count of AbandonScript calls per Tentacle per day. Spike = signal that either Cancel is broken or this feature is masking a different bug. +- **Soak test pre-release:** 1000 normal scripts with the server-side flag ON, verify no resource leak vs. flag OFF baseline. + +## Open questions for external reviewer + +(None remaining. Workspace cleanup policy resolved 2026-05-21 — targeted best-effort gated on `AbandonedExitCode` in the stateStore. No janitor; OS-level state on the host is the customer's responsibility per the ticket.) + +## Coordination — locked with the server-side session (2026-05-21) + +Aligned via Linear thread on EFT-3295 (commenter Jim, both sessions). Items below are locked unless explicitly noted. + +**Contract (final shape):** + +- `ScriptStatusResponseV2 AbandonScript(AbandonScriptCommandV2 command)` on `IScriptServiceV2`. +- `AbandonScriptCommandV2 { ScriptTicket Ticket; long LastLogSequence; }` — same shape as `CancelScriptCommandV2`. Server-side dropped its initial `ServerTaskId` and "cancellation correlation id" proposal; `ScriptTicket` is sufficient. +- Capability name: `AbandonScriptV2`. + +**Idempotency (final):** Tentacle returns actual current status. Already-completed script returns `(Complete, realExitCode, logs)` — distinct from `AbandonedExitCode`, so the server's task log entry can record that the abandon was unnecessary. Unknown/already-cleaned-up ticket returns `(Complete, UnknownScriptExitCode, [])`, matching Cancel's existing shape. + +**Capability check is the primary gate.** Server uses `BackwardsCompatibleAsyncCapabilitiesV2Decorator` to query `AbandonScriptV2` once per session. Capability absent → server does not schedule the abandon dispatch at all. The RPC-fail-then-log path stays as a defensive fallback for capability-cache staleness, not the primary path. + +**One off-switch, server-side:** `AbandonTentacleScriptOnCancellationTimeoutFeatureToggle` (default ON). Governs whether server escalates to AbandonScript at all. No Tentacle-side flag — Tentacle's capability advertisement is binary on build version. (Earlier draft had a Tentacle-side flag too; dropped after PR review surfaced that it can't be cleanly toggled at runtime without versioning the service contract.) + +**Escalation timing (locked for first release):** 2 minutes. Both V1 and V2 execution pipelines escalate to AbandonScript on their next status-poll once cancellation has been pending that long. Hardcoded on the server toggle class, not configurable. Server-side updated 2026-05-21: trigger switched from a delayed NSB message to a polling-loop check; no new timers on the server side. The Tentacle-side contract is unchanged either way. + +**Execution-pipeline scope (server-side, 2026-05-21):** V1 *and* V2 server-side execution pipelines call AbandonScript via the same contract. Philips is V1 self-host so V1 is actually the urgent path. Doesn't change anything Tentacle is building. + +**Post-abandon flow:** + +1. Server calls `AbandonScript` → gets `ScriptStatusResponseV2`. +2. Server publishes `TentacleScriptAbandonedEvent`. +3. Existing post-cancel path proceeds (eventually calls `CompleteScript` downstream). + +Server-side will verify the exact GetStatus-poll-vs-read-from-response detail during their implementation plan. + +**Task log wording:** + +- Tentacle script log (this doc's Section 3): *Tentacle has abandoned this script. The underlying script process may still be running on this host.* +- Server task log (server session's surface). Server session's working proposal: + - On dispatch: *Cancellation hasn't taken effect on Tentacle after 2 minutes. Abandoning the script to release the script-isolation mutex.* + - On Tentacle returning `AbandonedExitCode`: *Tentacle abandoned the script.* + - On Tentacle returning a real exit code (abandon unnecessary): *Script had already completed before abandon was needed.* +- I pushed back on the dispatch wording — "script-isolation mutex" exposes internal terminology to the customer. Suggested rewrite: *Cancellation hasn't taken effect on Tentacle after 2 minutes. Abandoning the script so this target can accept new deployments.* Server session's call which to ship with. diff --git a/docs/superpowers/specs/2026-05-25-split-async-migration-from-abandon-feature-design.md b/docs/superpowers/specs/2026-05-25-split-async-migration-from-abandon-feature-design.md new file mode 100644 index 000000000..6db82ed45 --- /dev/null +++ b/docs/superpowers/specs/2026-05-25-split-async-migration-from-abandon-feature-design.md @@ -0,0 +1,131 @@ +# Split async migration into its own PR beneath the abandon feature + +## Context + +Current state: + +- **PR #1226 (`jimpelletier/eft-3295-tentacle-script-abandonment-to-release-the-mutex`)** at commit `583eb46c` contains both: + - The async migration of `SilentProcessRunner.ExecuteCommand` → `ExecuteCommandAsync` (and all its callers). + - The script-abandonment feature (abandon token, `AbandonScriptCommandV2`, `AbandonedExitCode`, RPC method, capability, tests). +- **PR #1235 (`jimpelletier/eft-3295-async-signature-propagation`)** stacks on top of #1226 and pushes the async signature higher into CLI host, Kubernetes paths, `IServiceConfigurator`, `ICommandLineRunner`, etc. + +The stack is currently: + +``` +main ← #1226 (abandon + async migration) ← #1235 (push higher) +``` + +## Goal + +Invert the lower half of the stack so the async migration sits beneath the abandon feature: + +``` +main ← [NEW BASE PR] async migration ← #1226 (rebased: abandon feature only) ← #1235 (push higher, unchanged in shape) +``` + +The new base PR is a clean refactoring change that is reviewable and mergeable independently of the abandon feature. #1226 becomes a focused feature PR that adds script abandonment on top of the foundation. + +## Non-goals + +- This spec does NOT cover restructuring #1235. That PR's content remains as it is and continues to stack on top of #1226. +- This spec does NOT widen the scope of the abandon feature. The abandon feature's existing content is preserved, just rebased. +- This spec does NOT attempt to surgically split the existing #1226 commits via cherry-pick or interactive rebase. The commits intermix concerns and would conflict heavily. + +## Approach + +End-state rebuild rather than commit surgery. + +Take the file states at `583eb46c` and split them into two clean sets of changes built from `main`. Each PR is constructed as a small number of logical commits that produce the same final state when stacked. + +### Base PR — "Migrate SilentProcessRunner to async" + +Branch name: `jimpelletier/eft-3295-async-migration-base`. + +Scope: the minimum change required to make `SilentProcessRunner.ExecuteCommand` async, with documented sync↔async boundaries at every immediate caller. + +Contents: + +1. **`SilentProcessRunner`** — `ExecuteCommand` → `ExecuteCommandAsync`. Internal change: `process.WaitForExit()` → `await process.WaitForExitAsync(cancel)`. The `cancel` token is passed directly to `WaitForExitAsync` so the existing cancel semantics are preserved (when cancel fires, the await throws `OperationCanceledException`; the existing `cancel.Register(() => DoOurBestToCleanUp(...))` still fires Kill+Close on a separate thread). **No other SilentProcessRunner changes.** `DoOurBestToCleanUp` remains unchanged including the `process.Close()` call. `SafelyWaitForAllOutput` remains unchanged. +2. **NET Framework polyfill** for `WaitForExitAsync` (not available on net48): a `WaitForExitAsyncNetFramework` helper using `Process.Exited` event + `TaskCompletionSource`. +3. **`ISilentProcessRunner`, `CommandLineRunner`, `CommandLineInvocation`** — interface and helper class signatures migrated to async. +4. **Immediate sync callers** — six sites updated with `.GetAwaiter().GetResult()`: + - `PowerShellPrerequisite.Check()` (WPF installer prerequisite) + - `KubernetesDirectoryInformationProvider.GetDriveBytesUsingDu()` (called from `IMemoryCache.GetOrCreate` factory) + - `SystemCtlHelper.RunServiceCommand()` (2 call sites) + - `LinuxServiceConfigurator.WriteUnitFile`, `IsSystemdInstalled`, `HaveSudoPrivileges` (3 call sites) + - `WindowsServiceConfigurator.Sc()` + - `CommandLineRunner.Execute(CommandLineInvocation, ...)` +5. **Sync-boundary comments** — every one of the six sites gets the same comment pattern: "We're in X. Y must be sync because Z. We block with `.GetAwaiter().GetResult()`. This is safe because we're on a plain thread-pool worker — when the async work finishes it can resume on any free thread, so the block resolves normally." +6. **`CapabilitiesServiceV2` nameof change** — replace the `"AbandonScriptV2"` string literal with `nameof(...)`. Small refactor that fits the cleanup theme. + +What this PR does NOT include: + +- No `abandon` parameter on `ExecuteCommandAsync`. +- No removal of `process.Close()` from `DoOurBestToCleanUp`. +- No long-form documentation comments on `DoOurBestToCleanUp`, `SafelyWaitForAllOutput`, or the `WaitForExitAsync` call site (those describe race-related semantics that only matter once the abandon flow is added). +- No grandchild test comment improvements (those describe the async-cancel race the abandon PR fixes). +- No abandon-specific contracts, RPC methods, capabilities, env vars, or tests. + +### Stacked PR — "Add Script abandonment feature" (rebased #1226) + +Branch: `jimpelletier/eft-3295-tentacle-script-abandonment-to-release-the-mutex` (force-pushed). + +Scope: the script abandonment feature, building on the async foundation. + +Contents: + +1. **Add `abandon` parameter to `ExecuteCommandAsync`** — second `CancellationToken` parameter. Switch internal await from `WaitForExitAsync(cancel)` to `WaitForExitAsync(abandon)`. Cancel continues to flow through `cancel.Register`. +2. **Remove `process.Close()` from `DoOurBestToCleanUp`** — cancel-path race fix. Now needed because the abandon flow relies on `Exited`-event delivery via `WaitForExitAsync(abandon)`'s TCS, and `Close()` tears down the wait state. +3. **Long-form documentation comments** on `DoOurBestToCleanUp`, `SafelyWaitForAllOutput`, and the `WaitForExitAsync` call site explaining the race, the grandchild-pipe scenario, and the worst-case cancel latency. +4. **Contracts** — `AbandonedExitCode = -48`, `AbandonScriptCommandV2`, `IScriptServiceV2.AbandonScript`, `IAsyncClientScriptServiceV2.AbandonScriptAsync`. +5. **`ScriptServiceV2.AbandonScriptAsync` implementation** — abandon-token wrapper, fires abandon CTS, returns response. +6. **Abandon-token plumbing through `RunningScript`** — constructor accepts abandon token, passes through to `ExecuteCommandAsync`. +7. **`TentacleDebugDisableProcessKill` env var** — test affordance for the stuck-script scenario. +8. **`AbandonScriptV2` capability** — advertised in `CapabilitiesServiceV2`. +9. **Best-effort `workspace.Delete`** gated on `AbandonedExitCode` in `CompleteScriptAsync`. +10. **Abandon-specific tests** — service-layer (`ScriptServiceV2Fixture`) and integration (`ClientScriptExecutionAbandon`, `SilentProcessRunnerFixture.AbandonToken_*`). +11. **Improved grandchild test comments** — rewritten to describe the async behavior being guarded. + +### PR #1235 unchanged + +`jimpelletier/eft-3295-async-signature-propagation` continues to stack on top of #1226 with its 7 push-higher commits. Once #1226 is force-pushed, #1235 may need a rebase to stay clean but its content is the same. + +## Mechanics + +Done in this order to keep each step reversible: + +1. **Capture safety reference**: tag the current state of both branches before mutating anything (`git tag claude-safety-2026-05-25-pre-split #1226-tip #1235-tip`). The existing `claude-safety-before-rollback` tag stays. +2. **Build the base branch from `main`**: + - Branch `jimpelletier/eft-3295-async-migration-base` from `main`. + - Apply file-level changes for the base PR scope, committing as a small number of logical commits (e.g., "Migrate SilentProcessRunner to async", "Migrate ISilentProcessRunner and CommandLineRunner", "Document sync↔async boundaries", "Use nameof for capability"). + - Push the branch and open the new PR with `main` as base. +3. **Rebuild #1226 on top of the base branch**: + - Hard-reset `jimpelletier/eft-3295-tentacle-script-abandonment-to-release-the-mutex` to the new base branch's tip. + - Apply file-level changes for the abandon feature scope (the delta from base PR's end state to `583eb46c`'s state). + - Commit as a small number of logical commits. + - Force-push #1226. GitHub will automatically update the PR diff to show only the abandon-feature delta. +4. **Rebase #1235**: change #1235's branch to be a rebase of its 7 commits on top of the updated #1226. Force-push. + +## Risks + +- **Force-push to #1226** will disrupt any in-flight reviews. PR comments referencing specific commit SHAs will become stale. Mitigated by: tagging the pre-split state for reference; explicitly noting in PR #1226 that history has been rewritten and pointing to the new base PR. +- **CI must pass on the base PR alone**. The base PR's `WaitForExitAsync(cancel)` wiring is a behavioural-equivalent of the sync version, so existing cancel tests (e.g. `ShouldCancelPing`) should pass. To be verified by running the build before opening the PR. +- **Compatibility with #1235**. The 7 push-higher commits build on file states that exist in #1226 today. After the rebase, those file states may shift slightly (e.g., the abandon PR no longer carries the same intermediate commit boundaries). Conflicts during the #1235 rebase are likely but should be mechanical to resolve. + +## Verification + +After the split: + +- `git diff main..base-branch` produces a small focused diff matching the base PR scope above. +- `git diff base-branch..#1226` produces a diff matching the abandon-feature scope above. +- `git diff #1226..#1235` produces the existing 7-commit push-higher diff. +- Build the base branch standalone — must compile and CI must pass. +- Build #1226 stacked on base — must produce the same end-state as `583eb46c` does today. +- Build #1235 stacked on #1226 — must produce the same end-state as it does today. + +## Success criteria + +- New base PR exists at `jimpelletier/eft-3295-async-migration-base` with a clean, focused diff against `main`. +- PR #1226 is rebased to target the new base branch and its diff shows only the abandon feature. +- PR #1235 still works as a stacked PR on top of #1226. +- All three PRs build and pass CI. diff --git a/source/Octopus.Manager.Tentacle.Tests/Builders/SetupTentacleWizardModelBuilder.cs b/source/Octopus.Manager.Tentacle.Tests/Builders/SetupTentacleWizardModelBuilder.cs index 0e71b9e1a..b86c59576 100644 --- a/source/Octopus.Manager.Tentacle.Tests/Builders/SetupTentacleWizardModelBuilder.cs +++ b/source/Octopus.Manager.Tentacle.Tests/Builders/SetupTentacleWizardModelBuilder.cs @@ -32,6 +32,8 @@ public SetupTentacleWizardModelBuilder() commandLineRunner = Substitute.For(); commandLineRunner.Execute(Arg.Any(), Arg.Any()).Returns(true); commandLineRunner.Execute(Arg.Any>(), Arg.Any()).Returns(true); + commandLineRunner.ExecuteAsync(Arg.Any(), Arg.Any()).Returns(true); + commandLineRunner.ExecuteAsync(Arg.Any>(), Arg.Any()).Returns(true); } public SetupTentacleWizardModelBuilder WithTelemetryService(ITelemetryService telemetryService) diff --git a/source/Octopus.Manager.Tentacle/PreReq/PowerShellPrerequisite.cs b/source/Octopus.Manager.Tentacle/PreReq/PowerShellPrerequisite.cs index 92c3768f1..4f13e91b4 100644 --- a/source/Octopus.Manager.Tentacle/PreReq/PowerShellPrerequisite.cs +++ b/source/Octopus.Manager.Tentacle/PreReq/PowerShellPrerequisite.cs @@ -1,5 +1,7 @@ using System; using System.IO; +using System.Threading; +using System.Threading.Tasks; using Octopus.Tentacle.Diagnostics; using Octopus.Tentacle.Util; @@ -9,9 +11,24 @@ public class PowerShellPrerequisite : IPrerequisite { public string StatusMessage => "Checking that Windows PowerShell 2.0 is installed..."; + // Why this is sync: IPrerequisite.Check() is part of a sync interface used by + // the WPF installer's prerequisite plumbing. Making it async would mean + // converting the whole IPrerequisite chain, which is a wider refactor than + // this PR. + // + // Why blocking on the async call is safe: PreReqWindow.Start dispatches each + // prerequisite via DispatchHelper.Background, which queues us via + // ThreadPool.QueueUserWorkItem. That's a plain thread-pool worker with no + // SynchronizationContext, so there's nothing for the awaited continuation + // to wait on. + // See https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html public PrerequisiteCheckResult Check() + => CheckAsync().GetAwaiter().GetResult(); + + async Task CheckAsync() { - return CheckPowerShellIsInstalled(out var commandLineOutput) + var (installed, commandLineOutput) = await CheckPowerShellIsInstalledAsync(); + return installed ? PrerequisiteCheckResult.Successful() : PrerequisiteCheckResult.Failed("Windows PowerShell 2.0 or above does not appear to be installed and on the System Path on this machine. Please install Windows PowerShell or add it to the System Path then re-run this setup tool.", helpLink: "http://g.octopushq.com/HowDoIInstallPowerShell", @@ -19,21 +36,21 @@ public PrerequisiteCheckResult Check() commandLineOutput: commandLineOutput); } - static bool CheckPowerShellIsInstalled(out string commandLineOutput) + static async Task<(bool installed, string commandLineOutput)> CheckPowerShellIsInstalledAsync() { var stdOut = new StringWriter(); var stdErr = new StringWriter(); const string powerShellExe = "powershell.exe"; const string arguments = "-NonInteractive -NoProfile -Command \"Write-Output $PSVersionTable.PSVersion\""; - commandLineOutput = $"{powerShellExe} {arguments}"; + var commandLineOutput = $"{powerShellExe} {arguments}"; // Despite our old check conforming to Microsoft's recommendations // for PS version checking around the 1.0/2.0 era, and extending // to detect 3.0, it failed to detect 4. Going the direct route: try { - SilentProcessRunnerExtended.ExecuteCommand( + await SilentProcessRunnerExtended.ExecuteCommandAsync( powerShellExe, arguments, ".", @@ -45,12 +62,12 @@ static bool CheckPowerShellIsInstalled(out string commandLineOutput) commandLineOutput = $"{commandLineOutput}{Environment.NewLine}{stdOut}{Environment.NewLine}{stdErr}"; - return outputText.Contains("Major"); + return (outputText.Contains("Major"), commandLineOutput); } catch (Exception ex) { commandLineOutput = $"{commandLineOutput}{Environment.NewLine}{ex}"; - return false; + return (false, commandLineOutput); } } } diff --git a/source/Octopus.Manager.Tentacle/TentacleConfiguration/SetupWizard/ReviewAndRunScriptTabViewModel.cs b/source/Octopus.Manager.Tentacle/TentacleConfiguration/SetupWizard/ReviewAndRunScriptTabViewModel.cs index 2b8eef4c9..1eb1be871 100644 --- a/source/Octopus.Manager.Tentacle/TentacleConfiguration/SetupWizard/ReviewAndRunScriptTabViewModel.cs +++ b/source/Octopus.Manager.Tentacle/TentacleConfiguration/SetupWizard/ReviewAndRunScriptTabViewModel.cs @@ -46,7 +46,7 @@ public async Task GenerateAndExecuteScript() { var script = GenerateScript(); ContributeSensitiveValues(logger); - success = commandLineRunner.Execute(script, logger); + success = await commandLineRunner.ExecuteAsync(script, logger); } catch (Exception ex) { diff --git a/source/Octopus.Tentacle.Client/ITentacleClient.cs b/source/Octopus.Tentacle.Client/ITentacleClient.cs index c79a285fc..5c6915642 100644 --- a/source/Octopus.Tentacle.Client/ITentacleClient.cs +++ b/source/Octopus.Tentacle.Client/ITentacleClient.cs @@ -8,6 +8,7 @@ using Octopus.Tentacle.Client.Scripts.Models; using Octopus.Tentacle.Contracts; using Octopus.Tentacle.Contracts.Logging; +using Octopus.Tentacle.Contracts.ScriptServiceV2; namespace Octopus.Tentacle.Client { @@ -59,6 +60,17 @@ Task StartScript(ExecuteScriptCommand command, /// The result, which includes the CommandContext for the next command Task CancelScript(CommandContext commandContext, ITentacleClientTaskLog logger); + /// + /// Abandon a running script. Signals Tentacle to release the script's isolation mutex + /// and clean up its workspace without waiting for the underlying process to exit. + /// Used as an escape hatch when CancelScript cannot terminate a stuck process. + /// + /// The ticket of the script to abandon + /// Used to output user orientated log messages + /// Cancels the RPC call + /// The current status snapshot of the script at the time abandon was processed + Task AbandonScript(ScriptTicket scriptTicket, ITentacleClientTaskLog logger, CancellationToken cancellationToken); + /// /// Complete the script. /// diff --git a/source/Octopus.Tentacle.Client/TentacleClient.cs b/source/Octopus.Tentacle.Client/TentacleClient.cs index 8176cccc0..cf3d60ef0 100644 --- a/source/Octopus.Tentacle.Client/TentacleClient.cs +++ b/source/Octopus.Tentacle.Client/TentacleClient.cs @@ -16,6 +16,7 @@ using Octopus.Tentacle.Contracts.Capabilities; using Octopus.Tentacle.Contracts.Logging; using Octopus.Tentacle.Contracts.Observability; +using Octopus.Tentacle.Contracts.ScriptServiceV2; using ITentacleClientObserver = Octopus.Tentacle.Contracts.Observability.ITentacleClientObserver; namespace Octopus.Tentacle.Client @@ -260,6 +261,36 @@ public async Task CancelScript(CommandContext co return await scriptExecutor.CancelScript(commandContext); } + public async Task AbandonScript(ScriptTicket scriptTicket, ITentacleClientTaskLog logger, CancellationToken cancellationToken) + { + using var activity = ActivitySource.StartActivity($"{nameof(TentacleClient)}.{nameof(AbandonScript)}"); + activity?.AddTag("octopus.tentacle.script.ticket", scriptTicket.TaskId); + + var operationMetricsBuilder = ClientOperationMetricsBuilder.Start(); + + async Task AbandonScriptAction(CancellationToken ct) + { + var request = new AbandonScriptCommandV2(scriptTicket, lastLogSequence: 0); + return await allClients.ScriptServiceV2.AbandonScriptAsync(request, new HalibutProxyRequestOptions(ct)); + } + + try + { + return await rpcCallExecutor.Execute( + retriesEnabled: clientOptions.RpcRetrySettings.RetriesEnabled, + RpcCall.Create(nameof(IScriptServiceV2.AbandonScript)), + AbandonScriptAction, + logger, + operationMetricsBuilder, + cancellationToken).ConfigureAwait(false); + } + catch (Exception e) + { + operationMetricsBuilder.Failure(e, cancellationToken); + throw; + } + } + public async Task CompleteScript(CommandContext commandContext, ITentacleClientTaskLog logger, CancellationToken scriptExecutionCancellationToken) { using var activity = ActivitySource.StartActivity($"{nameof(TentacleClient)}.{nameof(CompleteScript)}"); diff --git a/source/Octopus.Tentacle.Contracts/ClientServices/IAsyncClientScriptServiceV2.cs b/source/Octopus.Tentacle.Contracts/ClientServices/IAsyncClientScriptServiceV2.cs index 996f915c1..0c89048e4 100644 --- a/source/Octopus.Tentacle.Contracts/ClientServices/IAsyncClientScriptServiceV2.cs +++ b/source/Octopus.Tentacle.Contracts/ClientServices/IAsyncClientScriptServiceV2.cs @@ -10,6 +10,7 @@ public interface IAsyncClientScriptServiceV2 Task StartScriptAsync(StartScriptCommandV2 command, HalibutProxyRequestOptions proxyRequestOptions); Task GetStatusAsync(ScriptStatusRequestV2 request, HalibutProxyRequestOptions proxyRequestOptions); Task CancelScriptAsync(CancelScriptCommandV2 command, HalibutProxyRequestOptions proxyRequestOptions); + Task AbandonScriptAsync(AbandonScriptCommandV2 command, HalibutProxyRequestOptions proxyRequestOptions); Task CompleteScriptAsync(CompleteScriptCommandV2 command, HalibutProxyRequestOptions proxyRequestOptions); } } \ No newline at end of file diff --git a/source/Octopus.Tentacle.Contracts/ScriptExitCodes.cs b/source/Octopus.Tentacle.Contracts/ScriptExitCodes.cs index 2e0ce1b15..2319410c4 100644 --- a/source/Octopus.Tentacle.Contracts/ScriptExitCodes.cs +++ b/source/Octopus.Tentacle.Contracts/ScriptExitCodes.cs @@ -12,6 +12,7 @@ public static class ScriptExitCodes public const int UnknownScriptExitCode = -45; public const int UnknownResultExitCode = -46; public const int PowerShellNeverStartedExitCode = -47; + public const int AbandonedExitCode = -48; //Kubernetes Agent public const int KubernetesScriptPodNotFound = -81; diff --git a/source/Octopus.Tentacle.Contracts/ScriptServiceV2/AbandonScriptCommandV2.cs b/source/Octopus.Tentacle.Contracts/ScriptServiceV2/AbandonScriptCommandV2.cs new file mode 100644 index 000000000..66efba446 --- /dev/null +++ b/source/Octopus.Tentacle.Contracts/ScriptServiceV2/AbandonScriptCommandV2.cs @@ -0,0 +1,17 @@ +using System; + +namespace Octopus.Tentacle.Contracts.ScriptServiceV2 +{ + public class AbandonScriptCommandV2 + { + public AbandonScriptCommandV2(ScriptTicket ticket, long lastLogSequence) + { + Ticket = ticket; + LastLogSequence = lastLogSequence; + } + + public ScriptTicket Ticket { get; } + + public long LastLogSequence { get; } + } +} diff --git a/source/Octopus.Tentacle.Contracts/ScriptServiceV2/IScriptServiceV2.cs b/source/Octopus.Tentacle.Contracts/ScriptServiceV2/IScriptServiceV2.cs index 3effc17b8..9858111fa 100644 --- a/source/Octopus.Tentacle.Contracts/ScriptServiceV2/IScriptServiceV2.cs +++ b/source/Octopus.Tentacle.Contracts/ScriptServiceV2/IScriptServiceV2.cs @@ -7,6 +7,7 @@ public interface IScriptServiceV2 ScriptStatusResponseV2 StartScript(StartScriptCommandV2 command); ScriptStatusResponseV2 GetStatus(ScriptStatusRequestV2 request); ScriptStatusResponseV2 CancelScript(CancelScriptCommandV2 command); + ScriptStatusResponseV2 AbandonScript(AbandonScriptCommandV2 command); void CompleteScript(CompleteScriptCommandV2 command); } } \ No newline at end of file diff --git a/source/Octopus.Tentacle.Core/Services/Scripts/RunningScript.cs b/source/Octopus.Tentacle.Core/Services/Scripts/RunningScript.cs index fca5dfa15..7109d0ff1 100644 --- a/source/Octopus.Tentacle.Core/Services/Scripts/RunningScript.cs +++ b/source/Octopus.Tentacle.Core/Services/Scripts/RunningScript.cs @@ -22,18 +22,20 @@ public class RunningScript: IRunningScript readonly IShell shell; readonly string taskId; readonly CancellationToken runningScriptToken; + readonly CancellationToken abandonToken; readonly IReadOnlyDictionary environmentVariables; readonly ILog log; readonly ScriptIsolationMutex scriptIsolationMutex; readonly TimeSpan powerShellStartupTimeout; - public RunningScript(IShell shell, + RunningScript(IShell shell, IScriptWorkspace workspace, IScriptStateStore? stateStore, IScriptLog scriptLog, string taskId, ScriptIsolationMutex scriptIsolationMutex, CancellationToken runningScriptToken, + CancellationToken abandonToken, IReadOnlyDictionary environmentVariables, TimeSpan powerShellStartupTimeout, ILog log @@ -44,6 +46,7 @@ ILog log this.stateStore = stateStore; this.taskId = taskId; this.runningScriptToken = runningScriptToken; + this.abandonToken = abandonToken; this.environmentVariables = environmentVariables; this.log = log; this.scriptIsolationMutex = scriptIsolationMutex; @@ -52,7 +55,7 @@ ILog log this.powerShellStartupTimeout = powerShellStartupTimeout; } - public RunningScript(IShell shell, + RunningScript(IShell shell, IScriptWorkspace workspace, IScriptLog scriptLog, string taskId, @@ -60,10 +63,34 @@ public RunningScript(IShell shell, CancellationToken runningScriptToken, IReadOnlyDictionary environmentVariables, TimeSpan powerShellStartupTimeout, - ILog log) : this(shell, workspace, null, scriptLog, taskId, scriptIsolationMutex, runningScriptToken, environmentVariables, powerShellStartupTimeout, log) + ILog log) : this(shell, workspace, null, scriptLog, taskId, scriptIsolationMutex, runningScriptToken, CancellationToken.None, environmentVariables, powerShellStartupTimeout, log) { } + public static RunningScript Create(IShell shell, + IScriptWorkspace workspace, + IScriptLog scriptLog, + string taskId, + ScriptIsolationMutex scriptIsolationMutex, + CancellationToken runningScriptToken, + IReadOnlyDictionary environmentVariables, + TimeSpan powerShellStartupTimeout, + ILog log) + => new RunningScript(shell, workspace, null, scriptLog, taskId, scriptIsolationMutex, runningScriptToken, CancellationToken.None, environmentVariables, powerShellStartupTimeout, log); + + public static RunningScript CreateAbandonable(IShell shell, + IScriptWorkspace workspace, + IScriptStateStore? stateStore, + IScriptLog scriptLog, + string taskId, + ScriptIsolationMutex scriptIsolationMutex, + CancellationToken runningScriptToken, + CancellationToken abandonToken, + IReadOnlyDictionary environmentVariables, + TimeSpan powerShellStartupTimeout, + ILog log) + => new RunningScript(shell, workspace, stateStore, scriptLog, taskId, scriptIsolationMutex, runningScriptToken, abandonToken, environmentVariables, powerShellStartupTimeout, log); + public ProcessState State { get; private set; } public int ExitCode { get; private set; } @@ -96,14 +123,22 @@ public async Task Execute() exitCode = workspace.ShouldMonitorPowerShellStartup() ? await RunPowershellScriptWithMonitoring(shellPath, writer, runningScriptToken) - : RunScript(shellPath, writer, runningScriptToken); + : await RunScriptAsync(shellPath, writer, runningScriptToken, abandonToken); } } + // Fires when the caller abandoned the script: leave the OS process running and signal the distinct AbandonedExitCode so the server can tell it apart from a cancel. + catch (OperationCanceledException) when (abandonToken.IsCancellationRequested) + { + writer.WriteOutput(ProcessOutputSource.StdOut, "Script execution abandoned."); + exitCode = ScriptExitCodes.AbandonedExitCode; + } + // Fires when the caller cancelled the script and the underlying process honored the cancellation token. catch (OperationCanceledException) { writer.WriteOutput(ProcessOutputSource.StdOut, "Script execution canceled."); exitCode = ScriptExitCodes.CanceledExitCode; } + // Fires when acquiring the isolation mutex timed out before the script could start. catch (TimeoutException) { writer.WriteOutput(ProcessOutputSource.StdOut, "Script execution timed out."); @@ -147,7 +182,7 @@ async Task RunPowershellScriptWithMonitoring(string shellPath, IScriptLogWr var monitor = new PowerShellStartupMonitor(workspace.WorkingDirectory, powerShellStartupTimeout, log, taskId); var monitoringTask = monitor.WaitForStartup(monitoringTaskCts.Token); - var scriptTask = Task.Run(() => RunScript(shellPath, writer, scriptTaskCts.Token), scriptTaskCts.Token); + var scriptTask = Task.Run(async () => await RunScriptAsync(shellPath, writer, scriptTaskCts.Token, abandonToken), scriptTaskCts.Token); var completedTask = await Task.WhenAny(monitoringTask, scriptTask); @@ -222,11 +257,11 @@ void RecordScriptHasCompleted(int exitCode) } } - int RunScript(string shellPath, IScriptLogWriter writer, CancellationToken cancellationToken) + async Task RunScriptAsync(string shellPath, IScriptLogWriter writer, CancellationToken cancellationToken, CancellationToken abandon) { try { - var exitCode = SilentProcessRunner.ExecuteCommand( + var exitCode = await SilentProcessRunner.ExecuteCommandAsync( shellPath, shell.FormatCommandArguments(workspace.BootstrapScriptFilePath, workspace.ScriptArguments, false), workspace.WorkingDirectory, @@ -234,7 +269,8 @@ int RunScript(string shellPath, IScriptLogWriter writer, CancellationToken cance LogScriptOutputTo(writer, ProcessOutputSource.StdOut), LogScriptOutputTo(writer, ProcessOutputSource.StdErr), environmentVariables, - cancellationToken); + cancellationToken, + abandon); return exitCode; } diff --git a/source/Octopus.Tentacle.Core/Services/Scripts/ScriptServiceV2.cs b/source/Octopus.Tentacle.Core/Services/Scripts/ScriptServiceV2.cs index 0a200fc19..5f7c0a262 100644 --- a/source/Octopus.Tentacle.Core/Services/Scripts/ScriptServiceV2.cs +++ b/source/Octopus.Tentacle.Core/Services/Scripts/ScriptServiceV2.cs @@ -72,7 +72,7 @@ public async Task StartScriptAsync(StartScriptCommandV2 { IScriptWorkspace workspace; - // If the state already exists then this runningScript is already running/has already run and we should not run it again + // StartScript may be called multiple times for the same ticket (e.g. if server retries the tentacle command), so we must guard against actually starting the script twice. if (runningScript.ScriptStateStore.Exists()) { var state = runningScript.ScriptStateStore.Load(); @@ -103,7 +103,8 @@ public async Task StartScriptAsync(StartScriptCommandV2 command.TaskId, workspace, runningScript.ScriptStateStore, - runningScript.CancellationToken); + runningScript.CancellationToken, + runningScript.AbandonToken); runningScript.Process = process; @@ -140,22 +141,66 @@ public async Task CancelScriptAsync(CancelScriptCommandV return GetResponse(command.Ticket, command.LastLogSequence, runningScript?.Process); } - public async Task CompleteScriptAsync(CompleteScriptCommandV2 command, CancellationToken cancellationToken) + public Task AbandonScriptAsync(AbandonScriptCommandV2 command, CancellationToken cancellationToken) { - await Task.CompletedTask; + // Triggers the abandon token so `process.WaitForExitAsync` will return in + // SilentProcessRunning.ExecuteAsync which means the call to GetResponse() + // below may have the final exit code for the script. Otherwise the sender of + // the command (Octopus Server) will get the result on a subsequent call to `GetStatus` + if (runningScripts.TryGetValue(command.Ticket, out var runningScript)) + { + runningScript.Abandon(); + } + + return Task.FromResult(GetResponse(command.Ticket, command.LastLogSequence, runningScript?.Process)); + } + public async Task CompleteScriptAsync(CompleteScriptCommandV2 command, CancellationToken cancellationToken) + { if (runningScripts.TryRemove(command.Ticket, out var runningScript)) { runningScript.Dispose(); } var workspace = workspaceFactory.GetWorkspace(command.Ticket, WorkspaceReadinessCheck.Skip); - await workspace.Delete(cancellationToken); + + // For abandoned scripts (see AbandonScriptCommandV2 and + // https://octopus.com/docs/infrastructure/deployment-targets/tentacle/tentacle-script-abandonment) + // the underlying OS process is, by design, still alive + // and unable to be killed by Tentacle. It may still hold open file handles inside + // the workspace (logs being written to, working files, etc.). workspace.Delete() + // will fail in that case on Windows due to sharing violations and may partially + // delete on Linux. We need to tolerate the failure, which will leave the workspace + // on disk to hopefully be cleaned up by another mechanism (manual cleanup, + // instance restart) etc. This is the best we can do. For all other completion paths + // the process has exited and Delete should succeed; surface any failure there. + if (WasAbandoned(workspace)) + { + try + { + await workspace.Delete(cancellationToken); + } + catch (Exception ex) + { + log.Warn(ex, $"Could not delete abandoned workspace at {workspace.WorkingDirectory}. Leaving on disk; the underlying script process may still hold open file handles."); + } + } + else + { + await workspace.Delete(cancellationToken); + } } - RunningScript LaunchShell(ScriptTicket ticket, string serverTaskId, IScriptWorkspace workspace, IScriptStateStore stateStore, CancellationToken cancellationToken) + bool WasAbandoned(IScriptWorkspace workspace) { - var runningScript = new RunningScript(shell, workspace, stateStore, workspace.CreateLog(), serverTaskId, scriptIsolationMutex, cancellationToken, environmentVariables, powerShellStartupTimeout, log); + var stateStore = scriptStateStoreFactory.Create(workspace); + return stateStore.Exists() + && stateStore.Load().ExitCode == ScriptExitCodes.AbandonedExitCode; + } + + RunningScript LaunchShell(ScriptTicket ticket, string serverTaskId, IScriptWorkspace workspace, IScriptStateStore stateStore, CancellationToken cancellationToken, CancellationToken abandonToken) + { + var runningScript = RunningScript.CreateAbandonable(shell, workspace, stateStore, workspace.CreateLog(), serverTaskId, scriptIsolationMutex, cancellationToken, abandonToken, environmentVariables, powerShellStartupTimeout, log); _ = Task.Run(async () => await runningScript.Execute()); return runningScript; } @@ -204,13 +249,14 @@ public bool IsRunningScript(ScriptTicket ticket) class RunningScriptWrapper : IDisposable { - readonly CancellationTokenSource cancellationTokenSource = new (); + readonly CancellationTokenSource cancellationTokenSource = new(); + readonly CancellationTokenSource abandonTokenSource = new(); public RunningScriptWrapper(ScriptStateStore scriptStateStore) { ScriptStateStore = scriptStateStore; - CancellationToken = cancellationTokenSource.Token; + AbandonToken = abandonTokenSource.Token; } public RunningScript? Process { get; set; } @@ -218,15 +264,15 @@ public RunningScriptWrapper(ScriptStateStore scriptStateStore) public SemaphoreSlim StartScriptMutex { get; } = new(1, 1); public CancellationToken CancellationToken { get; } + public CancellationToken AbandonToken { get; } - public void Cancel() - { - cancellationTokenSource.Cancel(); - } + public void Cancel() => cancellationTokenSource.Cancel(); + public void Abandon() => abandonTokenSource.Cancel(); public void Dispose() { cancellationTokenSource.Dispose(); + abandonTokenSource.Dispose(); } } } diff --git a/source/Octopus.Tentacle.Core/Util/CommandLine/SilentProcessRunner.cs b/source/Octopus.Tentacle.Core/Util/CommandLine/SilentProcessRunner.cs index 0fcf47f9e..493e33e85 100644 --- a/source/Octopus.Tentacle.Core/Util/CommandLine/SilentProcessRunner.cs +++ b/source/Octopus.Tentacle.Core/Util/CommandLine/SilentProcessRunner.cs @@ -8,25 +8,33 @@ using System.Runtime.Versioning; using System.Text; using System.Threading; +using System.Threading.Tasks; +using Octopus.Tentacle.Contracts; using Octopus.Tentacle.Core.Diagnostics; +using Octopus.Tentacle.Core.Util; namespace Octopus.Tentacle.Util { public static class SilentProcessRunner { - public static int ExecuteCommand( + // How long we wait for an issued kill to actually reap the process before we + // give up and report the script as abandoned. + const int CancelKillGraceMilliseconds = 5000; + + public static Task ExecuteCommandAsync( string executable, string arguments, string workingDirectory, Action debug, Action info, Action error, - CancellationToken cancel) + CancellationToken cancel, + CancellationToken abandon) { - return ExecuteCommand(executable, arguments, workingDirectory, debug, info, error, customEnvironmentVariables: null, cancel: cancel); + return ExecuteCommandAsync(executable, arguments, workingDirectory, debug, info, error, customEnvironmentVariables: null, cancel: cancel, abandon: abandon); } - public static int ExecuteCommand( + public static async Task ExecuteCommandAsync( string executable, string arguments, string workingDirectory, @@ -34,7 +42,8 @@ public static int ExecuteCommand( Action info, Action error, IReadOnlyDictionary? customEnvironmentVariables = null, - CancellationToken cancel = default) + CancellationToken cancel = default, + CancellationToken abandon = default) { if (executable == null) throw new ArgumentNullException(nameof(executable)); @@ -135,15 +144,69 @@ void WriteData(Action action, ManualResetEventSlim resetEvent, DataRecei })) { if (cancel.IsCancellationRequested) - DoOurBestToCleanUp(process, error); + DoOurBestToCleanUp(process, error); process.BeginOutputReadLine(); process.BeginErrorReadLine(); - process.WaitForExit(); + // Process.WaitForExitAsync waits for stream EOF after the Exited event + // fires. A re-parented grandchild that inherits our redirected stdout/stderr + // pipes will hold them open and prevent EOF, hanging the await forever. We + // can't add process.Close() to cancel cleanup (it clears the Process object's + // stream fields and aborts the in-flight await, breaking other tests). So + // instead we pass a linked token combining `abandon` and `cancel`: when + // either fires, the await throws OCE and we route to the matching catch. + using var stopWaiting = CancellationTokenSource.CreateLinkedTokenSource(abandon, cancel); - SafelyCancelRead(process.CancelErrorRead, debug); - SafelyCancelRead(process.CancelOutputRead, debug); + try + { + await WaitForProcessExitAsync(process, stopWaiting.Token).ConfigureAwait(false); + } + catch (OperationCanceledException) when (abandon.IsCancellationRequested) + { + // Abandon path. From the user's perspective abandon is a race against + // natural script exit, so returning AbandonedExitCode is acceptable even + // if the process happened to finish at the same moment — that's why we + // don't check process.HasExited here. + info("Tentacle has abandoned this script. The underlying script process may still be running on this host."); + SafelyCancelOutputAndErrorRead(process, debug); + running = false; + return ScriptExitCodes.AbandonedExitCode; + } + catch (OperationCanceledException) when (cancel.IsCancellationRequested) + { + // Cancel means "kill it". DoOurBestToCleanUp already issued the kill via + // cancel.Register, but Kill() only *requests* termination — the OS reaps the + // process asynchronously, so it is usually still alive at this point. We wait a + // bounded grace period for the reap so the fall-through below can read the real + // exit code (e.g. 137 for SIGKILL, 143 for SIGTERM). + // Required by CancellationToken_ShouldForceKillTheProcess. + // + // The finite-timeout overload is deliberate: WaitForExit(int) waits only for + // termination, never for redirected-stream EOF (the EOF drain is guarded by + // `milliseconds == Timeout.Infinite`). The no-arg overload WOULD drain EOF and + // hang when a re-parented grandchild holds our pipes open. + // Required by CancellationToken_WhenGrandchildHoldsRedirectedPipes_ShouldNotHang. + var exitedWithinGracePeriod = process.WaitForExit(CancelKillGraceMilliseconds); + + if (!exitedWithinGracePeriod) + { + // The kill did not land within the grace period: the process is genuinely + // stuck (or kill was disabled in a test). There is no real exit code to read + // — reading process.ExitCode here would throw "Process must exit before + // requested information can be determined." So we report it as abandoned; + // the process may still be running on this host. + // Required by AbandonScript_WhenCancelFailsToKillProcess_ReturnsAbandonedExitCode. + info("Tentacle stopped waiting for the cancelled script. The underlying script process may still be running on this host."); + SafelyCancelOutputAndErrorRead(process, debug); + running = false; + return ScriptExitCodes.AbandonedExitCode; + } + + // Exited within the grace period — fall through to read the real exit code below. + } + + SafelyCancelOutputAndErrorRead(process, debug); SafelyWaitForAllOutput(outputResetEvent, cancel, debug); SafelyWaitForAllOutput(errorResetEvent, cancel, debug); @@ -180,9 +243,20 @@ static void SafelyWaitForAllOutput(ManualResetEventSlim outputResetEvent, CancellationToken cancel, Action debug) { + // outputResetEvent.Wait is waiting for the OutputDataReceived/ErrorDataReceived + // handlers to signal EOF on the stream (when it receives a null + // DataReceivedEventArgs.Data, .NET's EOF marker). This does NOT close the pipe, + // it just gives the OS up to 5 seconds to deliver the EOF. + // + // If a re-parented grandchild is holding the pipe open, EOF never arrives, the wait + // times out, and we proceed without the final flush of buffered output. The pipe is + // released later by Process.Dispose() at end of ExecuteCommandAsync via the + // `using (var process = new Process())` block. + // + // 5 seconds is somewhat arbitrary — the process has already exited by the time we + // reach here, so under normal circumstances EOF arrives within milliseconds. try { - //5 seconds is a bit arbitrary, but the process should have already exited by now, so unwise to wait too long outputResetEvent.Wait(TimeSpan.FromSeconds(5), cancel); } catch (OperationCanceledException ex) @@ -191,6 +265,17 @@ static void SafelyWaitForAllOutput(ManualResetEventSlim outputResetEvent, } } + static void SafelyCancelOutputAndErrorRead(Process process, Action debug) + { + // Cancel the output/error readers so a late OutputDataReceived/ErrorDataReceived + // callback doesn't try to write to a workspace log that's already been disposed by + // the using-block above; that write would throw ObjectDisposedException. Called in + // both the normal completion path and the abandon path; extracted here so the two + // callers stay consistent. + SafelyCancelRead(process.CancelErrorRead, debug); + SafelyCancelRead(process.CancelOutputRead, debug); + } + static void SafelyCancelRead(Action action, Action debug) { try @@ -221,22 +306,59 @@ static void DoOurBestToCleanUp(Process process, Action error) error($"Failed to kill the launched process: {killProcessException}"); } } - finally + } + + // Single place we block waiting for the spawned process to exit. + // On .NET Framework we use a TaskCompletionSource polyfill because + // Process.WaitForExitAsync doesn't exist there; on .NET 8+ we use the + // framework method directly. + static Task WaitForProcessExitAsync(Process process, CancellationToken cancellationToken) + { +#if NETFRAMEWORK + return WaitForProcessExitAsyncNetFrameworkPolyfill(process, cancellationToken); +#else + return process.WaitForExitAsync(cancellationToken); +#endif + } + +#if NETFRAMEWORK + static Task WaitForProcessExitAsyncNetFrameworkPolyfill(Process process, CancellationToken cancellationToken) + { + // EnableRaisingEvents must be true for the process.Exited handler below to fire. + // On .NET 8+ Process.WaitForExitAsync sets this itself; here on netframework we + // have to set it ourselves before subscribing. + // https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.process.waitforexitasync + process.EnableRaisingEvents = true; + + var tcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously); + CancellationTokenRegistration registration = default; + + void OnExited(object? sender, EventArgs e) { - try - { - // When cancelling, close the file handles. - // If the child finishes, but the grandchild holds stdout/stderr and never completes, - // THEN we won't issue a kill to the grandchild but will wait for the grandchild to - // close the handles. - process.Close(); - } - catch (Exception ex) + registration.Dispose(); + tcs.TrySetResult(null); + } + + process.Exited += OnExited; + + // Guard against race: process may have already exited before we subscribed. + if (process.HasExited) + { + tcs.TrySetResult(null); + } + + if (cancellationToken.CanBeCanceled) + { + registration = cancellationToken.Register(() => { - error($"Failed to close process resources: {ex.Message}"); - } + process.Exited -= OnExited; + tcs.TrySetCanceled(cancellationToken); + }); } + + return tcs.Task; } +#endif [DllImport("kernel32.dll", SetLastError = true)] #pragma warning disable PC003 // Native API not available in UWP @@ -251,11 +373,18 @@ class Hitman { public static void TryKillProcessAndChildrenRecursively(Process process) { + if (!string.IsNullOrEmpty(Environment.GetEnvironmentVariable(EnvironmentVariables.TentacleDebugDisableProcessKill_UNSAFE_FOR_PRODUCTION))) + { + // Test-only no-op: simulate "kill was attempted but didn't terminate the process". + // Only activated when the test harness sets this env var on the Tentacle process. + return; + } + #if NETFRAMEWORK TryKillWindowsProcessAndChildrenRecursively(process.Id); #endif #if !NETFRAMEWORK - // Since .NET Core 3.0 there is support for killing a process and it's children + // Since .NET Core 3.0 there is support for killing a process and it's children process.Kill(true); #endif } diff --git a/source/Octopus.Tentacle.Core/Util/EnvironmentVariables.cs b/source/Octopus.Tentacle.Core/Util/EnvironmentVariables.cs index 2b5f83a59..faf01eb00 100644 --- a/source/Octopus.Tentacle.Core/Util/EnvironmentVariables.cs +++ b/source/Octopus.Tentacle.Core/Util/EnvironmentVariables.cs @@ -29,6 +29,7 @@ public static class EnvironmentVariables public const string TentacleMachineConfigurationHomeDirectory = "TentacleMachineConfigurationHomeDirectory"; public const string TentaclePollingConnectionCount = "TentaclePollingConnectionCount"; public const string TentaclePowerShellStartupTimeout = "TentaclePowerShellStartupTimeout"; + public const string TentacleDebugDisableProcessKill_UNSAFE_FOR_PRODUCTION = "TentacleDebugDisableProcessKill_UNSAFE_FOR_PRODUCTION"; public const string NfsWatchdogDirectory = "watchdog_directory"; public static string TentacleUseTcpNoDelay = "TentacleUseTcpNoDelay"; public static string TentacleUseAsyncListener = "TentacleUseAsyncListener"; diff --git a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/KubernetesAgent/KubernetesClusterOneTimeSetUp.cs b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/KubernetesAgent/KubernetesClusterOneTimeSetUp.cs index 3854e7c53..4c46902c5 100644 --- a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/KubernetesAgent/KubernetesClusterOneTimeSetUp.cs +++ b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/KubernetesAgent/KubernetesClusterOneTimeSetUp.cs @@ -17,7 +17,7 @@ public async Task OneTimeSetUp() installer = new KubernetesClusterInstaller(KubernetesTestsGlobalContext.Instance.TemporaryDirectory, kindExePath, helmExePath, kubeCtlPath, KubernetesTestsGlobalContext.Instance.Logger); await installer.InstallLatestSupported(); - KubernetesTestsGlobalContext.Instance.TentacleImageAndTag = SetupHelpers.GetTentacleImageAndTag(kindExePath, installer); + KubernetesTestsGlobalContext.Instance.TentacleImageAndTag = await SetupHelpers.GetTentacleImageAndTag(kindExePath, installer); KubernetesTestsGlobalContext.Instance.SetToolExePaths(helmExePath, kubeCtlPath); KubernetesTestsGlobalContext.Instance.KubeConfigPath = installer.KubeConfigPath; } diff --git a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/KubernetesClientCompatibilityTests.cs b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/KubernetesClientCompatibilityTests.cs index a0c6ae878..5d060cb32 100644 --- a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/KubernetesClientCompatibilityTests.cs +++ b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/KubernetesClientCompatibilityTests.cs @@ -165,7 +165,7 @@ async Task SetupCluster(ClusterVersion clusterVersion) clusterInstaller = new KubernetesClusterInstaller(testContext.TemporaryDirectory, kindExePath, helmExePath, kubeCtlPath, testContext.Logger); await clusterInstaller.Install(clusterVersion); - testContext.TentacleImageAndTag = SetupHelpers.GetTentacleImageAndTag(kindExePath, clusterInstaller); + testContext.TentacleImageAndTag = await SetupHelpers.GetTentacleImageAndTag(kindExePath, clusterInstaller); testContext.SetToolExePaths(helmExePath, kubeCtlPath); testContext.KubeConfigPath = clusterInstaller.KubeConfigPath; } diff --git a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/DockerImageLoader.cs b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/DockerImageLoader.cs index cf28823aa..9c7fa6412 100644 --- a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/DockerImageLoader.cs +++ b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/DockerImageLoader.cs @@ -18,16 +18,16 @@ public DockerImageLoader(TemporaryDirectory temporaryDirectory, ILogger logger, this.kindExePath = kindExePath; } - public string? LoadMostRecentImageIntoKind(string clusterName) + public async Task LoadMostRecentImageIntoKind(string clusterName) { - var mostRecentTag = FindMostRecentTag(); + var mostRecentTag = await FindMostRecentTag(); return !string.IsNullOrWhiteSpace(mostRecentTag) - ? LoadImageIntoKind(mostRecentTag, clusterName) + ? await LoadImageIntoKind(mostRecentTag, clusterName) : null; } - string? FindMostRecentTag() + async Task FindMostRecentTag() { var sb = new StringBuilder(); var tags = new List(); @@ -36,7 +36,7 @@ public DockerImageLoader(TemporaryDirectory temporaryDirectory, ILogger logger, .WriteTo.StringBuilder(sb) .CreateLogger(); - var exitCode = SilentProcessRunner.ExecuteCommand( + var exitCode = await SilentProcessRunner.ExecuteCommandAsync( "docker", "images octopusdeploy/kubernetes-agent-tentacle --format \"{{.Tag}}\"", temporaryDirectory.DirectoryPath, @@ -46,8 +46,7 @@ public DockerImageLoader(TemporaryDirectory temporaryDirectory, ILogger logger, sprLogger.Information(line); tags.Add(line); }, - sprLogger.Error, - CancellationToken.None + sprLogger.Error ); if (exitCode != 0) @@ -55,11 +54,11 @@ public DockerImageLoader(TemporaryDirectory temporaryDirectory, ILogger logger, logger.Error("Failed to get latest image tag from docker"); throw new InvalidOperationException($"Failed to get latest image tag from docker. Logs: {sb}"); } - + return tags.FirstOrDefault(); } - string LoadImageIntoKind(string mostRecentTag, string clusterName) + async Task LoadImageIntoKind(string mostRecentTag, string clusterName) { var image = $"octopusdeploy/kubernetes-agent-tentacle:{mostRecentTag}"; @@ -69,14 +68,13 @@ string LoadImageIntoKind(string mostRecentTag, string clusterName) .WriteTo.StringBuilder(sb) .CreateLogger(); - var exitCode = SilentProcessRunner.ExecuteCommand( + var exitCode = await SilentProcessRunner.ExecuteCommandAsync( kindExePath, $"load docker-image {image} --name={clusterName}", temporaryDirectory.DirectoryPath, sprLogger.Debug, sprLogger.Information, - sprLogger.Error, - CancellationToken.None + sprLogger.Error ); if (exitCode != 0) diff --git a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/KubernetesAgentInstaller.cs b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/KubernetesAgentInstaller.cs index 79de31e5f..9828a8581 100644 --- a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/KubernetesAgentInstaller.cs +++ b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/KubernetesAgentInstaller.cs @@ -56,14 +56,13 @@ public async Task InstallAgent(int listeningPort, string? tentacleImageA .MinimumLevel.Debug() .CreateLogger(); - var exitCode = SilentProcessRunner.ExecuteCommand( + var exitCode = await SilentProcessRunner.ExecuteCommandAsync( helmExePath, arguments, temporaryDirectory.DirectoryPath, sprLogger.Debug, sprLogger.Information, - sprLogger.Error, - CancellationToken.None); + sprLogger.Error); sw.Stop(); @@ -169,7 +168,7 @@ async Task GetAgentThumbprint() .MinimumLevel.Debug() .CreateLogger(); - var exitCode = SilentProcessRunner.ExecuteCommand( + var exitCode = await SilentProcessRunner.ExecuteCommandAsync( kubeCtlExePath, //get the generated thumbprint from the config map $"get cm tentacle-config --namespace {Namespace} --kubeconfig=\"{kubeConfigPath}\" -o \"jsonpath={{.data['Tentacle\\.CertificateThumbprint']}}\"", @@ -180,8 +179,7 @@ async Task GetAgentThumbprint() sprLogger.Information(x); thumbprint = x; }, - sprLogger.Error, - CancellationToken.None); + sprLogger.Error); if (exitCode != 0) { @@ -219,14 +217,21 @@ public void Dispose() NamespaceFlag, AgentName); - var exitCode = SilentProcessRunner.ExecuteCommand( + // Why this is sync: IDisposable.Dispose() must return sync. + // + // Why blocking on the async call is safe: this only runs under NUnit, + // which dispatches us on a worker thread with no SynchronizationContext. + // + // Why low risk: this is test code. The worst case for a wrong call here + // is a hung test, not a production incident. + // See https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html + var exitCode = SilentProcessRunner.ExecuteCommandAsync( helmExePath, uninstallArgs, temporaryDirectory.DirectoryPath, logger.Debug, logger.Information, - logger.Error, - CancellationToken.None); + logger.Error).GetAwaiter().GetResult(); if (exitCode != 0) { diff --git a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/KubernetesClusterInstaller.cs b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/KubernetesClusterInstaller.cs index 31b196718..d231d0ff4 100644 --- a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/KubernetesClusterInstaller.cs +++ b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/KubernetesClusterInstaller.cs @@ -53,15 +53,14 @@ async Task InstallCluster(ClusterVersion clusterVersion) var sw = new Stopwatch(); sw.Restart(); - var exitCode = SilentProcessRunner.ExecuteCommand( + var exitCode = await SilentProcessRunner.ExecuteCommandAsync( kindExePath, //we give the cluster a unique name $"create cluster --name={clusterName} --config=\"{configFilePath}\" --kubeconfig=\"{kubeConfigName}\"", tempDir.DirectoryPath, logger.Debug, logger.Information, - logger.Error, - CancellationToken.None); + logger.Error); sw.Stop(); @@ -92,15 +91,14 @@ async Task SetLocalhostRouting() .WriteTo.StringBuilder(sb) .CreateLogger(); - var exitCode = SilentProcessRunner.ExecuteCommand( + var exitCode = await SilentProcessRunner.ExecuteCommandAsync( kubeCtlPath, //we give the cluster a unique name $"apply -n default -f \"{manifestFilePath}\" --kubeconfig=\"{KubeConfigPath}\"", tempDir.DirectoryPath, sprLogger.Debug, sprLogger.Information, - sprLogger.Error, - CancellationToken.None); + sprLogger.Error); if (exitCode != 0) { @@ -143,14 +141,13 @@ async Task InstallNfsCsiDriver() .WriteTo.StringBuilder(sb) .CreateLogger(); - var exitCode = SilentProcessRunner.ExecuteCommand( + var exitCode = await SilentProcessRunner.ExecuteCommandAsync( helmExePath, installArgs, tempDir.DirectoryPath, sprLogger.Debug, sprLogger.Information, - sprLogger.Error, - CancellationToken.None); + sprLogger.Error); if (exitCode != 0) { @@ -174,15 +171,19 @@ string BuildNfsCsiDriverInstallArguments() public void Dispose() { - var exitCode = SilentProcessRunner.ExecuteCommand( + // We're in IDisposable.Dispose(). Dispose() must return synchronously, so we + // block on the async call with .GetAwaiter().GetResult(). This is sync-over-async + // but is safe because the NUnit test runner dispatches us on a worker thread + // without a captured SynchronizationContext, so no deadlock. + // See https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html + var exitCode = SilentProcessRunner.ExecuteCommandAsync( kindExePath, //delete the cluster for this test run $"delete cluster --name={clusterName}", tempDir.DirectoryPath, logger.Debug, logger.Information, - logger.Error, - CancellationToken.None); + logger.Error).GetAwaiter().GetResult(); if (exitCode != 0) { diff --git a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/SetupHelpers.cs b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/SetupHelpers.cs index f5ed3dca0..5e05de365 100644 --- a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/SetupHelpers.cs +++ b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/SetupHelpers.cs @@ -46,7 +46,7 @@ public static TentacleClient BuildTentacleClient(Uri uri, string? thumbprint, Ha builder.Build()); } - public static string? GetTentacleImageAndTag(string kindExePath, KubernetesClusterInstaller clusterInstaller) + public static async Task GetTentacleImageAndTag(string kindExePath, KubernetesClusterInstaller clusterInstaller) { if (clusterInstaller == null) { @@ -68,9 +68,9 @@ public static TentacleClient BuildTentacleClient(Uri uri, string? thumbprint, Ha { //if we should use the latest locally build image, load the tag from docker and load it into kind var imageLoader = new DockerImageLoader(KubernetesTestsGlobalContext.Instance.TemporaryDirectory, KubernetesTestsGlobalContext.Instance.Logger, kindExePath); - imageAndTag = imageLoader.LoadMostRecentImageIntoKind(clusterInstaller.ClusterName); + imageAndTag = await imageLoader.LoadMostRecentImageIntoKind(clusterInstaller.ClusterName); } - + if(imageAndTag is not null) KubernetesTestsGlobalContext.Instance.Logger.Information("Using tentacle image: {ImageAndTag}", imageAndTag); diff --git a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/Tooling/HelmDownloader.cs b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/Tooling/HelmDownloader.cs index f28857be9..8c8126f96 100644 --- a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/Tooling/HelmDownloader.cs +++ b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/Tooling/HelmDownloader.cs @@ -27,7 +27,7 @@ protected override string BuildDownloadUrl(Architecture processArchitecture, Ope static string GetArchitectureLabel(Architecture processArchitecture) => processArchitecture == Architecture.Arm64 ? "arm64" : "amd64"; - protected override string PostDownload(string targetDirectory, string downloadFilePath, Architecture processArchitecture, OperatingSystem operatingSystem) + protected override async Task PostDownload(string targetDirectory, string downloadFilePath, Architecture processArchitecture, OperatingSystem operatingSystem) { var architecture = GetArchitectureLabel(processArchitecture); var osName = GetOsName(operatingSystem); @@ -43,7 +43,7 @@ protected override string PostDownload(string targetDirectory, string downloadFi else { //everything else is tar.gz - ExtractTarGzip(downloadFilePath, extractionDir); + await ExtractTarGzip(downloadFilePath, extractionDir); } //move the extracted helm executable to the root target directory @@ -66,7 +66,7 @@ static string GetOsName(OperatingSystem operatingSystem) _ => throw new ArgumentOutOfRangeException(nameof(operatingSystem), operatingSystem, null) }; - void ExtractTarGzip(string gzArchiveName, string destFolder) + async Task ExtractTarGzip(string gzArchiveName, string destFolder) { if (!Directory.Exists(destFolder)) { @@ -79,14 +79,13 @@ void ExtractTarGzip(string gzArchiveName, string destFolder) // Falling back to good old fashioned `tar` does the job nicely :) using var tmp = new TemporaryDirectory(); - var exitCode = SilentProcessRunner.ExecuteCommand( + var exitCode = await SilentProcessRunner.ExecuteCommandAsync( "tar", $"xzvf \"{gzArchiveName}\" -C \"{destFolder}\"", tmp.DirectoryPath, Logger.Debug, Logger.Information, - Logger.Error, - CancellationToken.None); + Logger.Error); if (exitCode != 0) { diff --git a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/Tooling/ToolDownloader.cs b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/Tooling/ToolDownloader.cs index f89ae9037..b91c4409b 100644 --- a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/Tooling/ToolDownloader.cs +++ b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Setup/Tooling/ToolDownloader.cs @@ -37,19 +37,18 @@ public async Task Download(string targetDirectory, CancellationToken can Logger.Information("Downloading {DownloadUrl} to {DownloadFilePath}", downloadUrl, downloadFilePath); await OctopusPackageDownloader.DownloadPackage(downloadUrl, downloadFilePath, Logger, cancellationToken); - downloadFilePath = PostDownload(targetDirectory, downloadFilePath, RuntimeInformation.ProcessArchitecture, os); + downloadFilePath = await PostDownload(targetDirectory, downloadFilePath, RuntimeInformation.ProcessArchitecture, os); //if this is not running on windows, chmod the tool to be executable if (os is not OperatingSystem.Windows) { - var exitCode = SilentProcessRunner.ExecuteCommand( + var exitCode = await SilentProcessRunner.ExecuteCommandAsync( "chmod", $"+x \"{downloadFilePath}\"", targetDirectory, Logger.Debug, Logger.Information, - Logger.Error, - CancellationToken.None); + Logger.Error); if (exitCode != 0) { @@ -62,12 +61,12 @@ public async Task Download(string targetDirectory, CancellationToken can protected abstract string BuildDownloadUrl(Architecture processArchitecture, OperatingSystem operatingSystem); - protected virtual string PostDownload(string downloadDirectory, string downloadFilePath, Architecture processArchitecture, OperatingSystem operatingSystem) + protected virtual Task PostDownload(string downloadDirectory, string downloadFilePath, Architecture processArchitecture, OperatingSystem operatingSystem) { var targetFilename = Path.Combine(downloadDirectory, ExecutableName); File.Move(downloadFilePath, targetFilename); - return targetFilename; + return Task.FromResult(targetFilename); } static OperatingSystem GetOperationSystem() diff --git a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Tooling/KubeCtlTool.cs b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Tooling/KubeCtlTool.cs index cae324e54..0b3354159 100644 --- a/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Tooling/KubeCtlTool.cs +++ b/source/Octopus.Tentacle.Kubernetes.Tests.Integration/Tooling/KubeCtlTool.cs @@ -25,10 +25,10 @@ public KubeCtlTool(TemporaryDirectory temporaryDirectory, string kubeCtlExePath, public Task ExecuteNamespacedCommand(string command, CancellationToken cancellationToken = default) { - return Task.Run(() => ExecuteCommand($"{command} --namespace {ns}", cancellationToken), cancellationToken); + return ExecuteCommand($"{command} --namespace {ns}", cancellationToken); } - KubeCtlCommandResult ExecuteCommand(string command, CancellationToken cancellationToken = default) + async Task ExecuteCommand(string command, CancellationToken cancellationToken = default) { var sb = new StringBuilder(); var sprLogger = new LoggerConfiguration() @@ -40,7 +40,7 @@ KubeCtlCommandResult ExecuteCommand(string command, CancellationToken cancellati var stdOut = new List(); var stdErr = new List(); - var exitCode = SilentProcessRunner.ExecuteCommand( + var exitCode = await SilentProcessRunner.ExecuteCommandAsync( kubeCtlExePath, $"{command} --kubeconfig=\"{kubeConfigPath}\"", temporaryDirectory.DirectoryPath, @@ -55,7 +55,7 @@ KubeCtlCommandResult ExecuteCommand(string command, CancellationToken cancellati sprLogger.Error(y); stdErr.Add(y); }, - cancellationToken); + cancel: cancellationToken); return new (exitCode, stdOut, stdErr); } diff --git a/source/Octopus.Tentacle.Tests.Integration.Common/Builders/Decorators/ScriptServiceV2DecoratorBuilder.cs b/source/Octopus.Tentacle.Tests.Integration.Common/Builders/Decorators/ScriptServiceV2DecoratorBuilder.cs index 17b499a06..5cd747cf0 100644 --- a/source/Octopus.Tentacle.Tests.Integration.Common/Builders/Decorators/ScriptServiceV2DecoratorBuilder.cs +++ b/source/Octopus.Tentacle.Tests.Integration.Common/Builders/Decorators/ScriptServiceV2DecoratorBuilder.cs @@ -188,6 +188,11 @@ public async Task CancelScriptAsync(CancelScriptCommandV return await cancelScriptFunc(inner, command, options); } + public async Task AbandonScriptAsync(AbandonScriptCommandV2 command, HalibutProxyRequestOptions options) + { + return await inner.AbandonScriptAsync(command, options); + } + public async Task CompleteScriptAsync(CompleteScriptCommandV2 command, HalibutProxyRequestOptions options) { await completeScriptAction(inner, command, options); diff --git a/source/Octopus.Tentacle.Tests.Integration/CapabilitiesServiceV2Test.cs b/source/Octopus.Tentacle.Tests.Integration/CapabilitiesServiceV2Test.cs index 6f9b7300a..f52ef1b59 100644 --- a/source/Octopus.Tentacle.Tests.Integration/CapabilitiesServiceV2Test.cs +++ b/source/Octopus.Tentacle.Tests.Integration/CapabilitiesServiceV2Test.cs @@ -9,6 +9,7 @@ using Octopus.Tentacle.Contracts.Capabilities; using Octopus.Tentacle.Contracts.KubernetesScriptServiceV1; using Octopus.Tentacle.Contracts.ScriptServiceV2; +using Octopus.Tentacle.Core.Services.Scripts; using Octopus.Tentacle.Tests.Integration.Common.Builders.Decorators; using Octopus.Tentacle.Tests.Integration.Support; using Octopus.Tentacle.Tests.Integration.Util.Builders; @@ -31,15 +32,10 @@ public async Task CapabilitiesFromAnOlderTentacleWhichHasNoCapabilitiesService_W capabilities.Should().Contain(nameof(IScriptService)); capabilities.Should().Contain(nameof(IFileTransferService)); - //all versions have ScriptServiceV1 & IFileTransferService - var expectedCapabilitiesCount = 2; if (version.HasScriptServiceV2()) { capabilities.Should().Contain(nameof(IScriptServiceV2)); - expectedCapabilitiesCount++; } - - capabilities.Count.Should().Be(expectedCapabilitiesCount); } [Test] @@ -55,17 +51,23 @@ public async Task CapabilitiesServiceDoesNotReturnKubernetesScriptServiceForNonK capabilities.Should().Contain(nameof(IScriptService)); capabilities.Should().Contain(nameof(IFileTransferService)); - //all versions have ScriptServiceV1 & IFileTransferService - var expectedCapabilitiesCount = 2; if (version.HasScriptServiceV2()) { capabilities.Should().Contain(nameof(IScriptServiceV2)); - expectedCapabilitiesCount++; } capabilities.Should().NotContain(nameof(IKubernetesScriptServiceV1)); + } + + [Test] + [TentacleConfigurations] + public async Task LatestTentacle_AdvertisesAbandonScriptCapability(TentacleConfigurationTestCase tentacleConfigurationTestCase) + { + await using var clientAndTentacle = await tentacleConfigurationTestCase.CreateLegacyBuilder().Build(CancellationToken); + + var capabilities = (await clientAndTentacle.TentacleClient.CapabilitiesServiceV2.GetCapabilitiesAsync(new(CancellationToken))).SupportedCapabilities; - capabilities.Count.Should().Be(expectedCapabilitiesCount); + capabilities.Should().Contain(nameof(ScriptServiceV2.AbandonScriptAsync)); } [Test] diff --git a/source/Octopus.Tentacle.Tests.Integration/ClientScriptExecutionAbandon.cs b/source/Octopus.Tentacle.Tests.Integration/ClientScriptExecutionAbandon.cs new file mode 100644 index 000000000..715500ebf --- /dev/null +++ b/source/Octopus.Tentacle.Tests.Integration/ClientScriptExecutionAbandon.cs @@ -0,0 +1,131 @@ +using System; +using System.IO; +using System.Linq; +using System.Threading.Tasks; +using FluentAssertions; +using NUnit.Framework; +using Octopus.Tentacle.Contracts; +using Octopus.Tentacle.Core.Util; +using Octopus.Tentacle.Tests.Integration.Support; +using Octopus.Tentacle.Tests.Integration.Util; +using Octopus.Tentacle.Tests.Integration.Util.Builders; + +namespace Octopus.Tentacle.Tests.Integration +{ + [IntegrationTestTimeout] + public class ClientScriptExecutionAbandon : IntegrationTest + { + [Test] + [TentacleConfigurations(scriptServiceToTest: ScriptServiceVersionToTest.Version2)] + public async Task AbandonScript_WhenCancelFailsToKillProcess_ReturnsAbandonedExitCode(TentacleConfigurationTestCase tentacleConfigurationTestCase) + { + // TentacleDebugDisableProcessKill_UNSAFE_FOR_PRODUCTION=1 makes Hitman a no-op, so + // CancelScript cannot actually terminate the underlying script process. The script + // becomes genuinely "stuck" from Tentacle's perspective. AbandonScript should then + // return promptly with AbandonedExitCode without waiting for the process to exit. + await using var clientTentacle = await tentacleConfigurationTestCase.CreateBuilder() + .WithTentacle(x => x.WithRunTentacleEnvironmentVariable(EnvironmentVariables.TentacleDebugDisableProcessKill_UNSAFE_FOR_PRODUCTION, "1")) + .Build(CancellationToken); + + var startFile = Path.Combine(clientTentacle.TemporaryDirectory.DirectoryPath, "start"); + var releaseFile = Path.Combine(clientTentacle.TemporaryDirectory.DirectoryPath, "release"); + + var firstCommand = new TestExecuteShellScriptCommandBuilder() + .SetScriptBody(new ScriptBuilder() + .CreateFile(startFile) + .WaitForFileToExist(releaseFile)) + .WithIsolationLevel(ScriptIsolationLevel.NoIsolation) + .Build(); + + var tentacleClient = clientTentacle.TentacleClient; + + var scriptExecution = Task.Run(async () => await tentacleClient.ExecuteScript(firstCommand, CancellationToken)); + + await Wait.For(() => File.Exists(startFile), + TimeSpan.FromSeconds(30), + () => throw new Exception("Script did not start"), + CancellationToken); + + // Cancel: Hitman is a no-op so the process keeps running. + await tentacleClient.CancelScript(firstCommand.ScriptTicket); + await Task.Delay(TimeSpan.FromSeconds(1)); + + // Abandon: fires the abandon token. The RPC returns the current status snapshot + // immediately, so we poll GetStatus until the script reaches Complete state. + await tentacleClient.AbandonScript(firstCommand.ScriptTicket, CancellationToken); + + ScriptStatus abandonResponse = null!; + await Wait.For(async () => + { + abandonResponse = await tentacleClient.GetStatus(firstCommand.ScriptTicket, CancellationToken); + return abandonResponse.State == ProcessState.Complete; + }, + TimeSpan.FromSeconds(30), + () => throw new Exception("Abandoned script did not reach Complete state within 30s"), + CancellationToken); + abandonResponse.ExitCode.Should().Be(ScriptExitCodes.AbandonedExitCode); + + // Release the script process so it exits cleanly and stops leaking. + File.WriteAllText(releaseFile, ""); + await scriptExecution; + } + + [Test] + [TentacleConfigurations(scriptServiceToTest: ScriptServiceVersionToTest.Version2)] + public async Task AbandonScript_ReleasesIsolationMutexEvenWhileProcessIsStillRunning(TentacleConfigurationTestCase tentacleConfigurationTestCase) + { + // The whole reason Tentacle needs an abandon RPC is to release the isolation mutex + // when CancelScript can't unstick the script. This test proves that contract: a + // FullIsolation script gets stuck (because TentacleDebugDisableProcessKill_UNSAFE_FOR_PRODUCTION + // makes cancel a no-op), abandon is called, and a second FullIsolation script with + // the same mutex name must then be able to acquire the mutex and run. + await using var clientTentacle = await tentacleConfigurationTestCase.CreateBuilder() + .WithTentacle(x => x.WithRunTentacleEnvironmentVariable(EnvironmentVariables.TentacleDebugDisableProcessKill_UNSAFE_FOR_PRODUCTION, "1")) + .Build(CancellationToken); + + var startFile = Path.Combine(clientTentacle.TemporaryDirectory.DirectoryPath, "start"); + var releaseFile = Path.Combine(clientTentacle.TemporaryDirectory.DirectoryPath, "release"); + + const string sharedMutex = "abandon-test-mutex"; + + var firstCommand = new TestExecuteShellScriptCommandBuilder() + .SetScriptBody(new ScriptBuilder() + .CreateFile(startFile) + .WaitForFileToExist(releaseFile)) + .WithIsolationLevel(ScriptIsolationLevel.FullIsolation) + .WithIsolationMutexName(sharedMutex) + .Build(); + + var tentacleClient = clientTentacle.TentacleClient; + + var firstScriptExecution = Task.Run(async () => await tentacleClient.ExecuteScript(firstCommand, CancellationToken)); + + await Wait.For(() => File.Exists(startFile), + TimeSpan.FromSeconds(30), + () => throw new Exception("First script did not start"), + CancellationToken); + + await tentacleClient.CancelScript(firstCommand.ScriptTicket); + await Task.Delay(TimeSpan.FromSeconds(1)); + + await tentacleClient.AbandonScript(firstCommand.ScriptTicket, CancellationToken); + + // Second FullIsolation script with the SAME mutex name. If the abandon released + // the mutex, this script can acquire it and run to completion. Otherwise it would + // block waiting for the (still-alive) first script's mutex hold. + var secondStartFile = Path.Combine(clientTentacle.TemporaryDirectory.DirectoryPath, "second-start"); + var secondCommand = new TestExecuteShellScriptCommandBuilder() + .SetScriptBody(new ScriptBuilder().CreateFile(secondStartFile)) + .WithIsolationLevel(ScriptIsolationLevel.FullIsolation) + .WithIsolationMutexName(sharedMutex) + .Build(); + + var (secondResult, _) = await tentacleClient.ExecuteScript(secondCommand, CancellationToken); + secondResult.ExitCode.Should().Be(0); + File.Exists(secondStartFile).Should().BeTrue("second script should have run after the mutex was released"); + + File.WriteAllText(releaseFile, ""); + await firstScriptExecution; + } + } +} diff --git a/source/Octopus.Tentacle.Tests.Integration/PowerShellStartupDetectionTests.cs b/source/Octopus.Tentacle.Tests.Integration/PowerShellStartupDetectionTests.cs index 814523b42..05fce368e 100644 --- a/source/Octopus.Tentacle.Tests.Integration/PowerShellStartupDetectionTests.cs +++ b/source/Octopus.Tentacle.Tests.Integration/PowerShellStartupDetectionTests.cs @@ -271,14 +271,13 @@ public async Task WhenPowerShellNeverStarts_WeShouldDetectTheScriptDidNotStart_A var args = shell.FormatCommandArguments(workspace.BootstrapScriptFilePath, null, allowInteractive: false); var directOutput = new List(); - var directExitCode = SilentProcessRunner.ExecuteCommand( + var directExitCode = await SilentProcessRunner.ExecuteCommandAsync( shell.GetFullPath(), args, workspace.WorkingDirectory, _ => { }, line => directOutput.Add(line), - line => directOutput.Add(line), - CancellationToken.None); + line => directOutput.Add(line)); var directOutputText = string.Join("\n", directOutput); Logger.Information("Direct invocation output:\n{Output}", directOutputText); @@ -338,14 +337,13 @@ public async Task WhenPowerShellNeverStarts_AndWorkspaceIsDeletedBeforeScriptRun var args = shell.FormatCommandArguments(bootstrapScriptFilePath, null, allowInteractive: false); var directOutput = new List(); - var directExitCode = SilentProcessRunner.ExecuteCommand( + var directExitCode = await SilentProcessRunner.ExecuteCommandAsync( shell.GetFullPath(), args, workspace.WorkingDirectory, _ => { }, line => directOutput.Add(line), - line => directOutput.Add(line), - CancellationToken.None); + line => directOutput.Add(line)); var directOutputText = string.Join("\n", directOutput); Logger.Information("Direct invocation output:\n{Output}", directOutputText); @@ -370,15 +368,16 @@ static IShell GetShellForCurrentPlatform() // First check if pwsh is available try { - var result = SilentProcessRunner.ExecuteCommand( + var result = SilentProcessRunner.ExecuteCommandAsync( "which", "pwsh", Environment.CurrentDirectory, _ => { }, _ => { }, _ => { }, - new Dictionary(), - CancellationToken.None); + customEnvironmentVariables: new Dictionary()) + // Safe: static helper, no synchronisation context. + .GetAwaiter().GetResult(); if (result == 0) { diff --git a/source/Octopus.Tentacle.Tests.Integration/Startup/LinuxConfigureServiceHelperFixture.cs b/source/Octopus.Tentacle.Tests.Integration/Startup/LinuxConfigureServiceHelperFixture.cs index bb8e45aef..7750865fb 100644 --- a/source/Octopus.Tentacle.Tests.Integration/Startup/LinuxConfigureServiceHelperFixture.cs +++ b/source/Octopus.Tentacle.Tests.Integration/Startup/LinuxConfigureServiceHelperFixture.cs @@ -4,6 +4,7 @@ using System.Linq; using System.Reflection; using System.Text; +using System.Threading.Tasks; using FluentAssertions; using NUnit.Framework; using Octopus.Tentacle.CommonTestUtils.Diagnostics; @@ -21,22 +22,22 @@ public class LinuxConfigureServiceHelperFixture { [Test] [RequiresSudoOnLinux] - public void CanInstallServiceAsRoot() + public async Task CanInstallServiceAsRoot() { - CanInstallService(null, null); + await CanInstallService(null, null); } [Test] [RequiresSudoOnLinux] - public void CanInstallServiceAsUser() + public async Task CanInstallServiceAsUser() { var user = new LinuxTestUserPrincipal("octo-shared-svc-test"); - CanInstallService(user.UserName, user.Password); + await CanInstallService(user.UserName, user.Password); } [Test] [RequiresSudoOnLinux] - public void CannotWriteToServiceFileAsUser() + public async Task CannotWriteToServiceFileAsUser() { const string serviceName = "OctopusShared.ServiceHelperTest"; const string instance = "TestCannotWriteToServiceFileInstance"; @@ -47,7 +48,7 @@ public void CannotWriteToServiceFileAsUser() WriteUnixFile(scriptPath); var chmodCmd = new CommandLineInvocation("/bin/bash", $"-c \"chmod 777 {scriptPath}\""); - chmodCmd.ExecuteCommand(); + await chmodCmd.ExecuteCommandAsync(); var configureServiceHelper = new LinuxServiceConfigurator(log); @@ -66,11 +67,11 @@ public void CannotWriteToServiceFileAsUser() serviceConfigurationState); var statCmd = new CommandLineInvocation("/bin/bash", $"-c \"stat -c '%A' /etc/systemd/system/{instance}.service\""); - var result = statCmd.ExecuteCommand(); + var result = await statCmd.ExecuteCommandAsync(); result.Infos.Single().Should().Be("-rw-r--r--"); // Service file should only be writeable for the root user } - void CanInstallService(string username, string password) + async Task CanInstallService(string username, string password) { const string serviceName = "OctopusShared.ServiceHelperTest"; const string instance = "TestInstance"; @@ -81,7 +82,7 @@ void CanInstallService(string username, string password) WriteUnixFile(scriptPath); var commandLineInvocation = new CommandLineInvocation("/bin/bash", $"-c \"chmod 777 {scriptPath}\""); - commandLineInvocation.ExecuteCommand(); + await commandLineInvocation.ExecuteCommandAsync(); var configureServiceHelper = new LinuxServiceConfigurator(log); @@ -100,19 +101,19 @@ void CanInstallService(string username, string password) serviceConfigurationState); //Check that the systemd unit service file has been written - Assert.IsTrue(DoesServiceUnitFileExist(instance), "The service unit file has not been created"); + Assert.IsTrue(await DoesServiceUnitFileExist(instance), "The service unit file has not been created"); - var status = GetServiceStatus(instance); + var status = await GetServiceStatus(instance); status["ActiveState"].Should().Be("active"); status["SubState"].Should().Be("running"); status["LoadState"].Should().Be("loaded"); status["User"].Should().Be(username ?? "root"); //Check that the Service is running - Assert.IsTrue(IsServiceRunning(instance), "The service is not running"); + Assert.IsTrue(await IsServiceRunning(instance), "The service is not running"); //Check that the service is enabled to run on startup - Assert.IsTrue(IsServiceEnabled(instance), "The service has not been enabled to run on startup"); + Assert.IsTrue(await IsServiceEnabled(instance), "The service has not been enabled to run on startup"); var stopServiceConfigurationState = new ServiceConfigurationState { @@ -127,13 +128,13 @@ void CanInstallService(string username, string password) stopServiceConfigurationState); //Check that the Service has stopped - Assert.IsFalse(IsServiceRunning(instance), "The service has not been stopped"); + Assert.IsFalse(await IsServiceRunning(instance), "The service has not been stopped"); //Check that the service is disabled - Assert.IsFalse(IsServiceEnabled(instance), "The service has not been disabled"); + Assert.IsFalse(await IsServiceEnabled(instance), "The service has not been disabled"); //Check that the service is disabled - Assert.IsFalse(DoesServiceUnitFileExist(instance), "The service unit file still exists"); + Assert.IsFalse(await DoesServiceUnitFileExist(instance), "The service unit file still exists"); } void WriteUnixFile(string path) @@ -148,10 +149,10 @@ void WriteUnixFile(string path) } } - Dictionary GetServiceStatus(string serviceName) + async Task> GetServiceStatus(string serviceName) { var commandLineInvocation = new CommandLineInvocation("/bin/bash", $"-c \"systemctl show {serviceName}\""); - var result = commandLineInvocation.ExecuteCommand(); + var result = await commandLineInvocation.ExecuteCommandAsync(); Console.WriteLine($"Status of service {serviceName}"); foreach (var info in result.Infos) Console.WriteLine(info); @@ -160,28 +161,28 @@ Dictionary GetServiceStatus(string serviceName) .ToDictionary(x => x[0], x => x[1]); } - bool IsServiceRunning(string serviceName) + async Task IsServiceRunning(string serviceName) { - var result = RunBashCommand($"systemctl is-active --quiet {serviceName}"); + var result = await RunBashCommand($"systemctl is-active --quiet {serviceName}"); return result.ExitCode == 0; } - bool IsServiceEnabled(string serviceName) + async Task IsServiceEnabled(string serviceName) { - var result = RunBashCommand($"systemctl is-enabled --quiet {serviceName}"); + var result = await RunBashCommand($"systemctl is-enabled --quiet {serviceName}"); return result.ExitCode == 0; } - bool DoesServiceUnitFileExist(string serviceName) + async Task DoesServiceUnitFileExist(string serviceName) { - var result = RunBashCommand($"ls /etc/systemd/system | grep {serviceName}.service"); + var result = await RunBashCommand($"ls /etc/systemd/system | grep {serviceName}.service"); return result.ExitCode == 0; } - CmdResult RunBashCommand(string command) + async Task RunBashCommand(string command) { var commandLineInvocation = new CommandLineInvocation("/bin/bash", $"-c \"{command}\""); - return commandLineInvocation.ExecuteCommand(); + return await commandLineInvocation.ExecuteCommandAsync(); } } } diff --git a/source/Octopus.Tentacle.Tests.Integration/Support/TentacleFetchers/LinuxTentacleFetcher.cs b/source/Octopus.Tentacle.Tests.Integration/Support/TentacleFetchers/LinuxTentacleFetcher.cs index b6613ff5b..2bf480ebd 100644 --- a/source/Octopus.Tentacle.Tests.Integration/Support/TentacleFetchers/LinuxTentacleFetcher.cs +++ b/source/Octopus.Tentacle.Tests.Integration/Support/TentacleFetchers/LinuxTentacleFetcher.cs @@ -44,11 +44,11 @@ async Task DownloadAndExtractFromUrl(string directoryPath, string url) var extractionDirectory = new DirectoryInfo(Path.Combine(directoryPath, "extracted")); - ExtractTarGzip(downloadFilePath, extractionDirectory.FullName, logger); + await ExtractTarGzipAsync(downloadFilePath, extractionDirectory.FullName, logger); return Path.Combine(extractionDirectory.FullName, "tentacle", "Tentacle"); } - public static void ExtractTarGzip(string gzArchiveName, string destFolder, ILogger logger) + public static async Task ExtractTarGzipAsync(string gzArchiveName, string destFolder, ILogger logger) { if (!Directory.Exists(destFolder)) { @@ -62,14 +62,13 @@ public static void ExtractTarGzip(string gzArchiveName, string destFolder, ILogg using var tmp = new TemporaryDirectory(); Action log = s => logger.Information(s); - var exitCode = SilentProcessRunner.ExecuteCommand( + var exitCode = await SilentProcessRunner.ExecuteCommandAsync( "tar", $"xzvf \"{gzArchiveName}\" -C \"{destFolder}\"", tmp.DirectoryPath, log, log, - log, - CancellationToken.None); + log); if (exitCode != 0) { diff --git a/source/Octopus.Tentacle.Tests.Integration/Support/TentacleFetchers/NugetTentacleFetcher.cs b/source/Octopus.Tentacle.Tests.Integration/Support/TentacleFetchers/NugetTentacleFetcher.cs index dc6f3851e..fff2c6bc9 100644 --- a/source/Octopus.Tentacle.Tests.Integration/Support/TentacleFetchers/NugetTentacleFetcher.cs +++ b/source/Octopus.Tentacle.Tests.Integration/Support/TentacleFetchers/NugetTentacleFetcher.cs @@ -61,7 +61,7 @@ async Task DownloadAndExtractFromUrl(string directoryPath, Version versi var tentacleFolder = Path.Combine(directoryPath, "tentacle"); if (tentacleArtifact.EndsWith(".tar.gz")) { - LinuxTentacleFetcher.ExtractTarGzip(tentacleArtifact, tentacleFolder, logger); + await LinuxTentacleFetcher.ExtractTarGzipAsync(tentacleArtifact, tentacleFolder, logger); } else { diff --git a/source/Octopus.Tentacle.Tests.Integration/Support/TestDecoratorsAreCalledInTheCorrectOrder.cs b/source/Octopus.Tentacle.Tests.Integration/Support/TestDecoratorsAreCalledInTheCorrectOrder.cs index 09cbc58fb..fc3fd090b 100644 --- a/source/Octopus.Tentacle.Tests.Integration/Support/TestDecoratorsAreCalledInTheCorrectOrder.cs +++ b/source/Octopus.Tentacle.Tests.Integration/Support/TestDecoratorsAreCalledInTheCorrectOrder.cs @@ -105,6 +105,11 @@ public Task CancelScriptAsync(CancelScriptCommandV2 comm throw new NotImplementedException(); } + public Task AbandonScriptAsync(AbandonScriptCommandV2 command, HalibutProxyRequestOptions proxyRequestOptions) + { + throw new NotImplementedException(); + } + public async Task CompleteScriptAsync(CompleteScriptCommandV2 command, HalibutProxyRequestOptions proxyRequestOptions) { await Task.CompletedTask; diff --git a/source/Octopus.Tentacle.Tests.Integration/Util/Builders/TentacleClientExtensionMethods.cs b/source/Octopus.Tentacle.Tests.Integration/Util/Builders/TentacleClientExtensionMethods.cs index c0975c788..bf6ba7343 100644 --- a/source/Octopus.Tentacle.Tests.Integration/Util/Builders/TentacleClientExtensionMethods.cs +++ b/source/Octopus.Tentacle.Tests.Integration/Util/Builders/TentacleClientExtensionMethods.cs @@ -5,10 +5,12 @@ using System.Threading.Tasks; using Halibut; using Octopus.Tentacle.Client; +using Octopus.Tentacle.Client.EventDriven; using Octopus.Tentacle.Client.Scripts; using Octopus.Tentacle.Client.Scripts.Models; using Octopus.Tentacle.Contracts; using Octopus.Tentacle.Contracts.Logging; +using Octopus.Tentacle.Contracts.ScriptServiceV2; using Octopus.Tentacle.Tests.Integration.Support; using Octopus.Tentacle.Tests.Integration.Support.ExtensionMethods; @@ -56,6 +58,42 @@ public static async Task UploadFile( return result; } + public static async Task AbandonScript( + this TentacleClient tentacleClient, + ScriptTicket scriptTicket, + CancellationToken token, + ITentacleClientTaskLog? log = null) + { + return await tentacleClient.AbandonScript(scriptTicket, + new SerilogLoggerBuilder().Build().ForContext().ToITentacleTaskLog().Chain(log), + token).ConfigureAwait(false); + } + + public static async Task CancelScript( + this TentacleClient tentacleClient, + ScriptTicket scriptTicket, + ITentacleClientTaskLog? log = null) + { + var commandContext = new CommandContext(scriptTicket, 0, ScriptServiceVersion.ScriptServiceVersion2); + var result = await tentacleClient.CancelScript(commandContext, + new SerilogLoggerBuilder().Build().ForContext().ToITentacleTaskLog().Chain(log)) + .ConfigureAwait(false); + return result.ScriptStatus; + } + + public static async Task GetStatus( + this TentacleClient tentacleClient, + ScriptTicket scriptTicket, + CancellationToken token, + ITentacleClientTaskLog? log = null) + { + var commandContext = new CommandContext(scriptTicket, 0, ScriptServiceVersion.ScriptServiceVersion2); + var result = await tentacleClient.GetStatus(commandContext, + new SerilogLoggerBuilder().Build().ForContext().ToITentacleTaskLog().Chain(log), + token).ConfigureAwait(false); + return result.ScriptStatus; + } + public static async Task DownloadFile( this TentacleClient tentacleClient, string remotePath, diff --git a/source/Octopus.Tentacle.Tests.Integration/Util/LinuxTestUserPrincipal.cs b/source/Octopus.Tentacle.Tests.Integration/Util/LinuxTestUserPrincipal.cs index d3f738261..241457316 100644 --- a/source/Octopus.Tentacle.Tests.Integration/Util/LinuxTestUserPrincipal.cs +++ b/source/Octopus.Tentacle.Tests.Integration/Util/LinuxTestUserPrincipal.cs @@ -22,7 +22,13 @@ public LinuxTestUserPrincipal(string username) static void RunCommand(string arguments, bool failOnNonZeroExitCode = true) { var commandLineInvocation = new CommandLineInvocation("/bin/bash", arguments); - var result = commandLineInvocation.ExecuteCommand(); + // We're in a synchronous test helper called from the LinuxTestUserPrincipal + // constructor. Constructors must return synchronously, so we block on the + // async call with .GetAwaiter().GetResult(). This is sync-over-async but is + // safe because the NUnit test runner dispatches us on a worker thread without + // a captured SynchronizationContext, so no deadlock. + // See https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html + var result = commandLineInvocation.ExecuteCommandAsync().GetAwaiter().GetResult(); foreach (var line in result.Errors) Console.WriteLine(line); diff --git a/source/Octopus.Tentacle.Tests.Integration/Util/RunningScriptFixture.cs b/source/Octopus.Tentacle.Tests.Integration/Util/RunningScriptFixture.cs index c0e3b05ad..2c6b9ceec 100644 --- a/source/Octopus.Tentacle.Tests.Integration/Util/RunningScriptFixture.cs +++ b/source/Octopus.Tentacle.Tests.Integration/Util/RunningScriptFixture.cs @@ -67,7 +67,7 @@ public void SetUpLocal() scriptLog = new TestScriptLog(); cancellationTokenSource = new CancellationTokenSource(TimeSpan.FromSeconds(10)); scriptIsolationMutex = new ScriptIsolationMutex(); - runningScript = new RunningScript(shell, + runningScript = RunningScript.Create(shell, workspace, scriptLog, taskId, @@ -169,7 +169,7 @@ public async Task CancellationToken_ShouldKillTheProcess() ? (new PowerShell(), "Start-Sleep -seconds") : (new Bash() as IShell, "sleep"); - var script = new RunningScript(shell, + var script = RunningScript.Create(shell, workspace, scriptLog, taskId, diff --git a/source/Octopus.Tentacle.Tests.Integration/Util/SilentProcessRunnerFixture.cs b/source/Octopus.Tentacle.Tests.Integration/Util/SilentProcessRunnerFixture.cs index da9247fd6..e15802bec 100644 --- a/source/Octopus.Tentacle.Tests.Integration/Util/SilentProcessRunnerFixture.cs +++ b/source/Octopus.Tentacle.Tests.Integration/Util/SilentProcessRunnerFixture.cs @@ -7,6 +7,7 @@ using FluentAssertions; using NUnit.Framework; using Octopus.Tentacle.CommonTestUtils; +using Octopus.Tentacle.Contracts; using Octopus.Tentacle.Tests.Integration.Support; using Octopus.Tentacle.Tests.Integration.Support.TestAttributes; using Octopus.Tentacle.Util; @@ -130,29 +131,45 @@ public async Task CancellationToken_WhenGrandchildHoldsRedirectedPipes_ShouldNot using var tempDir = new TemporaryDirectory(); var grandchildPidFile = Path.Combine(tempDir.DirectoryPath, "grandchild.pid"); - // This test reproduces the cancel-time hang. - // The issue was SilentProcessRunner will wait for stdout/stderr pipes to be closed. - // The pipes can be inherited by grandchildren, and remain open even after the - // child process has died. + // This test guards the cancel path of ExecuteCommandAsync against a regression + // involving re-parented grandchildren that inherit our redirected pipes. // - // Normally Process.Kill(entireProcessTree:true) would kill the entire process tree. - // However if the child process dies THEN we issue the kill command, we do NOT see any - // process under the child and so the Kill() command completes. This leaves the grandchild - // running, holding our pipes we are waiting on. + // The scenario: + // * We launch a process that spawns a child, which spawns a long-running grandchild + // and then immediately exits. + // * Because the child exited BEFORE our cancel fires, the grandchild has been + // re-parented (PPID broken). Process.Kill(entireProcessTree:true) follows PPID + // links, so it does NOT find the grandchild — Kill returns having killed nothing + // beyond the (already-dead) child. + // * The grandchild inherited our redirected stdout/stderr pipes and holds them + // open. The stream readers therefore never see EOF. // - // The test stacks three processes: PowerShell (the child) launches cmd.exe (a - // throwaway middle layer) which does `start /b ping` to background ping (the - // grandchild) and then exits. cmd exiting before we cancel is what breaks the - // PPID chain — without that, ping would still be a direct child of PowerShell - // and Kill(true) would find it. + // Why this is a real risk to ExecuteCommandAsync: + // * Old sync version: process.WaitForExit() blocks until BOTH the process exits + // AND the redirected streams reach EOF, so the grandchild holding pipes would + // hang it forever. The fix was to call process.Close() during cancel-cleanup to + // forcibly release the pipe handles. + // * New async version: WaitForExitAsync does NOT wait on streams — it returns as + // soon as the Exited event fires. SafelyWaitForAllOutput then waits up to 5s + // per stream for EOF and times out if the grandchild still holds them. Pipes + // are released by the using-block's Process.Dispose at end of method. + // * Critically, we deliberately do NOT call process.Close() during cancel-cleanup + // anymore — see DoOurBestToCleanUp in SilentProcessRunner.cs for the full + // explanation. Adding it back caused a 10-minute hang in CI because Close races + // with the Exited event handler that WaitForExitAsync depends on. // - // Two non-obvious bits below, both load-bearing: + // This test asserts cancel returns in well under 30s in the grandchild scenario. + // If it ever takes 10 minutes (the test timeout), someone has re-introduced + // process.Close() or otherwise broken the Exited-event path. + // + // Two non-obvious bits in the PowerShell script below, both load-bearing: // * $psi.RedirectStandardInput = $true — we don't use stdin, but redirecting // any stream is what flips bInheritHandles=true in .NET's Process.Start. That // is what makes cmd (and by extension ping) inherit our pipe write-ends. // Without this the grandchild doesn't hold our pipes and there is no bug to // reproduce. - // * The WMI lookup — we need the grandchild's PID so the test can clean it up. + // * The WMI lookup — we need the grandchild's PID so the test can clean it up + // afterwards (otherwise we'd leak a long-running ping on the CI host). var psScript = @" $pidFile = 'PIDFILE_PLACEHOLDER' $pingPath = Join-Path $env:WINDIR 'System32\PING.EXE' @@ -196,18 +213,21 @@ public async Task CancellationToken_WhenGrandchildHoldsRedirectedPipes_ShouldNot cts.Token)); // Wait for the grandchild to actually be spawned before cancelling - await WaitForGrandchildSpawnAsync(grandchildPidFile, TimeSpan.FromSeconds(60)); + await WaitForPidFileAsync(grandchildPidFile, TimeSpan.FromSeconds(60)); var sw = Stopwatch.StartNew(); cts.Cancel(); - var completed = task.Wait(TimeSpan.FromSeconds(30)); + var completed = task.Wait(TimeSpan.FromSeconds(60)); sw.Stop(); completed.Should().BeTrue( - $"ExecuteCommand should return shortly after cancellation even when a grandchild " + - $"holds the redirected pipes. Without proactively closing the redirected streams " + - $"after Kill, Process.WaitForExit() blocks indefinitely. Elapsed since cancel: {sw.Elapsed.TotalSeconds:F1}s"); + $"ExecuteCommandAsync should return promptly after cancellation even when a " + + $"grandchild holds the redirected pipes. Worst case is ~10s (5s timeout × 2 streams " + + $"in SafelyWaitForAllOutput). If we hit the 60s test timeout, either someone " + + $"re-introduced process.Close() in DoOurBestToCleanUp (which races with the Exited " + + $"event WaitForExitAsync depends on) or SafelyWaitForAllOutput's per-stream timeout " + + $"has been removed. Elapsed since cancel: {sw.Elapsed.TotalSeconds:F1}s"); } } finally @@ -222,19 +242,23 @@ public async Task CancellationToken_WhenUnixGrandchildHoldsRedirectedPipes_Shoul if (PlatformDetection.IsRunningOnWindows) Assert.Ignore("Unix-only repro (Mac/Linux). The Windows equivalent is covered by the [WindowsTest] above."); - // This test reproduces the cancel-time hang. - // The issue is SilentProcessRunner will wait for stdout/stderr pipes to be closed. - // The pipes can be inherited by grandchildren, and remain open even after the - // child process has died. + // Unix equivalent of CancellationToken_WhenGrandchildHoldsRedirectedPipes_ShouldNotHang + // above. See that test's leading comment for the full rationale — the short version is: // - // Normally Process.Kill(entireProcessTree:true) would kill the entire process tree. - // However if the child process dies THEN we issue the kill command, we do NOT see any - // process under the child and so the Kill() command completes. This leaves the grandchild - // running, holding our pipes we are waiting on. + // * We start sh, which backgrounds a long sleep (the grandchild) and exits + // immediately. The grandchild gets re-parented to init/launchd and inherits our + // redirected stdout/stderr pipes, holding them open. + // * Process.Kill(entireProcessTree:true) follows PPID links, so by the time we + // cancel, the now-orphan grandchild is invisible to Kill — it keeps running. + // * Old sync code: process.WaitForExit() hung forever waiting for stream EOF. + // Fix was process.Close() during cancel-cleanup. + // * New async code: WaitForExitAsync ignores streams, so this doesn't hang. + // SafelyWaitForAllOutput bounds the post-await drain to 5s per stream. Pipes + // are released by Process.Dispose at end of method (NOT during cancel-cleanup + // — see DoOurBestToCleanUp in SilentProcessRunner.cs for why adding Close back + // causes a 10-minute hang). // - // The test is simple we run "sh", the child process, and tell it to yeet - // sleep, the grandchild, into the background. The grandchild now keeps - // running holding on to our pipes we will wait for an EOF on. + // This test asserts cancel returns in well under 30s in the grandchild scenario. using var tempDir = new TemporaryDirectory(); var grandchildPidFile = Path.Combine(tempDir.DirectoryPath, "grandchild.pid"); @@ -253,20 +277,27 @@ public async Task CancellationToken_WhenUnixGrandchildHoldsRedirectedPipes_Shoul out _, cts.Token)); - await WaitForGrandchildSpawnAsync(grandchildPidFile, TimeSpan.FromSeconds(30)); + await WaitForPidFileAsync(grandchildPidFile, TimeSpan.FromSeconds(60)); var sw = Stopwatch.StartNew(); cts.Cancel(); - // Cancel should be super quick, if it takes a long time, then we have an issue where we are waiting for the grandchild. - var completed = task.Wait(TimeSpan.FromSeconds(30)); + // Cancel should return within ~10s worst case (5s SafelyWaitForAllOutput timeout + // per stream). The 60s test bound is generous for slow Linux CI agents while + // still flagging a real regression — if process.Close() got re-added to + // DoOurBestToCleanUp (see that method for why) or SafelyWaitForAllOutput's + // per-stream timeout was removed, the wait hangs indefinitely and we trip the + // bound by miles. + var completed = task.Wait(TimeSpan.FromSeconds(60)); sw.Stop(); completed.Should().BeTrue( - $"ExecuteCommand should return shortly after cancellation even when a Unix " + - $"grandchild (reparented to init/launchd) holds the redirected pipes. " + - $"Without proactively closing the redirected streams after Kill, " + - $"Process.WaitForExit() blocks indefinitely. Elapsed since cancel: {sw.Elapsed.TotalSeconds:F1}s"); + $"ExecuteCommandAsync should return promptly after cancellation even when a Unix " + + $"grandchild (reparented to init/launchd) holds the redirected pipes. Worst case " + + $"is ~10s. If we hit the 60s test timeout, either process.Close() was re-introduced " + + $"in DoOurBestToCleanUp (which races with the Exited event WaitForExitAsync depends " + + $"on) or SafelyWaitForAllOutput's per-stream timeout has been removed. Elapsed " + + $"since cancel: {sw.Elapsed.TotalSeconds:F1}s"); } } finally @@ -275,7 +306,64 @@ public async Task CancellationToken_WhenUnixGrandchildHoldsRedirectedPipes_Shoul } } - static async Task WaitForGrandchildSpawnAsync(string pidFile, TimeSpan timeout) + [Test] + public async Task AbandonToken_ShouldReturnAbandonedExitCodeWithoutKillingProcess() + { + using var tempDir = new TemporaryDirectory(); + var pidFile = Path.Combine(tempDir.DirectoryPath, "process.pid"); + + var abandonCommand = PlatformDetection.IsRunningOnWindows ? "powershell.exe" : "/bin/bash"; + var arguments = PlatformDetection.IsRunningOnWindows + ? $"-NoProfile -NonInteractive -Command \"$PID | Out-File -FilePath '{pidFile}' -Encoding ASCII; Start-Sleep -Seconds 300\"" + : $"-c \"echo $$ > '{pidFile}' && sleep 300\""; + + using var cancelCts = new CancellationTokenSource(); + using var abandonCts = new CancellationTokenSource(); + + var infoMessages = new StringBuilder(); + + var task = Task.Run(async () => await SilentProcessRunner.ExecuteCommandAsync( + abandonCommand, + arguments, + Environment.CurrentDirectory, + debug: _ => { }, + info: msg => { lock (infoMessages) infoMessages.AppendLine(msg); }, + error: _ => { }, + customEnvironmentVariables: null, + cancel: cancelCts.Token, + abandon: abandonCts.Token)); + + // Wait deterministically for the process to write its PID before we abandon + await WaitForPidFileAsync(pidFile, TimeSpan.FromSeconds(30)); + abandonCts.Cancel(); + + try + { + var exitCode = await task; + + // AbandonedExitCode is only returned from the abandon catch block, which + // requires the abandon token to fire. If we'd accidentally waited for the + // process to exit naturally, exitCode would be the script's own exit code, + // not this sentinel. The exit code is the abandon contract. + exitCode.Should().Be(ScriptExitCodes.AbandonedExitCode); + infoMessages.ToString().Should().Contain("Tentacle has abandoned this script"); + + // Whether the script keeps running doesn't matter in prod. We check it here so we + // know our fixture successfully prevented it from being killed (the exit code matches either way). + var sleepPid = int.Parse(SafelyReadAllText(pidFile).Trim()); + Process.GetProcessById(sleepPid).HasExited.Should().BeFalse("abandon should leave the underlying script process running"); + } + finally + { + // Force-kill the sleeping process to avoid leaking it on CI + if (File.Exists(pidFile) && int.TryParse(SafelyReadAllText(pidFile).Trim(), out var pid) && pid > 0) + { + try { Process.GetProcessById(pid).Kill(); } catch { /* already gone */ } + } + } + } + + static async Task WaitForPidFileAsync(string pidFile, TimeSpan timeout) { var deadline = DateTime.UtcNow + timeout; while (DateTime.UtcNow < deadline) @@ -285,8 +373,8 @@ static async Task WaitForGrandchildSpawnAsync(string pidFile, TimeSpan timeout) await Task.Delay(50); } throw new TimeoutException( - $"Test setup failed: the grandchild PID was never written to '{pidFile}'. " + - $"The grandchild-pipe scenario is not being exercised."); + $"Test setup failed: a valid PID was never written to '{pidFile}'. " + + $"The scenario under test is not being exercised."); } static string SafelyReadAllText(string path) @@ -418,7 +506,18 @@ static int Execute( var debug = new StringBuilder(); var info = new StringBuilder(); var error = new StringBuilder(); - var exitCode = SilentProcessRunner.ExecuteCommand( + + // Why this is sync: Execute is a test helper that returns int and uses + // out parameters — both force the signature to be sync. It's invoked + // directly from sync NUnit test methods. + // + // Why blocking on the async call is safe: NUnit dispatches us on a + // worker thread with no SynchronizationContext. + // + // Why low risk: this is test code. The worst case for a wrong call here + // is a hung test, not a production incident. + // See https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html + var exitCode = SilentProcessRunner.ExecuteCommandAsync( command, arguments, workingDirectory, @@ -437,7 +536,7 @@ static int Execute( Console.WriteLine($"{DateTime.UtcNow} ERR: {x}"); error.Append(x); }, - cancel); + cancel: cancel).GetAwaiter().GetResult(); debugMessages = debug; infoMessages = info; diff --git a/source/Octopus.Tentacle.Tests/Capabilities/CapabilitiesServiceV2Fixture.cs b/source/Octopus.Tentacle.Tests/Capabilities/CapabilitiesServiceV2Fixture.cs index 0c6326a05..e219353cd 100644 --- a/source/Octopus.Tentacle.Tests/Capabilities/CapabilitiesServiceV2Fixture.cs +++ b/source/Octopus.Tentacle.Tests/Capabilities/CapabilitiesServiceV2Fixture.cs @@ -6,6 +6,7 @@ using Octopus.Tentacle.Contracts; using Octopus.Tentacle.Contracts.KubernetesScriptServiceV1; using Octopus.Tentacle.Contracts.ScriptServiceV2; +using Octopus.Tentacle.Core.Services.Scripts; using Octopus.Tentacle.Kubernetes; using Octopus.Tentacle.Services.Capabilities; @@ -20,8 +21,7 @@ public async Task CapabilitiesAreReturned() .GetCapabilitiesAsync(CancellationToken.None)) .SupportedCapabilities; - capabilities.Should().BeEquivalentTo(nameof(IScriptService), nameof(IFileTransferService), nameof(IScriptServiceV2)); - capabilities.Count.Should().Be(3); + capabilities.Should().BeEquivalentTo(nameof(IScriptService), nameof(IFileTransferService), nameof(IScriptServiceV2), nameof(ScriptServiceV2.AbandonScriptAsync)); capabilities.Should().NotContainMatch("IKubernetesScriptService*"); } @@ -36,11 +36,30 @@ public async Task OnlyKubernetesScriptServicesAreReturnedWhenRunningAsKubernetes .SupportedCapabilities; capabilities.Should().BeEquivalentTo(nameof(IFileTransferService), nameof(IKubernetesScriptServiceV1)); - capabilities.Count.Should().Be(2); capabilities.Should().NotContainMatch("IScriptService*"); Environment.SetEnvironmentVariable(KubernetesConfig.NamespaceVariableName, null); } + + [Test] + public async Task GetCapabilities_OnNonKubernetesTentacle_AdvertisesAbandonScriptV2() + { + var service = new CapabilitiesServiceV2(); + var response = await service.GetCapabilitiesAsync(CancellationToken.None); + response.SupportedCapabilities.Should().Contain(nameof(ScriptServiceV2.AbandonScriptAsync)); + } + + [Test] + public async Task GetCapabilities_OnKubernetesTentacle_DoesNotAdvertiseAbandonScriptV2() + { + Environment.SetEnvironmentVariable(KubernetesConfig.NamespaceVariableName, "ABC"); + + var service = new CapabilitiesServiceV2(); + var response = await service.GetCapabilitiesAsync(CancellationToken.None); + response.SupportedCapabilities.Should().NotContain(nameof(ScriptServiceV2.AbandonScriptAsync)); + + Environment.SetEnvironmentVariable(KubernetesConfig.NamespaceVariableName, null); + } } } \ No newline at end of file diff --git a/source/Octopus.Tentacle.Tests/Integration/ScriptServiceV2Fixture.cs b/source/Octopus.Tentacle.Tests/Integration/ScriptServiceV2Fixture.cs index d5c771eef..9317a874e 100644 --- a/source/Octopus.Tentacle.Tests/Integration/ScriptServiceV2Fixture.cs +++ b/source/Octopus.Tentacle.Tests/Integration/ScriptServiceV2Fixture.cs @@ -1,4 +1,4 @@ -using System; +using System; using System.Collections.Concurrent; using System.Collections.Generic; using System.Diagnostics; @@ -16,6 +16,7 @@ using Octopus.Tentacle.Core.Diagnostics; using Octopus.Tentacle.Core.Services.Scripts; using Octopus.Tentacle.Core.Services.Scripts.Locking; +using Octopus.Tentacle.Core.Services.Scripts.Logging; using Octopus.Tentacle.Core.Services.Scripts.Security.Masking; using Octopus.Tentacle.Core.Services.Scripts.Shell; using Octopus.Tentacle.Core.Services.Scripts.StateStore; @@ -29,30 +30,12 @@ namespace Octopus.Tentacle.Tests.Integration [TestFixture] public class ScriptServiceV2Fixture { - ScriptServiceV2 service = null!; - ScriptWorkspaceFactory workspaceFactory = null!; - ScriptStateStoreFactory stateStoreFactory = null!; - - [SetUp] - public void SetUp() - { - var homeConfiguration = Substitute.For(); - homeConfiguration.HomeDirectory.Returns(Environment.CurrentDirectory); - - var octopusPhysicalFileSystem = new OctopusPhysicalFileSystem(Substitute.For()); - workspaceFactory = new ScriptWorkspaceFactory(octopusPhysicalFileSystem, homeConfiguration, new SensitiveValueMasker()); - stateStoreFactory = new ScriptStateStoreFactory(octopusPhysicalFileSystem); - service = new ScriptServiceV2( - PlatformDetection.IsRunningOnWindows ? (IShell)new PowerShell() : new Bash(), - workspaceFactory, - stateStoreFactory, - new ScriptIsolationMutex(), - Substitute.For()); - } - [Test] public async Task ShouldExecuteAScriptSuccessfully() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + var windowsScript = "& ping.exe localhost -n 1"; var bashScript = "ping localhost -c 1"; @@ -63,7 +46,7 @@ public async Task ShouldExecuteAScriptSuccessfully() .Build(); var startScriptResponse = await service.StartScriptAsync(startScriptCommand, CancellationToken.None); - var (logs, finalResponse) = await RunUntilScriptCompletes(startScriptCommand, startScriptResponse); + var (logs, finalResponse) = await RunUntilScriptCompletes(service, startScriptCommand, startScriptResponse); finalResponse.State.Should().Be(ProcessState.Complete); finalResponse.ExitCode.Should().Be(0); @@ -73,6 +56,9 @@ public async Task ShouldExecuteAScriptSuccessfully() [Test] public async Task ShouldReturnANonZeroExitCodeForAFailingScript() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + var windowsScript = "& ping.exe nope -n 1"; var bashScript = "ping nope -c 1"; @@ -83,7 +69,7 @@ public async Task ShouldReturnANonZeroExitCodeForAFailingScript() .Build(); var startScriptResponse = await service.StartScriptAsync(startScriptCommand, CancellationToken.None); - var (logs, finalResponse) = await RunUntilScriptCompletes(startScriptCommand, startScriptResponse); + var (logs, finalResponse) = await RunUntilScriptCompletes(service, startScriptCommand, startScriptResponse); finalResponse.State.Should().Be(ProcessState.Complete); finalResponse.ExitCode.Should().NotBe(0); @@ -93,6 +79,9 @@ public async Task ShouldReturnANonZeroExitCodeForAFailingScript() [Test] public async Task ShouldExecuteMultipleScriptsConcurrently() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + var bashScript = "sleep 10"; var windowsScript = "Start-Sleep -Seconds 10"; @@ -114,7 +103,7 @@ public async Task ShouldExecuteMultipleScriptsConcurrently() }); await Task.WhenAll(tasks); - + var startDuration = started.Elapsed; startDuration.Should().BeLessThan(TimeSpan.FromSeconds(5)); @@ -122,7 +111,7 @@ public async Task ShouldExecuteMultipleScriptsConcurrently() foreach (var script in scripts) { - var (logs, finalResponse) = await RunUntilScriptCompletes(script.Command, script.Response); + var (logs, finalResponse) = await RunUntilScriptCompletes(service, script.Command, script.Response); finalResponse.State.Should().Be(ProcessState.Complete); finalResponse.ExitCode.Should().Be(0); @@ -136,6 +125,9 @@ public async Task ShouldExecuteMultipleScriptsConcurrently() [Test] public async Task ShouldStartExecuteAScriptQuickly() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + var script = "echo \"finished\""; var startScriptCommand = new StartScriptCommandV2Builder() @@ -170,6 +162,9 @@ public async Task ShouldStartExecuteAScriptQuickly() [Test] public async Task ShouldExecuteALongRunningScriptSuccessfully() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + var bashScript = "sleep 10"; var windowsScript = "Start-Sleep -Seconds 10"; @@ -180,7 +175,7 @@ public async Task ShouldExecuteALongRunningScriptSuccessfully() .Build(); var startScriptResponse = await service.StartScriptAsync(startScriptCommand, CancellationToken.None); - var (_, finalResponse) = await RunUntilScriptCompletes(startScriptCommand, startScriptResponse); + var (_, finalResponse) = await RunUntilScriptCompletes(service, startScriptCommand, startScriptResponse); finalResponse.State.Should().Be(ProcessState.Complete); finalResponse.ExitCode.Should().Be(0); @@ -189,6 +184,9 @@ public async Task ShouldExecuteALongRunningScriptSuccessfully() [Test] public async Task StartScriptShouldWaitForAShortScriptToFinish() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + var startScriptCommand = new StartScriptCommandV2Builder() .WithScriptBody("echo \"finished\"") .WithDurationStartScriptCanWaitForScriptToFinish(TimeSpan.FromSeconds(5)) @@ -207,6 +205,9 @@ public async Task StartScriptShouldWaitForAShortScriptToFinish() [Test] public async Task StartScriptShouldNotWaitForALongScriptToFinish() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + var bashScript = "sleep 10"; var windowsScript = "Start-Sleep -Seconds 10"; @@ -217,7 +218,7 @@ public async Task StartScriptShouldNotWaitForALongScriptToFinish() .Build(); var startScriptResponse = await service.StartScriptAsync(startScriptCommand, CancellationToken.None); - await RunUntilScriptCompletes(startScriptCommand, startScriptResponse); + await RunUntilScriptCompletes(service, startScriptCommand, startScriptResponse); startScriptResponse.State.Should().Be(ProcessState.Running); startScriptResponse.ExitCode.Should().Be(0); @@ -227,6 +228,9 @@ public async Task StartScriptShouldNotWaitForALongScriptToFinish() [Test] public async Task StartScriptShouldNotStartTheScriptForTheSameScriptTicketMoreThanOnce_SequentialRequests() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + // Arrange var scriptTicket = new ScriptTicket(Guid.NewGuid().ToString()); var script1 = GetStartScriptCommandForScriptThatCreatesAFile(scriptTicket); @@ -236,7 +240,7 @@ public async Task StartScriptShouldNotStartTheScriptForTheSameScriptTicketMoreTh await service.StartScriptAsync(script1.StartScriptCommand, CancellationToken.None); var startScriptResponse = await service.StartScriptAsync(script2.StartScriptCommand, CancellationToken.None); - var (_, finalResponse) = await RunUntilScriptCompletes(script2.StartScriptCommand, startScriptResponse); + var (_, finalResponse) = await RunUntilScriptCompletes(service, script2.StartScriptCommand, startScriptResponse); // Assert finalResponse.State.Should().Be(ProcessState.Complete); @@ -249,6 +253,9 @@ public async Task StartScriptShouldNotStartTheScriptForTheSameScriptTicketMoreTh [Test] public async Task StartScriptShouldNotStartTheScriptForTheSameScriptTicketMoreThanOnce_ConcurrentRequests() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + // Arrange var scriptTicket = new ScriptTicket(Guid.NewGuid().ToString()); @@ -271,6 +278,7 @@ public async Task StartScriptShouldNotStartTheScriptForTheSameScriptTicketMoreTh await Task.WhenAll(tasks); var (_, finalResponse) = await RunUntilScriptCompletes( + service, scripts[0].StartScriptCommand, new ScriptStatusResponseV2(scripts[0].StartScriptCommand.ScriptTicket, ProcessState.Pending, 0, new List(), 0)); @@ -292,6 +300,9 @@ public async Task StartScriptShouldNotStartTheScriptForTheSameScriptTicketMoreTh [Test] public async Task CancelScriptShouldCancelAnExecutingScript() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + var bashScript = "sleep 60"; var windowsScript = "Start-Sleep -Seconds 60"; @@ -337,6 +348,10 @@ public async Task CancelScriptShouldCancelAnExecutingScript() [Test] public async Task CompleteScriptShouldCleanupTheWorkspace() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + var workspaceFactory = (ScriptWorkspaceFactory)builder.WorkspaceFactory; + var script = "echo \"finished\""; var startScriptCommand = new StartScriptCommandV2Builder() @@ -351,13 +366,14 @@ public async Task CompleteScriptShouldCleanupTheWorkspace() var startScriptResponse = await service.StartScriptAsync(startScriptCommand, CancellationToken.None); Directory.Exists(workspaceDirectory).Should().BeTrue(); - await RunUntilScriptCompletes(startScriptCommand, startScriptResponse); + await RunUntilScriptCompletes(service, startScriptCommand, startScriptResponse); Directory.Exists(workspaceDirectory).Should().BeFalse(); } [Test] public async Task GetStatusShouldReturnAnExitCodeOf45ForAnUnknownScriptTicket() { + var service = new ScriptServiceV2Builder().Build(); var response = await service.GetStatusAsync(new ScriptStatusRequestV2(new ScriptTicket("nope"), 0), CancellationToken.None); response.ExitCode.Should().Be(-45); @@ -366,6 +382,7 @@ public async Task GetStatusShouldReturnAnExitCodeOf45ForAnUnknownScriptTicket() [Test] public async Task CancelScriptShouldReturnAnExitCodeOf45ForAnUnknownScriptTicket() { + var service = new ScriptServiceV2Builder().Build(); var response = await service.CancelScriptAsync(new CancelScriptCommandV2(new ScriptTicket("nope"), 0), CancellationToken.None); response.ExitCode.Should().Be(-45); @@ -374,43 +391,53 @@ public async Task CancelScriptShouldReturnAnExitCodeOf45ForAnUnknownScriptTicket [Test] public async Task CompleteScriptShouldNotErrorForAnUnknownScriptTicket() { + var service = new ScriptServiceV2Builder().Build(); await service.CompleteScriptAsync(new CompleteScriptCommandV2(new ScriptTicket("nope")), CancellationToken.None); } [Test] public async Task GetStatusShouldReturnAnExitCodeOf46ForAScriptWithAnUnknownResult() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + var request = new ScriptStatusRequestV2(new ScriptTicket($"did-not-finish-{Guid.NewGuid()}"), 0); var ticket = request.Ticket; - SetupScriptState(ticket); + SetupScriptState(builder.WorkspaceFactory, builder.StateStoreFactory, ticket); var response = await service.GetStatusAsync(request, CancellationToken.None); response.ExitCode.Should().Be(-46); - await CleanupWorkspace(ticket, CancellationToken.None); + await CleanupWorkspace(builder.WorkspaceFactory, ticket, CancellationToken.None); } [Test] public async Task CancelScriptShouldReturnAnExitCodeOf46ForAScriptWithAnUnknownResult() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + var request = new CancelScriptCommandV2(new ScriptTicket($"did-not-finish-{Guid.NewGuid()}"), 0); var ticket = request.Ticket; - SetupScriptState(ticket); + SetupScriptState(builder.WorkspaceFactory, builder.StateStoreFactory, ticket); var response = await service.CancelScriptAsync(request, CancellationToken.None); response.ExitCode.Should().Be(-46); - await CleanupWorkspace(ticket, CancellationToken.None); + await CleanupWorkspace(builder.WorkspaceFactory, ticket, CancellationToken.None); } [Test] public async Task CompleteScriptShouldNotErrorForAScriptWithAnUnknownResult() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + var request = new CompleteScriptCommandV2(new ScriptTicket($"did-not-finish-{Guid.NewGuid()}")); var ticket = request.Ticket; - SetupScriptState(ticket); + SetupScriptState(builder.WorkspaceFactory, builder.StateStoreFactory, ticket); await service.CompleteScriptAsync(request, CancellationToken.None); } @@ -418,6 +445,9 @@ public async Task CompleteScriptShouldNotErrorForAScriptWithAnUnknownResult() [Test] public async Task ShouldStoreTheStateOfTheScriptInTheScriptStateStore() { + var builder = new ScriptServiceV2Builder(); + var service = builder.Build(); + var testStarted = DateTimeOffset.UtcNow; var bashScript = "sleep 10"; @@ -429,13 +459,13 @@ public async Task ShouldStoreTheStateOfTheScriptInTheScriptStateStore() .WithDurationStartScriptCanWaitForScriptToFinish(null) .Build(); - var scriptStateStore = SetupScriptStateStore(startScriptCommand.ScriptTicket); + var scriptStateStore = SetupScriptStateStore(builder.WorkspaceFactory, builder.StateStoreFactory, startScriptCommand.ScriptTicket); var startScriptResponse = await service.StartScriptAsync(startScriptCommand, CancellationToken.None); var runningScriptState = scriptStateStore.Load(); - var (logs, finalResponse) = await RunUntilScriptFinishes(startScriptCommand, startScriptResponse); + var (logs, finalResponse) = await RunUntilScriptFinishes(service, startScriptCommand, startScriptResponse); var testFinished = DateTimeOffset.UtcNow; var finishedScriptState = scriptStateStore.Load(); @@ -461,6 +491,8 @@ public async Task ShouldStoreTheStateOfTheScriptInTheScriptStateStore() [Test] public async Task ScriptTicketCasingShouldNotAffectCommands() { + var service = new ScriptServiceV2Builder().Build(); + // Arrange var startScriptCommand = new StartScriptCommandV2Builder() .WithScriptBody("echo \"finished\"") @@ -483,22 +515,364 @@ public async Task ScriptTicketCasingShouldNotAffectCommands() lowerCaseResponse.ExitCode.Should().Be(0); } + [Test] + public async Task AbandonScript_OnUnknownTicket_ReturnsCompleteWithUnknownScriptExitCode() + { + var service = new ScriptServiceV2Builder().Build(); + + var ticket = new ScriptTicket("unknown-ticket-" + Guid.NewGuid().ToString("N")); + var response = await service.AbandonScriptAsync(new AbandonScriptCommandV2(ticket, 0), CancellationToken.None); + + response.State.Should().Be(ProcessState.Complete); + response.ExitCode.Should().Be(ScriptExitCodes.UnknownScriptExitCode); + } + + [Test] + public async Task AbandonScript_OnRunningScript_FiresAbandonToken_ReturnsAbandonedExitCode() + { + var service = new ScriptServiceV2Builder().Build(); + + var startCommand = new StartScriptCommandV2Builder() + .WithScriptBodyForCurrentOs("Start-Sleep -Seconds 60", "sleep 60") + .WithIsolation(ScriptIsolationLevel.NoIsolation) + .WithDurationStartScriptCanWaitForScriptToFinish(null) + .Build(); + + await service.StartScriptAsync(startCommand, CancellationToken.None); + + // Wait for the script to reach Running state + ScriptStatusResponseV2 status; + var deadline = DateTime.UtcNow.AddSeconds(30); + do + { + status = await service.GetStatusAsync(new ScriptStatusRequestV2(startCommand.ScriptTicket, 0), CancellationToken.None); + if (status.State == ProcessState.Running) break; + await Task.Delay(50); + } while (DateTime.UtcNow < deadline); + status.State.Should().Be(ProcessState.Running, "script should have reached Running state within 30 seconds"); + + // Fire abandon + await service.AbandonScriptAsync(new AbandonScriptCommandV2(startCommand.ScriptTicket, 0), CancellationToken.None); + + // Poll until the script completes (the abandon token causes the process runner to return AbandonedExitCode) + ScriptStatusResponseV2 finalResponse; + var completionDeadline = DateTime.UtcNow.AddSeconds(30); + do + { + finalResponse = await service.GetStatusAsync(new ScriptStatusRequestV2(startCommand.ScriptTicket, 0), CancellationToken.None); + if (finalResponse.State == ProcessState.Complete) break; + await Task.Delay(100); + } while (DateTime.UtcNow < completionDeadline); + + finalResponse.State.Should().Be(ProcessState.Complete); + finalResponse.ExitCode.Should().Be(ScriptExitCodes.AbandonedExitCode); + } + + [Test] + public async Task AbandonScript_OnAlreadyCompletedScript_ReturnsRealExitCode() + { + var service = new ScriptServiceV2Builder().Build(); + + var startCommand = new StartScriptCommandV2Builder() + .WithScriptBody("echo \"finished\"") + .WithIsolation(ScriptIsolationLevel.NoIsolation) + .WithDurationStartScriptCanWaitForScriptToFinish(null) + .Build(); + + await service.StartScriptAsync(startCommand, CancellationToken.None); + + // Wait for the script to complete + ScriptStatusResponseV2 status; + var deadline = DateTime.UtcNow.AddSeconds(30); + do + { + status = await service.GetStatusAsync(new ScriptStatusRequestV2(startCommand.ScriptTicket, 0), CancellationToken.None); + if (status.State == ProcessState.Complete) break; + await Task.Delay(50); + } while (DateTime.UtcNow < deadline); + status.State.Should().Be(ProcessState.Complete); + + var abandonResponse = await service.AbandonScriptAsync(new AbandonScriptCommandV2(startCommand.ScriptTicket, 0), CancellationToken.None); + abandonResponse.ExitCode.Should().Be(0, "real exit code should be returned, not AbandonedExitCode"); + } + + [Test] + public async Task CompleteScript_AfterAbandon_WhenWorkspaceDeleteFails_LogsWarnAndReturnsNormally() + { + var deleteException = new IOException("file in use"); + var builder = new ScriptServiceV2Builder(); + var (throwingFactory, mockLog) = BuildFactoryWithThrowingDelete(builder.WorkspaceFactory, deleteException); + var serviceUnderTest = builder + .WithWorkspaceFactory(throwingFactory) + .WithLog(mockLog) + .Build(); + + var startCommand = new StartScriptCommandV2Builder() + .WithScriptBodyForCurrentOs("Start-Sleep -Seconds 60", "sleep 60") + .WithIsolation(ScriptIsolationLevel.NoIsolation) + .WithDurationStartScriptCanWaitForScriptToFinish(null) + .Build(); + + await serviceUnderTest.StartScriptAsync(startCommand, CancellationToken.None); + + // Wait for Running + ScriptStatusResponseV2 status; + var runningDeadline = DateTime.UtcNow.AddSeconds(30); + do + { + status = await serviceUnderTest.GetStatusAsync(new ScriptStatusRequestV2(startCommand.ScriptTicket, 0), CancellationToken.None); + if (status.State == ProcessState.Running) break; + await Task.Delay(50); + } while (DateTime.UtcNow < runningDeadline); + status.State.Should().Be(ProcessState.Running, "script should have reached Running state within 30 seconds"); + + await serviceUnderTest.AbandonScriptAsync(new AbandonScriptCommandV2(startCommand.ScriptTicket, 0), CancellationToken.None); + + // Poll until Complete + var completeDeadline = DateTime.UtcNow.AddSeconds(30); + do + { + status = await serviceUnderTest.GetStatusAsync(new ScriptStatusRequestV2(startCommand.ScriptTicket, 0), CancellationToken.None); + if (status.State == ProcessState.Complete) break; + await Task.Delay(50); + } while (DateTime.UtcNow < completeDeadline); + status.State.Should().Be(ProcessState.Complete); + status.ExitCode.Should().Be(ScriptExitCodes.AbandonedExitCode); + + Func complete = async () => await serviceUnderTest.CompleteScriptAsync(new CompleteScriptCommandV2(startCommand.ScriptTicket), CancellationToken.None); + + await complete.Should().NotThrowAsync(); + mockLog.Received().Warn(deleteException, Arg.Is(m => m.Contains("Could not delete") && m.Contains(startCommand.ScriptTicket.TaskId))); + } + + [Test] + public async Task CompleteScript_AfterNormalCompletion_WhenWorkspaceDeleteFails_PropagatesException() + { + var deleteException = new IOException("file in use"); + var builder = new ScriptServiceV2Builder(); + var (throwingFactory, mockLog) = BuildFactoryWithThrowingDelete(builder.WorkspaceFactory, deleteException); + var serviceUnderTest = builder + .WithWorkspaceFactory(throwingFactory) + .WithLog(mockLog) + .Build(); + + var startCommand = new StartScriptCommandV2Builder() + .WithScriptBody("echo \"finished\"") + .WithIsolation(ScriptIsolationLevel.NoIsolation) + .WithDurationStartScriptCanWaitForScriptToFinish(null) + .Build(); + + await serviceUnderTest.StartScriptAsync(startCommand, CancellationToken.None); + + // Poll until natural completion + ScriptStatusResponseV2 status; + var deadline = DateTime.UtcNow.AddSeconds(30); + do + { + status = await serviceUnderTest.GetStatusAsync(new ScriptStatusRequestV2(startCommand.ScriptTicket, 0), CancellationToken.None); + if (status.State == ProcessState.Complete) break; + await Task.Delay(50); + } while (DateTime.UtcNow < deadline); + status.State.Should().Be(ProcessState.Complete); + status.ExitCode.Should().Be(0, "the script exited cleanly, not via abandon"); + + Func complete = async () => await serviceUnderTest.CompleteScriptAsync(new CompleteScriptCommandV2(startCommand.ScriptTicket), CancellationToken.None); + + await complete.Should().ThrowAsync(); + } + + /// + /// Builds an IScriptWorkspaceFactory decorator over the supplied workspaceFactory whose returned + /// workspaces forward every member except Delete(CancellationToken), which throws the supplied + /// exception. Also returns a mock ISystemLog for assertion. + /// + static (IScriptWorkspaceFactory factory, ISystemLog log) BuildFactoryWithThrowingDelete(IScriptWorkspaceFactory workspaceFactory, Exception deleteException) + { + var throwingFactory = new DeleteThrowingScriptWorkspaceFactory(workspaceFactory, deleteException); + var fakeLog = Substitute.For(); + return (throwingFactory, fakeLog); + } + + /// + /// Builder for ScriptServiceV2 SUT construction. Defaults match what the previous [SetUp] + /// produced; tests opt into mock overrides via the chainable With* methods. The + /// WorkspaceFactory and StateStoreFactory properties materialize on first access so tests + /// can grab them before Build() (e.g. to wrap the default factory in a decorator). + /// + class ScriptServiceV2Builder + { + IScriptWorkspaceFactory? workspaceFactory; + ScriptStateStoreFactory? stateStoreFactory; + ISystemLog? log; + IShell? shell; + ScriptIsolationMutex? mutex; + OctopusPhysicalFileSystem? cachedFileSystem; + + public IScriptWorkspaceFactory WorkspaceFactory => workspaceFactory ??= BuildDefaultWorkspaceFactory(); + public ScriptStateStoreFactory StateStoreFactory => stateStoreFactory ??= new ScriptStateStoreFactory(FileSystem); + + OctopusPhysicalFileSystem FileSystem => cachedFileSystem ??= new OctopusPhysicalFileSystem(Substitute.For()); + + ScriptWorkspaceFactory BuildDefaultWorkspaceFactory() + { + var homeConfiguration = Substitute.For(); + homeConfiguration.HomeDirectory.Returns(Environment.CurrentDirectory); + return new ScriptWorkspaceFactory(FileSystem, homeConfiguration, new SensitiveValueMasker()); + } + + public ScriptServiceV2Builder WithWorkspaceFactory(IScriptWorkspaceFactory factory) + { + workspaceFactory = factory; + return this; + } + + public ScriptServiceV2Builder WithStateStoreFactory(ScriptStateStoreFactory factory) + { + stateStoreFactory = factory; + return this; + } + + public ScriptServiceV2Builder WithLog(ISystemLog log) + { + this.log = log; + return this; + } + + public ScriptServiceV2Builder WithShell(IShell shell) + { + this.shell = shell; + return this; + } + + public ScriptServiceV2Builder WithMutex(ScriptIsolationMutex mutex) + { + this.mutex = mutex; + return this; + } + + public ScriptServiceV2 Build() + { + return new ScriptServiceV2( + shell ?? (PlatformDetection.IsRunningOnWindows ? (IShell)new PowerShell() : new Bash()), + WorkspaceFactory, + StateStoreFactory, + mutex ?? new ScriptIsolationMutex(), + log ?? Substitute.For()); + } + } + + /// + /// IScriptWorkspaceFactory decorator that wraps every workspace it returns in a + /// DeleteThrowingScriptWorkspace so that Delete throws the configured exception while all + /// other members forward to the real workspace. + /// + class DeleteThrowingScriptWorkspaceFactory : IScriptWorkspaceFactory + { + readonly IScriptWorkspaceFactory inner; + readonly Exception deleteException; + + public DeleteThrowingScriptWorkspaceFactory(IScriptWorkspaceFactory inner, Exception deleteException) + { + this.inner = inner; + this.deleteException = deleteException; + } + + public IScriptWorkspace GetWorkspace(ScriptTicket ticket, WorkspaceReadinessCheck readinessCheck) + => new DeleteThrowingScriptWorkspace(inner.GetWorkspace(ticket, readinessCheck), deleteException); + + public async Task PrepareWorkspace( + ScriptTicket ticket, + string scriptBody, + Dictionary scripts, + ScriptIsolationLevel isolationLevel, + TimeSpan scriptMutexAcquireTimeout, + string? scriptMutexName, + string[]? scriptArguments, + List files, + CancellationToken cancellationToken) + { + var workspace = await inner.PrepareWorkspace(ticket, scriptBody, scripts, isolationLevel, scriptMutexAcquireTimeout, scriptMutexName, scriptArguments, files, cancellationToken); + return new DeleteThrowingScriptWorkspace(workspace, deleteException); + } + + public List GetUncompletedWorkspaces() + => inner.GetUncompletedWorkspaces().Select(w => (IScriptWorkspace)new DeleteThrowingScriptWorkspace(w, deleteException)).ToList(); + } + + /// + /// IScriptWorkspace decorator that forwards every member to an inner real workspace, except + /// Delete(CancellationToken), which throws the configured exception. Used to exercise the + /// CompleteScript abandon-aware tolerance of Delete failures without disturbing anything else + /// StartScript / RunningScript may touch on the workspace. + /// + class DeleteThrowingScriptWorkspace : IScriptWorkspace + { + readonly IScriptWorkspace inner; + readonly Exception deleteException; + + public DeleteThrowingScriptWorkspace(IScriptWorkspace inner, Exception deleteException) + { + this.inner = inner; + this.deleteException = deleteException; + } + + public ScriptTicket ScriptTicket => inner.ScriptTicket; + public string WorkingDirectory => inner.WorkingDirectory; + public string BootstrapScriptFilePath => inner.BootstrapScriptFilePath; + public string LogFilePath => inner.LogFilePath; + + public string[]? ScriptArguments + { + get => inner.ScriptArguments; + set => inner.ScriptArguments = value; + } + + public ScriptIsolationLevel IsolationLevel + { + get => inner.IsolationLevel; + set => inner.IsolationLevel = value; + } + + public TimeSpan ScriptMutexAcquireTimeout + { + get => inner.ScriptMutexAcquireTimeout; + set => inner.ScriptMutexAcquireTimeout = value; + } + + public string? ScriptMutexName + { + get => inner.ScriptMutexName; + set => inner.ScriptMutexName = value; + } + + public bool ShouldMonitorPowerShellStartup() => inner.ShouldMonitorPowerShellStartup(); + public void BootstrapScript(string scriptBody) => inner.BootstrapScript(scriptBody); + public string ResolvePath(string fileName) => inner.ResolvePath(fileName); + public IScriptLog CreateLog() => inner.CreateLog(); + public void WriteFile(string filename, string contents) => inner.WriteFile(filename, contents); + public void CopyFile(string sourceFilePath, string destFileName, bool overwrite) => inner.CopyFile(sourceFilePath, destFileName, overwrite); + public void CheckReadiness() => inner.CheckReadiness(); + public string? TryReadFile(string filename) => inner.TryReadFile(filename); + + public Task Delete(CancellationToken cancellationToken) => throw deleteException; + } + // TODO - Test the stateStore is updated. - private void SetupScriptState(ScriptTicket ticket) + static void SetupScriptState(IScriptWorkspaceFactory workspaceFactory, ScriptStateStoreFactory stateStoreFactory, ScriptTicket ticket) { - var stateWorkspace = SetupScriptStateStore(ticket); + var stateWorkspace = SetupScriptStateStore(workspaceFactory, stateStoreFactory, ticket); stateWorkspace.Create(); } - private ScriptStateStore SetupScriptStateStore(ScriptTicket ticket) + static ScriptStateStore SetupScriptStateStore(IScriptWorkspaceFactory workspaceFactory, ScriptStateStoreFactory stateStoreFactory, ScriptTicket ticket) { var workspace = workspaceFactory.GetWorkspace(ticket, WorkspaceReadinessCheck.Perform); var stateWorkspace = stateStoreFactory.Create(workspace); return stateWorkspace; } - private async Task CleanupWorkspace(ScriptTicket ticket, CancellationToken cancellationToken) + static async Task CleanupWorkspace(IScriptWorkspaceFactory workspaceFactory, ScriptTicket ticket, CancellationToken cancellationToken) { var workspace = workspaceFactory.GetWorkspace(ticket, WorkspaceReadinessCheck.Skip); await workspace.Delete(cancellationToken); @@ -527,9 +901,9 @@ private async Task CleanupWorkspace(ScriptTicket ticket, CancellationToken cance return (startScriptCommand, new FileInfo(filePath)); } - async Task<(List, ScriptStatusResponseV2)> RunUntilScriptCompletes(StartScriptCommandV2 startScriptCommand, ScriptStatusResponseV2 response) + static async Task<(List, ScriptStatusResponseV2)> RunUntilScriptCompletes(ScriptServiceV2 service, StartScriptCommandV2 startScriptCommand, ScriptStatusResponseV2 response) { - var (logs, lastResponse) = await RunUntilScriptFinishes(startScriptCommand, response); + var (logs, lastResponse) = await RunUntilScriptFinishes(service, startScriptCommand, response); await service.CompleteScriptAsync(new CompleteScriptCommandV2(startScriptCommand.ScriptTicket), CancellationToken.None); @@ -538,7 +912,7 @@ private async Task CleanupWorkspace(ScriptTicket ticket, CancellationToken cance return (logs, lastResponse); } - async Task<(List logs, ScriptStatusResponseV2 response)> RunUntilScriptFinishes(StartScriptCommandV2 startScriptCommand, ScriptStatusResponseV2 response) + static async Task<(List logs, ScriptStatusResponseV2 response)> RunUntilScriptFinishes(ScriptServiceV2 service, StartScriptCommandV2 startScriptCommand, ScriptStatusResponseV2 response) { var logs = new List(response.Logs); @@ -557,7 +931,7 @@ private async Task CleanupWorkspace(ScriptTicket ticket, CancellationToken cance return (logs, response); } - void WriteLogsToConsole(List logs) + static void WriteLogsToConsole(List logs) { foreach (var log in logs) { @@ -576,4 +950,4 @@ public StartScriptCommandAndResponse(StartScriptCommandV2 command) public ScriptStatusResponseV2? Response { get; set; } } } -} \ No newline at end of file +} diff --git a/source/Octopus.Tentacle.Tests/Kubernetes/KubernetesDirectoryInformationProviderFixture.cs b/source/Octopus.Tentacle.Tests/Kubernetes/KubernetesDirectoryInformationProviderFixture.cs index 60799927e..b4b7e337e 100644 --- a/source/Octopus.Tentacle.Tests/Kubernetes/KubernetesDirectoryInformationProviderFixture.cs +++ b/source/Octopus.Tentacle.Tests/Kubernetes/KubernetesDirectoryInformationProviderFixture.cs @@ -1,6 +1,7 @@ using System; using System.Collections.Generic; using System.Threading; +using System.Threading.Tasks; using FluentAssertions; using Microsoft.Extensions.Caching.Memory; using Microsoft.Extensions.Internal; @@ -24,27 +25,29 @@ public void DuOutputParses() { const ulong usedSize = 500 * Megabyte; var spr = Substitute.For(); - spr.When(x => x.ExecuteCommand("du", "-s -B 1 /octopus", "/", Arg.Any>(), Arg.Any>())) - .Do(x => + spr.ExecuteCommandAsync("du", "-s -B 1 /octopus", "/", Arg.Any>(), Arg.Any>()) + .Returns(callInfo => { - x.ArgAt>(3).Invoke($"{usedSize}\t/octopus"); + callInfo.ArgAt>(3).Invoke($"{usedSize}\t/octopus"); + return Task.FromResult(0); }); var memoryCache = new MemoryCache(new MemoryCacheOptions()); var sut = new KubernetesDirectoryInformationProvider(Substitute.For(), spr, memoryCache); sut.GetPathUsedBytes("/octopus").Should().Be(usedSize); } - + [Test] public void DuOutputParsesWithMultipleLines() { const ulong usedSize = 500 * Megabyte; var spr = Substitute.For(); - spr.When(x => x.ExecuteCommand("du", "-s -B 1 /octopus", "/", Arg.Any>(), Arg.Any>())) - .Do(x => + spr.ExecuteCommandAsync("du", "-s -B 1 /octopus", "/", Arg.Any>(), Arg.Any>()) + .Returns(callInfo => { - x.ArgAt>(3).Invoke($"500\t/octopus/extradir"); - x.ArgAt>(3).Invoke($"{usedSize}\t/octopus"); - x.ArgAt>(3).Invoke($"{usedSize+1000}\tTotal"); + callInfo.ArgAt>(3).Invoke($"500\t/octopus/extradir"); + callInfo.ArgAt>(3).Invoke($"{usedSize}\t/octopus"); + callInfo.ArgAt>(3).Invoke($"{usedSize+1000}\tTotal"); + return Task.FromResult(0); }); var memoryCache = new MemoryCache(new MemoryCacheOptions()); var sut = new KubernetesDirectoryInformationProvider(Substitute.For(), spr, memoryCache); @@ -56,18 +59,18 @@ public void IfDuFailsWeStillGetData() { const ulong usedSize = 500 * Megabyte; var spr = Substitute.For(); - spr.When(x => x.ExecuteCommand("du", "-s -B 1 /octopus", "/", Arg.Any>(), Arg.Any>())) - .Do(x => + spr.ExecuteCommandAsync("du", "-s -B 1 /octopus", "/", Arg.Any>(), Arg.Any>()) + .Returns(callInfo => { - x.ArgAt>(3).Invoke($"500\t/octopus"); - x.ArgAt>(3).Invoke($"{usedSize}\t/octopus"); + callInfo.ArgAt>(3).Invoke($"500\t/octopus"); + callInfo.ArgAt>(3).Invoke($"{usedSize}\t/octopus"); + return Task.FromResult(1); }); - spr.ReturnsForAll(1); var memoryCache = new MemoryCache(new MemoryCacheOptions()); var sut = new KubernetesDirectoryInformationProvider(Substitute.For(), spr, memoryCache); sut.GetPathUsedBytes("/octopus").Should().Be(usedSize); } - + [Test] public void IfDuFailsWeLogCorrectly() { @@ -75,22 +78,22 @@ public void IfDuFailsWeLogCorrectly() var systemLog = new InMemoryLog(); var spr = Substitute.For(); - spr.When(x => x.ExecuteCommand("du", "-s -B 1 /octopus", "/", Arg.Any>(), Arg.Any>())) - .Do(x => + spr.ExecuteCommandAsync("du", "-s -B 1 /octopus", "/", Arg.Any>(), Arg.Any>()) + .Returns(callInfo => { // stdout - x.ArgAt>(3).Invoke("500\t/octopus"); - x.ArgAt>(3).Invoke($"{usedSize}\t/octopus"); - + callInfo.ArgAt>(3).Invoke("500\t/octopus"); + callInfo.ArgAt>(3).Invoke($"{usedSize}\t/octopus"); + // stderr - x.ArgAt>(4).Invoke("no permission for foo"); - x.ArgAt>(4).Invoke("also no permission for bar"); + callInfo.ArgAt>(4).Invoke("no permission for foo"); + callInfo.ArgAt>(4).Invoke("also no permission for bar"); + return Task.FromResult(1); }); - spr.ReturnsForAll(1); var memoryCache = new MemoryCache(new MemoryCacheOptions()); var sut = new KubernetesDirectoryInformationProvider(systemLog, spr, memoryCache); sut.GetPathUsedBytes("/octopus").Should().Be(usedSize); - + systemLog.GetLogsForCategory(LogCategory.Warning).Should().Contain("Could not reliably get disk space using du. Getting best approximation..."); systemLog.GetLogsForCategory(LogCategory.Info).Should().Contain($"Du stdout returned 500\t/octopus, {usedSize}\t/octopus"); systemLog.GetLogsForCategory(LogCategory.Info).Should().Contain("Du stderr returned no permission for foo, also no permission for bar"); @@ -100,56 +103,61 @@ public void IfDuFailsWeLogCorrectly() public void IfDuFailsCompletelyReturnNull() { var spr = Substitute.For(); - spr.ReturnsForAll(1); + spr.ExecuteCommandAsync(Arg.Any(), Arg.Any(), Arg.Any(), Arg.Any>(), Arg.Any>()) + .Returns(Task.FromResult(1)); var memoryCache = new MemoryCache(new MemoryCacheOptions()); var sut = new KubernetesDirectoryInformationProvider(Substitute.For(), spr, memoryCache); sut.GetPathUsedBytes("/octopus").Should().Be(null); } - + [Test] public void ReturnedValueShouldBeCached() { var spr = Substitute.For(); - spr.ReturnsForAll(1); + spr.ExecuteCommandAsync(Arg.Any(), Arg.Any(), Arg.Any(), Arg.Any>(), Arg.Any>()) + .Returns(Task.FromResult(1)); var baseTime = DateTimeOffset.UtcNow; var clock = new TestClock(baseTime); var memoryCache = new MemoryCache(new MemoryCacheOptions(){ Clock = clock}); var sut = new KubernetesDirectoryInformationProvider(Substitute.For(), spr, memoryCache); sut.GetPathUsedBytes("/octopus").Should().Be(null); - + const ulong usedSize = 500 * Megabyte; - spr.When(x => x.ExecuteCommand("du", "-s -B 1 /octopus", "/", Arg.Any>(), Arg.Any>())) - .Do(x => + spr.ExecuteCommandAsync("du", "-s -B 1 /octopus", "/", Arg.Any>(), Arg.Any>()) + .Returns(callInfo => { - x.ArgAt>(3).Invoke($"123\t/octopus"); - x.ArgAt>(3).Invoke($"{usedSize}\t/octopus"); + callInfo.ArgAt>(3).Invoke($"123\t/octopus"); + callInfo.ArgAt>(3).Invoke($"{usedSize}\t/octopus"); + return Task.FromResult(0); }); clock.UtcNow = baseTime + TimeSpan.FromSeconds(29); sut.GetPathUsedBytes("/octopus").Should().Be(null); } - + [Test] public void DuCacheExpiresAfter30Seconds() { var spr = Substitute.For(); - spr.ReturnsForAll(1); + spr.ExecuteCommandAsync(Arg.Any(), Arg.Any(), Arg.Any(), Arg.Any>(), Arg.Any>()) + .Returns(Task.FromResult(1)); var baseTime = DateTimeOffset.UtcNow; var clock = new TestClock(baseTime); var memoryCache = new MemoryCache(new MemoryCacheOptions(){ Clock = clock}); var sut = new KubernetesDirectoryInformationProvider(Substitute.For(), spr, memoryCache); sut.GetPathUsedBytes("/octopus").Should().Be(null); - + const ulong usedSize = 500 * Megabyte; - spr.When(x => x.ExecuteCommand("du", "-s -B 1 /octopus", "/", Arg.Any>(), Arg.Any>())) - .Do(x => + spr.ExecuteCommandAsync("du", "-s -B 1 /octopus", "/", Arg.Any>(), Arg.Any>()) + .Returns(callInfo => { - x.ArgAt>(3).Invoke($"123\t/octopus"); - x.ArgAt>(3).Invoke($"{usedSize}\t/octopus"); + callInfo.ArgAt>(3).Invoke($"123\t/octopus"); + callInfo.ArgAt>(3).Invoke($"{usedSize}\t/octopus"); + return Task.FromResult(0); }); clock.UtcNow = baseTime + TimeSpan.FromSeconds(30); - + sut.GetPathUsedBytes("/octopus").Should().Be(usedSize); } diff --git a/source/Octopus.Tentacle.Tests/Util/LinuxTestUserPrincipal.cs b/source/Octopus.Tentacle.Tests/Util/LinuxTestUserPrincipal.cs index 83da2cb93..219fa5b6a 100644 --- a/source/Octopus.Tentacle.Tests/Util/LinuxTestUserPrincipal.cs +++ b/source/Octopus.Tentacle.Tests/Util/LinuxTestUserPrincipal.cs @@ -19,10 +19,19 @@ public LinuxTestUserPrincipal(string username) public string UserName { get; } + // Why this is sync: RunCommand is called from the constructor, which can't + // be async. + // + // Why blocking on the async call is safe: this only runs under NUnit, which + // dispatches us on a worker thread with no SynchronizationContext. + // + // Why low risk: this is test code. The worst case for a wrong call here is + // a hung test, not a production incident. + // See https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html static void RunCommand(string arguments, bool failOnNonZeroExitCode = true) { var commandLineInvocation = new CommandLineInvocation("/bin/bash", arguments); - var result = commandLineInvocation.ExecuteCommand(); + var result = commandLineInvocation.ExecuteCommandAsync().GetAwaiter().GetResult(); foreach (var line in result.Errors) Console.WriteLine(line); diff --git a/source/Octopus.Tentacle/Kubernetes/KubernetesDirectoryInformationProvider.cs b/source/Octopus.Tentacle/Kubernetes/KubernetesDirectoryInformationProvider.cs index 9154bdfff..7609ff106 100644 --- a/source/Octopus.Tentacle/Kubernetes/KubernetesDirectoryInformationProvider.cs +++ b/source/Octopus.Tentacle/Kubernetes/KubernetesDirectoryInformationProvider.cs @@ -1,6 +1,7 @@ using System; using System.Collections.Generic; using System.Linq; +using System.Threading.Tasks; using Octopus.Client.Extensions; using Microsoft.Extensions.Caching.Memory; using Octopus.Tentacle.Core.Diagnostics; @@ -11,15 +12,16 @@ namespace Octopus.Tentacle.Kubernetes public interface IKubernetesDirectoryInformationProvider { public ulong? GetPathUsedBytes(string directoryPath); + public Task GetPathUsedBytesAsync(string directoryPath); public ulong? GetPathTotalBytes(); } - + public class KubernetesDirectoryInformationProvider : IKubernetesDirectoryInformationProvider { readonly ISystemLog log; readonly ISilentProcessRunner silentProcessRunner; readonly IMemoryCache directoryInformationCache; - + //30s gives us fairly up to date information, but doesn't impact performance too much. //For 50 concurrent Cloud deployments: //No cache: 30min ea. @@ -36,12 +38,25 @@ public KubernetesDirectoryInformationProvider(ISystemLog log, ISilentProcessRunn this.directoryInformationCache = directoryInformationCache; } + // Why this is sync: the only caller is EnsureDiskHasEnoughFreeSpace, which + // overrides a sync method on the IOctopusFileSystem chain. Making that + // whole chain async is a wider refactor than this PR. New code should + // call GetPathUsedBytesAsync directly instead of going through here. + // + // Why blocking on the async call is safe: .GetAwaiter().GetResult() can + // deadlock when the calling thread has a SynchronizationContext. The + // Kubernetes agent is a console app and doesn't set one up, so there's + // nothing for the awaited continuation to wait on. + // See https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html public ulong? GetPathUsedBytes(string directoryPath) + => GetPathUsedBytesAsync(directoryPath).GetAwaiter().GetResult(); + + public async Task GetPathUsedBytesAsync(string directoryPath) { - return directoryInformationCache.GetOrCreate(directoryPath, e => + return await directoryInformationCache.GetOrCreateAsync(directoryPath, async e => { e.SetAbsoluteExpiration(CacheExpiry); - return GetDriveBytesUsingDu(directoryPath); + return await GetDriveBytesUsingDuAsync(directoryPath); }); } @@ -49,13 +64,13 @@ public KubernetesDirectoryInformationProvider(ISystemLog log, ISilentProcessRunn { return KubernetesUtilities.GetResourceBytes(KubernetesConfig.PersistentVolumeSize); } - - - ulong? GetDriveBytesUsingDu(string directoryPath) + + + async Task GetDriveBytesUsingDuAsync(string directoryPath) { var stdOut = new List(); var stdErr = new List(); - var exitCode = silentProcessRunner.ExecuteCommand("du", $"-s -B 1 {directoryPath}", "/", stdOut.Add, stdErr.Add); + var exitCode = await silentProcessRunner.ExecuteCommandAsync("du", $"-s -B 1 {directoryPath}", "/", stdOut.Add, stdErr.Add); if (exitCode != 0) { @@ -71,4 +86,4 @@ public KubernetesDirectoryInformationProvider(ISystemLog log, ISilentProcessRunn return null; } } -} \ No newline at end of file +} diff --git a/source/Octopus.Tentacle/Kubernetes/KubernetesPhysicalFileSystem.cs b/source/Octopus.Tentacle/Kubernetes/KubernetesPhysicalFileSystem.cs index e1bc2378d..45d0570a5 100644 --- a/source/Octopus.Tentacle/Kubernetes/KubernetesPhysicalFileSystem.cs +++ b/source/Octopus.Tentacle/Kubernetes/KubernetesPhysicalFileSystem.cs @@ -1,5 +1,6 @@ using System; using System.IO; +using System.Threading.Tasks; using Octopus.Tentacle.Core.Diagnostics; using Octopus.Tentacle.Core.Util; using Octopus.Tentacle.Util; @@ -46,6 +47,21 @@ public override void EnsureDiskHasEnoughFreeSpace(string directoryPath, long req public (ulong freeSpaceBytes, ulong totalSpaceBytes)? GetStorageInformation() { var bytesUsed = directoryInformationProvider.GetPathUsedBytes(HomeDir); + return BuildStorageInformation(bytesUsed); + } + + public async Task<(ulong freeSpaceBytes, ulong totalSpaceBytes)?> GetStorageInformationAsync() + { + var bytesUsed = await directoryInformationProvider.GetPathUsedBytesAsync(HomeDir); + return BuildStorageInformation(bytesUsed); + } + + // Shared sync assembler used by both GetStorageInformation (sync, for the + // IOctopusFileSystem override path) and GetStorageInformationAsync (for + // CreateScriptContainer). Pulled out to keep the post-fetch logic DRY + // across the sync/async pair we need. + (ulong freeSpaceBytes, ulong totalSpaceBytes)? BuildStorageInformation(ulong? bytesUsed) + { var bytesTotal = directoryInformationProvider.GetPathTotalBytes(); if (bytesUsed.HasValue && bytesTotal.HasValue) { diff --git a/source/Octopus.Tentacle/Kubernetes/KubernetesScriptPodCreator.cs b/source/Octopus.Tentacle/Kubernetes/KubernetesScriptPodCreator.cs index 894a7d7ab..7a1439ad1 100644 --- a/source/Octopus.Tentacle/Kubernetes/KubernetesScriptPodCreator.cs +++ b/source/Octopus.Tentacle/Kubernetes/KubernetesScriptPodCreator.cs @@ -300,7 +300,7 @@ void LogVerboseToBothLogs(string message, InMemoryTentacleScriptLog tentacleScri protected async Task CreateScriptContainer(StartKubernetesScriptCommandV1 command, string podName, string scriptName, string homeDir, string workspacePath, string[]? scriptArguments, InMemoryTentacleScriptLog tentacleScriptLog, V1Container? containerSpec) { - var spaceInformation = kubernetesPhysicalFileSystem.GetStorageInformation(); + var spaceInformation = await kubernetesPhysicalFileSystem.GetStorageInformationAsync(); var commandString = string.Join(" ", new[] { diff --git a/source/Octopus.Tentacle/Services/Capabilities/CapabilitiesServiceV2.cs b/source/Octopus.Tentacle/Services/Capabilities/CapabilitiesServiceV2.cs index 7dda62791..a4537f26a 100644 --- a/source/Octopus.Tentacle/Services/Capabilities/CapabilitiesServiceV2.cs +++ b/source/Octopus.Tentacle/Services/Capabilities/CapabilitiesServiceV2.cs @@ -6,6 +6,7 @@ using Octopus.Tentacle.Contracts.KubernetesScriptServiceV1; using Octopus.Tentacle.Contracts.ScriptServiceV2; using Octopus.Tentacle.Core.Services; +using Octopus.Tentacle.Core.Services.Scripts; using Octopus.Tentacle.Util; namespace Octopus.Tentacle.Services.Capabilities @@ -24,7 +25,7 @@ public async Task GetCapabilitiesAsync(CancellationToken } //non-kubernetes agent tentacles only support the standard script services - return new CapabilitiesResponseV2(new List { nameof(IScriptService), nameof(IFileTransferService), nameof(IScriptServiceV2) }); + return new CapabilitiesResponseV2(new List { nameof(IScriptService), nameof(IFileTransferService), nameof(IScriptServiceV2), nameof(ScriptServiceV2.AbandonScriptAsync) }); } } } \ No newline at end of file diff --git a/source/Octopus.Tentacle/Services/Scripts/ScriptService.cs b/source/Octopus.Tentacle/Services/Scripts/ScriptService.cs index e1b952202..82b482d6f 100644 --- a/source/Octopus.Tentacle/Services/Scripts/ScriptService.cs +++ b/source/Octopus.Tentacle/Services/Scripts/ScriptService.cs @@ -93,7 +93,7 @@ public async Task CompleteScriptAsync(CompleteScriptComman RunningScript LaunchShell(ScriptTicket ticket, string serverTaskId, IScriptWorkspace workspace, CancellationTokenSource cancel) { - var runningScript = new RunningScript(shell, workspace, workspace.CreateLog(), serverTaskId, scriptIsolationMutex, cancel.Token, new Dictionary(), PowerShellStartupDetection.PowerShellStartupTimeout, log); + var runningScript = RunningScript.Create(shell, workspace, workspace.CreateLog(), serverTaskId, scriptIsolationMutex, cancel.Token, new Dictionary(), PowerShellStartupDetection.PowerShellStartupTimeout, log); _ = Task.Run(async () => await runningScript.Execute()); return runningScript; } diff --git a/source/Octopus.Tentacle/Startup/LinuxServiceConfigurator.cs b/source/Octopus.Tentacle/Startup/LinuxServiceConfigurator.cs index f57809cd0..128996c1d 100644 --- a/source/Octopus.Tentacle/Startup/LinuxServiceConfigurator.cs +++ b/source/Octopus.Tentacle/Startup/LinuxServiceConfigurator.cs @@ -1,8 +1,9 @@ -using System; +using System; using System.Collections.Generic; using System.IO; using System.Text; using System.Text.RegularExpressions; +using System.Threading.Tasks; using Octopus.Tentacle.Core.Diagnostics; using Octopus.Tentacle.Util; @@ -47,22 +48,38 @@ public void ConfigureServiceByConfigPath(string thisServiceName, serviceConfigurationState); } + // Used by ServiceCommand (an AbstractCommand) to install/configure the + // Tentacle as a Linux systemd service. + // + // Why this is sync: AbstractCommand.Start() is sync because ICommand.Start() + // is sync. When Tentacle runs as a Windows service we host AbstractCommands + // via Topshelf, whose runtime callback API is also sync — so the call path + // has to return sync end-to-end. + // + // Why blocking on the async call is safe: the console-app main thread has + // no SynchronizationContext. Topshelf's OnStart callback runs on a fresh + // `new Thread(...)` worker that also has none. Either way, nothing for the + // awaited continuation to wait on. + // See https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html void ConfigureService(string thisServiceName, string exePath, string? instance, string? configPath, string serviceDescription, ServiceConfigurationState serviceConfigurationState) + => ConfigureServiceAsync(thisServiceName, exePath, instance, configPath, serviceDescription, serviceConfigurationState).GetAwaiter().GetResult(); + + async Task ConfigureServiceAsync(string thisServiceName, string exePath, string? instance, string? configPath, string serviceDescription, ServiceConfigurationState serviceConfigurationState) { //Check if system has bash and systemd - CheckSystemPrerequisites(); + await CheckSystemPrerequisitesAsync(); var cleanedInstanceName = SanitizeString(instance ?? thisServiceName); var systemdUnitFilePath = $"/etc/systemd/system/{cleanedInstanceName}.service"; if (serviceConfigurationState.Restart) - RestartService(cleanedInstanceName); + await RestartServiceAsync(cleanedInstanceName); if (serviceConfigurationState.Stop) - StopService(cleanedInstanceName); + await StopServiceAsync(cleanedInstanceName); if (serviceConfigurationState.Uninstall) - UninstallService(cleanedInstanceName, systemdUnitFilePath); + await UninstallServiceAsync(cleanedInstanceName, systemdUnitFilePath); var serviceDependencies = new List(); serviceDependencies.AddRange(new[] {"network.target"}); @@ -72,7 +89,7 @@ void ConfigureService(string thisServiceName, string exePath, string? instance, var userName = serviceConfigurationState.Username ?? "root"; if (serviceConfigurationState.Install) - InstallService(cleanedInstanceName, + await InstallServiceAsync(cleanedInstanceName, instance, configPath, exePath, @@ -82,7 +99,7 @@ void ConfigureService(string thisServiceName, string exePath, string? instance, serviceDependencies); if (serviceConfigurationState.Reconfigure) - ReconfigureService(cleanedInstanceName, + await ReconfigureServiceAsync(cleanedInstanceName, instance, configPath, exePath, @@ -92,42 +109,42 @@ void ConfigureService(string thisServiceName, string exePath, string? instance, serviceDependencies); if (serviceConfigurationState.Start) - StartService(cleanedInstanceName); + await StartServiceAsync(cleanedInstanceName); } - void RestartService(string serviceName) + async Task RestartServiceAsync(string serviceName) { log.Info($"Restarting service: {serviceName}"); - if (systemCtlHelper.RestartService(serviceName)) + if (await systemCtlHelper.RestartServiceAsync(serviceName)) log.Info("Service has been restarted"); else log.Error("The service could not be restarted"); } - void StopService(string serviceName) + async Task StopServiceAsync(string serviceName) { log.Info($"Stopping service: {serviceName}"); - if (systemCtlHelper.StopService(serviceName)) + if (await systemCtlHelper.StopServiceAsync(serviceName)) log.Info("Service stopped"); else log.Error("The service could not be stopped"); } - void StartService(string serviceName) + async Task StartServiceAsync(string serviceName) { - if (systemCtlHelper.StartService(serviceName, true)) + if (await systemCtlHelper.StartServiceAsync(serviceName, true)) log.Info($"Service started: {serviceName}"); else log.Error($"Could not start the systemd service: {serviceName}"); } - void UninstallService(string instance, string systemdUnitFilePath) + async Task UninstallServiceAsync(string instance, string systemdUnitFilePath) { log.Info($"Removing systemd service: {instance}"); try { - systemCtlHelper.StopService(instance); - systemCtlHelper.DisableService(instance); + await systemCtlHelper.StopServiceAsync(instance); + await systemCtlHelper.DisableServiceAsync(instance); File.Delete(systemdUnitFilePath); log.Info("Service uninstalled"); } @@ -138,7 +155,7 @@ void UninstallService(string instance, string systemdUnitFilePath) } } - void InstallService(string serviceName, + async Task InstallServiceAsync(string serviceName, string? instance, string? configPath, string exePath, @@ -149,8 +166,8 @@ void InstallService(string serviceName, { try { - WriteUnitFile(systemdUnitFilePath, GenerateSystemdUnitFile(instance, configPath, serviceDescription, exePath, userName, serviceDependencies)); - systemCtlHelper.EnableService(serviceName, true); + await WriteUnitFileAsync(systemdUnitFilePath, GenerateSystemdUnitFile(instance, configPath, serviceDescription, exePath, userName, serviceDependencies)); + await systemCtlHelper.EnableServiceAsync(serviceName, true); log.Info($"Service installed: {serviceName}"); } catch (Exception e) @@ -160,7 +177,7 @@ void InstallService(string serviceName, } } - void ReconfigureService(string serviceName, + async Task ReconfigureServiceAsync(string serviceName, string? instance, string? configPath, string exePath, @@ -173,13 +190,13 @@ void ReconfigureService(string serviceName, { log.Info($"Attempting to remove old service: {serviceName}"); //remove service - systemCtlHelper.StopService(serviceName); - systemCtlHelper.DisableService(serviceName); + await systemCtlHelper.StopServiceAsync(serviceName); + await systemCtlHelper.DisableServiceAsync(serviceName); File.Delete(systemdUnitFilePath); //re-add service - WriteUnitFile(systemdUnitFilePath, GenerateSystemdUnitFile(instance, configPath, serviceDescription, exePath, userName, serviceDependencies)); - systemCtlHelper.EnableService(serviceName, true); + await WriteUnitFileAsync(systemdUnitFilePath, GenerateSystemdUnitFile(instance, configPath, serviceDescription, exePath, userName, serviceDependencies)); + await systemCtlHelper.EnableServiceAsync(serviceName, true); log.Info($"Service installed: {serviceName}"); } catch (Exception e) @@ -189,48 +206,48 @@ void ReconfigureService(string serviceName, } } - void WriteUnitFile(string path, string contents) + async Task WriteUnitFileAsync(string path, string contents) { File.WriteAllText(path, contents); var commandLineInvocation = new CommandLineInvocation("/bin/bash", $"-c \"chmod 644 {path}\""); - var result = commandLineInvocation.ExecuteCommand(); + var result = await commandLineInvocation.ExecuteCommandAsync(); if (result.ExitCode == 0) return; result.Validate(); } - void CheckSystemPrerequisites() + async Task CheckSystemPrerequisitesAsync() { if (!File.Exists("/bin/bash")) throw new ControlledFailureException( "Could not detect bash. bash is required to run tentacle."); - if (!HaveSudoPrivileges()) + if (!await HaveSudoPrivilegesAsync()) throw new ControlledFailureException( "Requires elevated privileges. Please run command as sudo."); - if (!IsSystemdInstalled()) + if (!await IsSystemdInstalledAsync()) throw new ControlledFailureException( "Could not detect systemd. systemd is required to run Tentacle as a service"); } - bool IsSystemdInstalled() + async Task IsSystemdInstalledAsync() { var commandLineInvocation = new CommandLineInvocation("/bin/bash", "-c \"command -v systemctl >/dev/null\""); - var result = commandLineInvocation.ExecuteCommand(); + var result = await commandLineInvocation.ExecuteCommandAsync(); return result.ExitCode == 0; } - bool HaveSudoPrivileges() + async Task HaveSudoPrivilegesAsync() { var commandLineInvocation = new CommandLineInvocation("/bin/bash", "-c \"sudo -vn 2> /dev/null\""); - var result = commandLineInvocation.ExecuteCommand(); + var result = await commandLineInvocation.ExecuteCommandAsync(); return result.ExitCode == 0; } - string GenerateSystemdUnitFile(string? instance, + string GenerateSystemdUnitFile(string? instance, string? configPath, string serviceDescription, string exePath, string userName, IEnumerable serviceDependencies) { @@ -246,7 +263,7 @@ string GenerateSystemdUnitFile(string? instance, if (!string.IsNullOrEmpty(instance)) { stringBuilder.Append($" --instance={instance}"); - } + } else if (!string.IsNullOrEmpty(configPath)) { stringBuilder.Append($" --config={configPath}"); @@ -267,4 +284,4 @@ string GenerateSystemdUnitFile(string? instance, static string SanitizeString(string str) => Regex.Replace(str.Replace("/", ""), @"\s+", "-"); } -} \ No newline at end of file +} diff --git a/source/Octopus.Tentacle/Startup/WindowsServiceConfigurator.cs b/source/Octopus.Tentacle/Startup/WindowsServiceConfigurator.cs index 1fc5dd0eb..6a3f0358b 100644 --- a/source/Octopus.Tentacle/Startup/WindowsServiceConfigurator.cs +++ b/source/Octopus.Tentacle/Startup/WindowsServiceConfigurator.cs @@ -1,4 +1,4 @@ -using System; +using System; using System.Collections.Generic; using System.Diagnostics.CodeAnalysis; using System.IO; @@ -7,6 +7,7 @@ using System.ServiceProcess; using System.Text; using System.Threading; +using System.Threading.Tasks; using Octopus.Tentacle.Core.Diagnostics; using Octopus.Tentacle.Util; using Polly; @@ -58,12 +59,33 @@ public void ConfigureServiceByConfigPath(string thisServiceName, serviceConfigurationState); } + // Used by ServiceCommand (an AbstractCommand) to install/configure the + // Tentacle as a Windows Service. + // + // Why this is sync: AbstractCommand.Start() is sync because ICommand.Start() + // is sync. When Tentacle runs as a Windows service we host AbstractCommands + // via Topshelf, whose runtime callback API is also sync — so the call path + // has to return sync end-to-end. + // + // Why blocking on the async call is safe: the console-app main thread has + // no SynchronizationContext. Topshelf's OnStart callback runs on a fresh + // `new Thread(...)` worker that also has none. Either way, nothing for the + // awaited continuation to wait on. + // See https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html void ConfigureService(string thisServiceName, string exePath, string? instance, string? configPath, string serviceDescription, ServiceConfigurationState serviceConfigurationState) + => ConfigureServiceAsync(thisServiceName, exePath, instance, configPath, serviceDescription, serviceConfigurationState).GetAwaiter().GetResult(); + + async Task ConfigureServiceAsync(string thisServiceName, + string exePath, + string? instance, + string? configPath, + string serviceDescription, + ServiceConfigurationState serviceConfigurationState) { windowsLocalAdminRightsChecker.AssertIsRunningElevated(); var services = ServiceController.GetServices(); @@ -81,7 +103,7 @@ void ConfigureService(string thisServiceName, if (serviceConfigurationState.Uninstall) { - UninstallService(thisServiceName, controller); + await UninstallServiceAsync(thisServiceName, controller); } var serviceDependencies = new List(); @@ -96,18 +118,18 @@ void ConfigureService(string thisServiceName, if (serviceConfigurationState.Install) { - controller = InstallService(thisServiceName, exePath, instance, configPath, + controller = await InstallServiceAsync(thisServiceName, exePath, instance, configPath, serviceDescription, serviceConfigurationState, controller, serviceDependencies); } if (serviceConfigurationState.Reconfigure) { - ReconfigureService(thisServiceName, exePath, instance, configPath, serviceDescription, serviceDependencies); + await ReconfigureServiceAsync(thisServiceName, exePath, instance, configPath, serviceDescription, serviceDependencies); } if ((serviceConfigurationState.Install || serviceConfigurationState.Reconfigure) && !string.IsNullOrWhiteSpace(serviceConfigurationState.Username)) { - ConfigureCredentialsForService(thisServiceName, serviceConfigurationState); + await ConfigureCredentialsForServiceAsync(thisServiceName, serviceConfigurationState); } if (serviceConfigurationState.Start) @@ -116,7 +138,7 @@ void ConfigureService(string thisServiceName, } } - void ConfigureCredentialsForService(string thisServiceName, ServiceConfigurationState serviceConfigurationState) + async Task ConfigureCredentialsForServiceAsync(string thisServiceName, ServiceConfigurationState serviceConfigurationState) { if (!string.IsNullOrWhiteSpace(serviceConfigurationState.Password)) { @@ -137,13 +159,13 @@ void ConfigureCredentialsForService(string thisServiceName, ServiceConfiguration } else { - Sc($"config \"{thisServiceName}\" obj= \"{serviceConfigurationState.Username}\""); + await ScAsync($"config \"{thisServiceName}\" obj= \"{serviceConfigurationState.Username}\""); } log.Info("Service credentials set"); } - void ReconfigureService(string thisServiceName, + async Task ReconfigureServiceAsync(string thisServiceName, string exePath, string? instance, string? configPath, @@ -154,13 +176,13 @@ void ReconfigureService(string thisServiceName, var command = exePath.EndsWith(".dll") ? $"config \"{thisServiceName}\" binpath= \"dotnet \\\"{exePath}\\\" run {instanceIdentifier} DisplayName= \"{thisServiceName}\" depend= {string.Join("/", serviceDependencies)} start= auto" : $"config \"{thisServiceName}\" binpath= \"\\\"{exePath}\\\" run {instanceIdentifier} DisplayName= \"{thisServiceName}\" depend= {string.Join("/", serviceDependencies)} start= auto"; - Sc(command); - Sc($"description \"{thisServiceName}\" \"{serviceDescription}\""); + await ScAsync(command); + await ScAsync($"description \"{thisServiceName}\" \"{serviceDescription}\""); log.Info("Service reconfigured"); } - ServiceController? InstallService(string thisServiceName, + async Task InstallServiceAsync(string thisServiceName, string exePath, string? instance, string? configPath, @@ -181,8 +203,8 @@ void ReconfigureService(string thisServiceName, ? $"create \"{thisServiceName}\" binpath= \"dotnet \\\"{exePath}\\\" run {instanceIdentifier} DisplayName= \"{thisServiceName}\" depend= {string.Join("/", serviceDependencies)} start= auto" : $"create \"{thisServiceName}\" binpath= \"\\\"{exePath}\\\" run {instanceIdentifier} DisplayName= \"{thisServiceName}\" depend= {string.Join("/", serviceDependencies)} start= auto"; - Sc(command); - Sc($"description \"{thisServiceName}\" \"{serviceDescription}\""); + await ScAsync(command); + await ScAsync($"description \"{thisServiceName}\" \"{serviceDescription}\""); } log.Info("Service installed"); @@ -207,7 +229,7 @@ static string InstanceIdentifier(string? instance, string? configPath) throw new InvalidOperationException("Either the instance name of configuration path must be provided to configure a service"); } - void UninstallService(string thisServiceName, ServiceController? controller) + async Task UninstallServiceAsync(string thisServiceName, ServiceController? controller) { if (controller == null) { @@ -215,7 +237,7 @@ void UninstallService(string thisServiceName, ServiceController? controller) } else { - Sc($"delete \"{thisServiceName}\""); + await ScAsync($"delete \"{thisServiceName}\""); log.Info("Service uninstalled"); } @@ -333,7 +355,7 @@ void WaitForControllerStatus(ServiceController controller, ServiceControllerStat 150); } - void Sc(string arguments) + async Task ScAsync(string arguments) { var outputBuilder = new StringBuilder(); var argumentsToLog = string.Join(" ", arguments); @@ -342,7 +364,7 @@ void Sc(string arguments) var sc = Path.Combine(system32, "sc.exe"); logFileOnlyLogger.Info($"Executing sc.exe {argumentsToLog}"); - var exitCode = SilentProcessRunnerExtended.ExecuteCommand(sc, + var exitCode = await SilentProcessRunnerExtended.ExecuteCommandAsync(sc, arguments, Environment.CurrentDirectory, output => outputBuilder.AppendLine(output), diff --git a/source/Octopus.Tentacle/Util/CommandLineRunner.cs b/source/Octopus.Tentacle/Util/CommandLineRunner.cs index cc6da549a..b19903eff 100644 --- a/source/Octopus.Tentacle/Util/CommandLineRunner.cs +++ b/source/Octopus.Tentacle/Util/CommandLineRunner.cs @@ -1,5 +1,7 @@ -using System; +using System; using System.Collections.Generic; +using System.Threading; +using System.Threading.Tasks; using Octopus.Tentacle.Core.Diagnostics; namespace Octopus.Tentacle.Util @@ -7,13 +9,11 @@ namespace Octopus.Tentacle.Util public class CommandLineRunner : ICommandLineRunner { public bool Execute(IEnumerable commandLineInvocations, ILog log) - { - return Execute(commandLineInvocations, + => Execute(commandLineInvocations, log.Verbose, log.Info, log.Error, log.Error); - } public bool Execute(IEnumerable commandLineInvocations, Action debug, @@ -35,15 +35,57 @@ public bool Execute(CommandLineInvocation invocation, ILog log) log.Error, log.Error); + // We're at the ICommandLineRunner sync entry point, consumed by Octopus.Manager.Tentacle + // (WPF). The WPF installer calls Execute from ThreadPool.QueueUserWorkItem (a sync + // delegate), so this is the sync-over-async bridge: a one-line wrapper over the public + // async implementation. Safe because the installer dispatches us on a plain thread-pool + // worker. No captured SynchronizationContext, so no deadlock. Async callers should call + // ExecuteAsync directly. + // See https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html public bool Execute(CommandLineInvocation invocation, Action debug, Action info, Action error, Action exception) + => ExecuteAsync(invocation, debug, info, error, exception).GetAwaiter().GetResult(); + + public Task ExecuteAsync(IEnumerable commandLineInvocations, ILog log) + => ExecuteAsync(commandLineInvocations, + log.Verbose, + log.Info, + log.Error, + log.Error); + + public async Task ExecuteAsync(IEnumerable commandLineInvocations, + Action debug, + Action info, + Action error, + Action exception) + { + foreach (var invocation in commandLineInvocations) + if (!await ExecuteAsync(invocation, debug, info, error, exception)) + return false; + + return true; + } + + public Task ExecuteAsync(CommandLineInvocation invocation, ILog log) + => ExecuteAsync(invocation, + log.Info, + log.Info, + log.Error, + log.Error); + + public async Task ExecuteAsync(CommandLineInvocation invocation, + Action debug, + Action info, + Action error, + Action exception) { try { - var exitCode = SilentProcessRunner.ExecuteCommand(invocation.Executable, + var exitCode = await SilentProcessRunner.ExecuteCommandAsync( + invocation.Executable, (invocation.Arguments ?? "") + " " + (invocation.SystemArguments ?? ""), Environment.CurrentDirectory, debug, @@ -75,4 +117,4 @@ public bool Execute(CommandLineInvocation invocation, return true; } } -} \ No newline at end of file +} diff --git a/source/Octopus.Tentacle/Util/ICommandLineRunner.cs b/source/Octopus.Tentacle/Util/ICommandLineRunner.cs index a1244a681..ab0b1d7db 100644 --- a/source/Octopus.Tentacle/Util/ICommandLineRunner.cs +++ b/source/Octopus.Tentacle/Util/ICommandLineRunner.cs @@ -1,5 +1,6 @@ -using System; +using System; using System.Collections.Generic; +using System.Threading.Tasks; using Octopus.Tentacle.Core.Diagnostics; namespace Octopus.Tentacle.Util @@ -21,5 +22,21 @@ bool Execute(CommandLineInvocation invocation, Action info, Action error, Action exception); + + Task ExecuteAsync(IEnumerable commandLineInvocations, ILog log); + + Task ExecuteAsync(IEnumerable commandLineInvocations, + Action debug, + Action info, + Action error, + Action exception); + + Task ExecuteAsync(CommandLineInvocation commandLineInvocation, ILog log); + + Task ExecuteAsync(CommandLineInvocation invocation, + Action debug, + Action info, + Action error, + Action exception); } -} \ No newline at end of file +} diff --git a/source/Octopus.Tentacle/Util/ISilentProcessRunner.cs b/source/Octopus.Tentacle/Util/ISilentProcessRunner.cs index 06a61e558..fa6d21f45 100644 --- a/source/Octopus.Tentacle/Util/ISilentProcessRunner.cs +++ b/source/Octopus.Tentacle/Util/ISilentProcessRunner.cs @@ -1,49 +1,52 @@ -using System; +using System; using System.Collections.Generic; using System.Threading; +using System.Threading.Tasks; using Octopus.Tentacle.Startup; namespace Octopus.Tentacle.Util { public interface ISilentProcessRunner { - public int ExecuteCommand( + Task ExecuteCommandAsync( string executable, string arguments, string workingDirectory, Action info, Action error, - CancellationToken cancel = default); + CancellationToken cancel = default, + CancellationToken abandon = default); - public int ExecuteCommand( + Task ExecuteCommandAsync( string executable, string arguments, string workingDirectory, Action debug, Action info, Action error, - CancellationToken cancel = default); + CancellationToken cancel = default, + CancellationToken abandon = default); } public class SilentProcessRunnerWrapper : ISilentProcessRunner { - public int ExecuteCommand(string executable, string arguments, string workingDirectory, Action info, Action error, CancellationToken cancel = default) + public Task ExecuteCommandAsync(string executable, string arguments, string workingDirectory, Action info, Action error, CancellationToken cancel = default, CancellationToken abandon = default) { - return SilentProcessRunnerExtended.ExecuteCommand(executable, arguments, workingDirectory, info, error, cancel); + return SilentProcessRunnerExtended.ExecuteCommandAsync(executable, arguments, workingDirectory, info, error, cancel, abandon); } - public int ExecuteCommand(string executable, string arguments, string workingDirectory, Action debug, Action info, Action error, CancellationToken cancel = default) + public Task ExecuteCommandAsync(string executable, string arguments, string workingDirectory, Action debug, Action info, Action error, CancellationToken cancel = default, CancellationToken abandon = default) { - return SilentProcessRunner.ExecuteCommand(executable, arguments, workingDirectory, debug, info, error, cancel: cancel); + return SilentProcessRunner.ExecuteCommandAsync(executable, arguments, workingDirectory, debug, info, error, cancel: cancel, abandon: abandon); } } public static class SilentProcessRunnerExtended { - public static CmdResult ExecuteCommand(this CommandLineInvocation invocation) - => ExecuteCommand(invocation, Environment.CurrentDirectory); + public static async Task ExecuteCommandAsync(this CommandLineInvocation invocation) + => await ExecuteCommandAsync(invocation, Environment.CurrentDirectory); - public static CmdResult ExecuteCommand(this CommandLineInvocation invocation, string workingDirectory) + public static async Task ExecuteCommandAsync(this CommandLineInvocation invocation, string workingDirectory) { if (workingDirectory == null) throw new ArgumentNullException(nameof(workingDirectory)); @@ -52,7 +55,7 @@ public static CmdResult ExecuteCommand(this CommandLineInvocation invocation, st var infos = new List(); var errors = new List(); - var exitCode = ExecuteCommand( + var exitCode = await ExecuteCommandAsync( invocation.Executable, arguments, workingDirectory, @@ -63,20 +66,22 @@ public static CmdResult ExecuteCommand(this CommandLineInvocation invocation, st return new CmdResult(exitCode, infos, errors); } - public static int ExecuteCommand( + public static Task ExecuteCommandAsync( string executable, string arguments, string workingDirectory, Action info, Action error, - CancellationToken cancel = default) - => SilentProcessRunner.ExecuteCommand(executable, + CancellationToken cancel = default, + CancellationToken abandon = default) + => SilentProcessRunner.ExecuteCommandAsync(executable, arguments, workingDirectory, LogFileOnlyLogger.Current.Info, info, error, customEnvironmentVariables: null, - cancel: cancel); + cancel: cancel, + abandon: abandon); } -} \ No newline at end of file +} diff --git a/source/Octopus.Tentacle/Util/SystemCtlHelper.cs b/source/Octopus.Tentacle/Util/SystemCtlHelper.cs index 8ff4fd632..d97fb3a03 100644 --- a/source/Octopus.Tentacle/Util/SystemCtlHelper.cs +++ b/source/Octopus.Tentacle/Util/SystemCtlHelper.cs @@ -1,5 +1,6 @@ using System; using System.Linq; +using System.Threading.Tasks; using Octopus.Tentacle.Core.Diagnostics; namespace Octopus.Tentacle.Util @@ -13,26 +14,26 @@ public SystemCtlHelper(ISystemLog log) this.log = log; } - public bool StartService(string serviceName, bool logFailureAsError = false) - => RunServiceCommand("start", serviceName, logFailureAsError); + public Task StartServiceAsync(string serviceName, bool logFailureAsError = false) + => RunServiceCommandAsync("start", serviceName, logFailureAsError); - public bool RestartService(string serviceName, bool logFailureAsError = false) - => RunServiceCommand("restart", serviceName, logFailureAsError); + public Task RestartServiceAsync(string serviceName, bool logFailureAsError = false) + => RunServiceCommandAsync("restart", serviceName, logFailureAsError); - public bool StopService(string serviceName, bool logFailureAsError = false) - => RunServiceCommand("stop", serviceName, logFailureAsError); + public Task StopServiceAsync(string serviceName, bool logFailureAsError = false) + => RunServiceCommandAsync("stop", serviceName, logFailureAsError); - public bool EnableService(string serviceName, bool logFailureAsError = false) - => RunServiceCommand("enable", serviceName, logFailureAsError); + public Task EnableServiceAsync(string serviceName, bool logFailureAsError = false) + => RunServiceCommandAsync("enable", serviceName, logFailureAsError); - public bool DisableService(string serviceName, bool logFailureAsError = false) - => RunServiceCommand("disable", serviceName, logFailureAsError); + public Task DisableServiceAsync(string serviceName, bool logFailureAsError = false) + => RunServiceCommandAsync("disable", serviceName, logFailureAsError); - bool RunServiceCommand(string command, string serviceName, bool logFailureAsError) + async Task RunServiceCommandAsync(string command, string serviceName, bool logFailureAsError) { // Try without sudo first var commandLineInvocation = new CommandLineInvocation("/bin/bash", $"-c \"systemctl {command} {serviceName}\""); - var result = commandLineInvocation.ExecuteCommand(); + var result = await commandLineInvocation.ExecuteCommandAsync(); if (result.ExitCode == 0) return true; // Check if failure is due to authentication/permission issues @@ -48,9 +49,9 @@ bool RunServiceCommand(string command, string serviceName, bool logFailureAsErro { log.Info($"Permission denied. Retrying 'systemctl {command} {serviceName}' with sudo..."); var sudoInvocation = new CommandLineInvocation("/bin/bash", $"-c \"sudo systemctl {command} {serviceName}\""); - result = sudoInvocation.ExecuteCommand(); + result = await sudoInvocation.ExecuteCommandAsync(); if (result.ExitCode == 0) return true; - + usedSudo = true; }