Document Tentacle script abandonment #3175
Adds a new section to the "Troubleshooting failed or hanging tasks" page covering the two automatic recoveries that ship with EFT-365 (PowerShell startup detection) and EFT-3295 (cancel-abandon escape hatch). The section sits between the existing "Automatic failure of hanging tasks" subsection (Hung Deployment Detection, which is a different feature) and the "Antivirus software" subsection (the operator-side fix), so the page flows from deployment-level detection, to script-level recovery, to the underlying fix the customer still needs to make.

Honest framing throughout: both recoveries are mitigation, the underlying problem is on the target machine, and the abandon path explicitly does not kill the runaway script process.

Refs: EFT-365, EFT-3295
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request environment is available at https://stoctodocspr3175.z22.web.core.windows.net. You can view the ephemeral environment status in Octopus Deploy. This environment will be automatically deprovisioned when the pull request is closed, or after 7 days of inactivity.
…page
Pivots away from the troubleshooting-embedded approach in the previous commit.
The customer-facing product term is Tentacle script abandonment (per the
Naming Playbook applied to the Linear ticket title, code, FT name, and
Tentacle's own log line). It deserves a standalone page.
What changed:
- New page at src/pages/docs/infrastructure/deployment-targets/tentacle/tentacle-script-abandonment.md
Covers both triggers under one product term:
- PowerShell startup detection (EFT-365)
- Cancellation timeout (EFT-3295)
Explains the underlying cause (target-side AV/EDR interference) with a
link to OctopusTentacle#1208 for stack traces and the CrowdStrike +
Rapid7 deadlock analysis. Cross-links to the existing antivirus
exclusion list rather than duplicating it.
- Troubleshooting page: the embedded "Recovering from stuck PowerShell
scripts on Tentacle" section is removed and replaced with a short
cross-link to the new page. The troubleshooting page stays symptom-led;
the new page is product-term-led.
Language discipline: no colloquial "stuck", "hung", "hanging", "frozen"
anywhere in the customer-facing prose. "script" (not "PowerShell script")
for the umbrella behaviour; "PowerShell script" only inside the
PowerShell-specific trigger section.
Refs: EFT-365, EFT-3295
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| navOrder: 58 |
| --- |

| Octopus Tentacle can abandon a deployment script when the script can't run normally on the target. Abandonment releases the Tentacle's per-target mutex so the next deployment in the queue can start, even though the script's underlying process may still be running on the target. |

Can we add a link to documentation about the per-target mutex?

| When Tentacle abandons a script: |
| - The Tentacle's per-target mutex is released. The next deployment in the queue for that target can start immediately. |

Not in the queue: the Tentacle will begin executing the next script it has for that target.

| - The Tentacle-side runtime locks holding state for the script are dropped. |

How is this different to the mutex?
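
The bullets in the diff above say that abandonment releases the per-target mutex while the script's process may keep running. A minimal Python model of that behaviour follows. It's a sketch under stated assumptions, not Tentacle's implementation (Tentacle is a .NET application), and it models only the mutex, not any other Tentacle-side state. `TargetWorkspace`, `ScriptHandle`, `start_script`, and `abandon` are hypothetical names introduced for illustration.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScriptHandle:
    """Hypothetical stand-in for a running script and its OS process."""
    name: str
    process_alive: bool = True  # abandoning never sets this to False
    abandoned: bool = False


@dataclass
class TargetWorkspace:
    """Hypothetical model of one target: one script holds its mutex at a time."""
    mutex: threading.Lock = field(default_factory=threading.Lock)
    current: Optional[ScriptHandle] = None

    def start_script(self, name: str, timeout_s: float = 0.1) -> Optional[ScriptHandle]:
        # A script can only start once it takes the per-target mutex.
        if not self.mutex.acquire(timeout=timeout_s):
            return None
        self.current = ScriptHandle(name)
        return self.current

    def abandon(self, handle: ScriptHandle) -> None:
        # Release the mutex so the next script for this target can start.
        # The script's process is deliberately left alone (process_alive stays True).
        handle.abandoned = True
        if self.current is handle:
            self.current = None
            self.mutex.release()


if __name__ == "__main__":
    target = TargetWorkspace()
    first = target.start_script("deploy-1")
    print(target.start_script("deploy-2"))   # None: the mutex is still held
    target.abandon(first)
    second = target.start_script("deploy-2")
    print(second.name, first.process_alive)  # deploy-2 True
```

The `self.current is handle` check means a stale handle abandoned a second time can't release a mutex that a later script now holds.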

| If `powershell.exe` doesn't reach the first instruction of your script in 5 minutes, Tentacle marks the task as `Failed` with exit code `-47` and prevents the script body from running, even if PowerShell wakes up later. Tentacle records a log line like: |
| ``` |

Specify this code block's language as whatever the most generic option is. It's a block that contains sample logs.

| Tentacle abandons a script in response to one of two triggers. |

| ### PowerShell startup detection |

@LukeButters did we really do this for PowerShell only? And if so, was that just because that's the only thing we'd observed the behaviour for?

Yes, only for PowerShell.
AFAIK we have never seen this issue affect bash (on Linux).

| PowerShell startup detection: PowerShell did not start within 5 minutes for task <task ID> |
| ``` |
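
The startup-detection behaviour in the diff above (fail with exit code `-47` if the script doesn't reach its first instruction within 5 minutes, and keep the body from running if PowerShell starts later) amounts to a handshake with a timeout. Below is a minimal Python sketch of that idea, assuming a launcher that waits and a script bootstrap that reports when it reaches its first instruction. `StartupGate`, `bootstrap_reached_first_instruction`, `wait_for_start`, and `launch` are hypothetical names, not Tentacle's API, and this is not how Tentacle is necessarily built.

```python
import threading

STARTUP_TIMEOUT_S = 5 * 60       # 5 minutes, per the docs text
STARTUP_FAILED_EXIT_CODE = -47   # exit code the docs text reports


class StartupGate:
    """Hypothetical handshake between a launcher and a script bootstrap."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._started = threading.Event()
        self._gave_up = False

    def bootstrap_reached_first_instruction(self) -> bool:
        """Called by the script bootstrap. False means: do not run the body."""
        with self._lock:
            if self._gave_up:
                return False
            self._started.set()
            return True

    def wait_for_start(self, timeout_s: float) -> bool:
        """Called by the launcher. False means startup detection tripped."""
        if self._started.wait(timeout_s):
            return True
        with self._lock:
            # Re-check under the lock in case the bootstrap won the race.
            if self._started.is_set():
                return True
            self._gave_up = True
            return False


def launch(gate: StartupGate, task_id: str,
           timeout_s: float = STARTUP_TIMEOUT_S) -> tuple[int, list[str]]:
    """Returns (exit_code, log_lines). 0 is a placeholder: a real launcher
    would go on to wait for the script's own exit code."""
    if gate.wait_for_start(timeout_s):
        return 0, []
    line = (f"PowerShell startup detection: PowerShell did not start "
            f"within {timeout_s / 60:g} minutes for task {task_id}")
    return STARTUP_FAILED_EXIT_CODE, [line]


if __name__ == "__main__":
    gate = StartupGate()
    code, log = launch(gate, "hypothetical-task", timeout_s=0.05)
    print(code)                                        # -47
    print(gate.bootstrap_reached_first_instruction())  # False: body must not run
```

The lock makes "started" and "gave up" mutually exclusive, so a PowerShell process that wakes up just after the timeout can't slip the script body through.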
| Version requirements: |

Is this something we do as standard? If so, let's keep it as is. If not, let's make this less like a table and more of an explanation.

| The server-side task log records: |
| ``` |

Again, give this a language. Use a generic one.

| Tentacle abandoned the script. |
| ``` |

| If the script had already completed by the time abandonment was attempted, the second line reads: |

Can you confirm we have unit tests on these log lines that include a comment letting people know there is documentation dependent on them? If not, ask me to go add them.
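
The second commit says the current implementation emits one of two outcome lines (`Tentacle abandoned the script` or `Script had already completed before abandon was needed`). That choice comes down to a single atomic decision: either the script finished first or the abandon did. A minimal Python sketch of that decision follows, including a hypothetical cancellation-timeout trigger. `ScriptRun`, `complete`, `try_abandon`, `cancel_then_abandon`, `grace_s`, and the message constants are all hypothetical; the real timeout value isn't stated here, and the exact message text and punctuation in Tentacle may differ (the docs sample line ends with a period). Keeping documented strings in named constants with a comment pointing at the docs page is one way to do what the review comment asks for; it's a suggestion, not a description of Tentacle's tests.

```python
import threading
from typing import Optional

# Documented in tentacle-script-abandonment.md (sample log lines).
# If you change these strings, update the docs page too.
ABANDONED_MSG = "Tentacle abandoned the script"
ALREADY_COMPLETED_MSG = "Script had already completed before abandon was needed"


class ScriptRun:
    """Hypothetical record of one script run on Tentacle."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._finished = threading.Event()
        self.state = "running"  # running -> completed | abandoned

    def complete(self) -> bool:
        """Called when the script process exits. False if already abandoned."""
        with self._lock:
            if self.state != "running":
                return False
            self.state = "completed"
            self._finished.set()
            return True

    def try_abandon(self) -> str:
        """Decide the outcome atomically and return the log line to record."""
        with self._lock:
            if self.state == "completed":
                return ALREADY_COMPLETED_MSG
            self.state = "abandoned"
            return ABANDONED_MSG

    def cancel_then_abandon(self, grace_s: float) -> Optional[str]:
        """Hypothetical cancellation-timeout trigger: after a cancel request,
        wait grace_s for the script to finish, then abandon it if it hasn't."""
        if self._finished.wait(grace_s):
            return None  # cancellation took effect; nothing to abandon
        return self.try_abandon()


if __name__ == "__main__":
    run = ScriptRun()
    print(run.cancel_then_abandon(grace_s=0.05))  # Tentacle abandoned the script
    print(run.complete())                         # False: the late exit is ignored
```

If the script exits between the grace period expiring and `try_abandon` taking the lock, the sketch reports `ALREADY_COMPLETED_MSG`, which is the case the docs line describes.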
| For a worked example with stack traces and a detailed analysis of a CrowdStrike + Rapid7 deadlock on a customer's target, see [OctopusTentacle issue #1208](https://github.com/OctopusDeploy/OctopusTentacle/issues/1208). |

| Multiple security agents installed on the same host are the most common pattern. The fix is on the target machine. |

"The fix is on the target machine" is an AI-ism. Make it sound like me.

| Both abandonment triggers are mitigation, not a fix. The underlying problem is on the target machine, and you're best placed to fix it. Three steps, in order: |
| 1. **Configure your antivirus or endpoint-protection software to exclude Tentacle's working directories.** Specifically `<Tentacle Home>\Tools` and `<Tentacle Home>\Work`. The full exclusion list and additional directories you can include if you're still seeing issues are documented in [Troubleshooting failed or hanging tasks: Antivirus software](/docs/support/troubleshooting-failed-or-hanging-tasks#anti-virus-software). |
| 2. **Keep target-side security tooling updated.** Known interactions between specific CrowdStrike and Rapid7 versions cause the deadlock; vendor updates have addressed similar issues before. |

Where's the example where vendors have addressed similar issues before?

| This is generally indicative of an internal error in Octopus. In Octopus Cloud we actively monitor for these issues, but please reach out to support for further assistance, especially if the problem persists. |

| ### Tentacle script abandonment |

This sounds like it would cause a hanging task, when really it's the solution to many of these problems.

- Link "per-target mutex" to /docs/administration/managing-infrastructure/run-multiple-processes-on-a-target-simultaneously
- Drop "queue" framing: Tentacle picks up the next script it has for that target, not a notion of a queue.
- Remove the redundant "runtime locks" bullet (covered by the mutex bullet).
- Add `text` language fence to all sample-log code blocks.
- PowerShell startup detection scope reworded: PowerShell-only because that's the only context where the failure has been observed (confirmed by @LukeButters in review).
- Convert bulleted "Version requirements" blocks to single-sentence prose per existing docs convention.
- Drop the "Cancellation hasn't taken effect..." dispatch log line. The current implementation only emits the two outcome lines (`Tentacle abandoned the script` or `Script had already completed before abandon was needed`); the dispatch line was a spec aspiration that isn't in the code yet.
- Replace the AIism "The fix is on the target machine." with two concrete sentences naming Octopus's limit and where the fix lives.
- Remove the unsupported "vendor updates have addressed similar issues before" claim. Replace with a directive to check the vendor's release notes.
- Reword the cross-link section on the troubleshooting page from "Tentacle script abandonment" to "Automatic recovery for hanging tasks" so the heading reads as the solution, not the problem. The canonical product term stays in the link text.

Refs: EFT-365, EFT-3295
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

| ## What to do about it |

| Both abandonment triggers are mitigation, not a fix. The underlying problem is on the target machine, and you're best placed to fix it. Three steps, in order: |

Suggested change:

| Both abandonment triggers are mitigation, not a fix. The underlying problem is on the target machine, to fix it consider: |

| For a worked example with stack traces and a detailed analysis of a CrowdStrike + Rapid7 deadlock on a customer's target, see [OctopusTentacle issue #1208](https://github.com/OctopusDeploy/OctopusTentacle/issues/1208). |

| Multiple security agents installed on the same host are the most common pattern. Octopus can't reach inside that interaction to fix it. The fix lives in your target-side antivirus configuration. |

Suggested change:

| Multiple security agents installed on the same host are the most common pattern. Octopus can't reach inside that interaction to resolve this situation. |

Summary

Adds a new standalone page at `src/pages/docs/infrastructure/deployment-targets/tentacle/tentacle-script-abandonment.md` covering Tentacle script abandonment, the product behaviour that lets Tentacle abandon a deployment script when it can't run normally, releasing the per-target mutex so the next deployment in the queue can start.

The page covers both triggers under one product term:

- … `powershell.exe` doesn't start executing the script body within 5 minutes. Task ends `Failed` with exit code `-47`. Windows + PowerShell only.
- … `Cancelled`. Any script on Tentacle (Windows or Linux); SSH and Kubernetes agent not in scope.

The page also covers why these failures happen (target-side antivirus/EDR interference) with a link to `OctopusTentacle#1208` for the stack traces and CrowdStrike + Rapid7 deadlock analysis, and what customers should do about it (whitelist Tentacle paths, cross-linked to the existing antivirus section on the troubleshooting page).

The existing "Troubleshooting failed or hanging tasks" page gets a short cross-link pointer to the new page.

Test plan

- … `#anti-virus-software` on the troubleshooting page resolves (anchor exists on the existing page)
- … `58` slots the new page between `agent-vs-agentless` (55) and `troubleshooting-tentacles` (60) in the Tentacle sidebar

Open questions for review

@lucyjspence @LukeButters, three things in the new page worth a second eye on:

- … `AbandonScript` on `IScriptServiceV2`, update before merging.
- … `pwsh` support on a roadmap, or does this stay Windows + `powershell.exe` only indefinitely? Affects whether to soften that paragraph or leave it firm.

Reducing risk

Refs: EFT-365, EFT-3295