Skip to content

Upgrade Octopus Tentacle Docker image from Debian 11 (Bullseye) to Debian 12 (Bookworm)#1218

Open
todthomson wants to merge 21 commits into
mainfrom
eft/tod/fix/EFT-3311-upgrade-linux-dockerfile-to-debian-12
Open

Upgrade Octopus Tentacle Docker image from Debian 11 (Bullseye) to Debian 12 (Bookworm)#1218
todthomson wants to merge 21 commits into
mainfrom
eft/tod/fix/EFT-3311-upgrade-linux-dockerfile-to-debian-12

Conversation

@todthomson
Copy link
Copy Markdown
Contributor

@todthomson todthomson commented May 7, 2026

Background

The Linux Tentacle Docker image was on an older Debian base. This PR upgrades docker/linux/Dockerfile to debian:12-slim (bookworm), updates the runtime dependency versions to what Debian 12 ships (libicu72, libssl3, libgssapi-krb5-2), and trims explicit installs of packages already present in the base image.

Results

The Linux Tentacle container now runs on Debian 12 (bookworm) with up-to-date runtime libraries.

Testing

Built and exercised the image end-to-end on an Apple Silicon host (Docker Desktop, --platform linux/amd64).

Check Result
./build.sh PackDebianPackage --RuntimeIds linux-x64 .deb built
docker build -f docker/linux/Dockerfile image built (328 MB)
Base image Debian GNU/Linux 12 (bookworm) confirmed via /etc/os-release
tentacle --version reports correct version
tentacle help full command list output
Shared libs (libssl, libicu, libgssapi-krb5, libstdc++, libc) all resolved via ldd /opt/octopus/tentacle/Tentacle
Dockerfile-installed tools (curl, jq, sudo, xxd, dos2unix, dockerd, docker, tentacle) all present on PATH
tentacle dpkg install status installed cleanly (ii)
Symlink /usr/bin/tentacle → /opt/octopus/tentacle/Tentacle present
configure-and-run.sh EULA gate (no ACCEPT_EULA) fires expected error
configure-and-run.sh credentials validation (no API key/creds) fires expected error
configure-and-run.sh ServerUrl validation fires expected error
E2E registration + script execution via scripts/smoke-test-linux-tentacle.sh PASSWorkers-1 registered against a local Octopus Server (sibling OctopusDeploy repo), AdHocScript task reached Success, task log emitted Hello from <container-id> and PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"

The smoke-test script (scripts/smoke-test-linux-tentacle.sh) is committed alongside this change and is intended as the starting point for a CI gate. It builds the image from the local .deb, pulls the Ultimate license from 1Password, brings up Octopus Server via the sibling repo's docker compose, registers the Tentacle as a worker (polling mode, DIND disabled), submits a hello-world AdHocScript via the REST API, asserts state=Success, and restores .env on exit. Recommend wiring it into CI before merge.

How to review this PR

Quality ✔️

Pre-requisites

  • I have read How we use GitHub Issues for help deciding when and where it's appropriate to make an issue.
  • I have considered informing or consulting the right people, according to the ownership map.
  • I have considered appropriate testing for my change.

@todthomson todthomson requested a review from a team as a code owner May 7, 2026 05:17
@todthomson todthomson force-pushed the eft/tod/fix/EFT-3311-upgrade-linux-dockerfile-to-debian-12 branch from b4e4116 to 7f2e79a Compare May 11, 2026 01:39
@todthomson todthomson requested a review from LukeButters May 12, 2026 01:23
@todthomson todthomson marked this pull request as draft May 12, 2026 01:23
@LukeButters LukeButters requested a review from Copilot May 12, 2026 01:48
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Upgrades the Linux Tentacle Docker image base to Debian 12 (bookworm) and adds a new end-to-end smoke-test script intended to validate image + worker registration + task execution against a local Octopus Server.

Changes:

  • Bump docker/linux/Dockerfile base image to debian:12-slim and update runtime library dependencies (e.g., libicu72, libssl3).
  • Add scripts/smoke-test-linux-tentacle.sh to build the image from a local .deb, start a local server (sibling repo), register a worker, and run an AdHocScript.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
docker/linux/Dockerfile Switch base to Debian 12 and adjust installed runtime dependencies / package install commands.
scripts/smoke-test-linux-tentacle.sh New end-to-end smoke test script for local validation of the Debian 12 Tentacle image.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread docker/linux/Dockerfile Outdated
Comment thread scripts/smoke-test-linux-tentacle.sh
Comment thread scripts/smoke-test-linux-tentacle.sh
Comment thread scripts/smoke-test-linux-tentacle.sh Outdated
@todthomson todthomson marked this pull request as ready for review May 12, 2026 02:59
@todthomson
Copy link
Copy Markdown
Contributor Author

@claude review once

@claude
Copy link
Copy Markdown

claude Bot commented May 13, 2026

Claude finished @todthomson's task in 2m 24s —— View job


Code Review

Read docker/linux/Dockerfile, scripts/smoke-test-linux-tentacle.sh, and adjacent scripts (install-docker.sh, configure-and-run.sh, etc.) to ground the review. Overall this is a clean change — Debian 12 (libicu72 / libssl3 / libgssapi-krb5-2) is the correct package set for bookworm, and the four Copilot comments are addressed solidly (pure-bash upsert_env_var, mktemp for the override compose, mktemp for the .env backup, BSD-portable templates).

A few things worth a second look before merge — none blocking.

docker/linux/Dockerfile

  1. Inconsistent flags between the two apt-get install invocations. Lines 3-9 use -y --no-install-recommends and clean only /var/lib/apt/lists/*; lines 14-23 use -y (no --no-install-recommends) and call apt-get clean in addition to rm -rf /var/lib/apt/lists/*. If there's a reason to allow recommends for the dev-tooling block (curl jq sudo xxd dos2unix), a one-line comment would help future maintainers; otherwise, normalising to --no-install-recommends here too gives a smaller, more deterministic layer. (Fix this →)

  2. Duplicate installs vs. install-docker.sh. curl, jq, sudo, and dos2unix are also installed by docker/linux/install-scripts/install-docker.sh (lines 13, 35, 38, 32). Because each RUN clears the apt cache, the second install is effectively a no-op but still adds layer overhead and two sources of truth for the package set. Not a blocker for this PR, but a candidate for a follow-up tidy.

  3. Trailing whitespace at line 46 (after ln -s … /usr/bin/tentacle). Cosmetic; many repos have a hook that flags this.

scripts/smoke-test-linux-tentacle.sh

  1. Worker selection is a "highest Workers-N" heuristic (lines 199-201). Fine for a fresh local environment, but if the Octopus Server's DB volume is reused across runs the workers list grows and the heuristic can latch onto a stale worker if Workers-N numbering ever rolls back (it shouldn't, but a more robust approach would tag the worker with a per-run name and look up by that). Worth firming up before this gets wired into CI, since CI re-uses environments more often than a dev laptop.

  2. No deregistration of the worker in teardown. docker compose down removes the container, but the worker record persists in the Server's DB (which lives in a compose volume in the sibling repo). Over many runs the workers list grows monotonically. A DELETE $API/workers/$WORKER_ID (best-effort, || true) at the start of teardown — guarded by [[ -n "$WORKER_ID" ]] — would keep the test idempotent.

  3. grep -q "Configuration successful." (line 171) treats the period as a regex wildcard. In practice this matches the intended log line, but grep -qF would be a literal match and is what's actually intended.

  4. API_KEY is hard-coded to the dev sentinel (API-APIKEY01). Fine for a local Server stood up by the sibling repo, but a sentence in the header comment clarifying why it's safe to commit (it's the well-known dev key the sibling repo's compose stack provisions, not a real secret) would save a reviewer's heartbeat next time. (Fix this →)

  5. op dependency for CI. The script is positioned as "the starting point for a CI gate," but op read op://... will not work in CI runners that aren't signed in to 1Password. Before wiring this in, the license source will need to be swapped for whatever CI secret store you're using (GitHub Actions secret, etc.), gated behind an env var like OCTOPUS_LICENSE_BASE64 with op read as the local fallback. Worth flagging in the PR description so it's not a CI-day surprise.

  6. Output of the AdHocScript only asserts Debian GNU/Linux 12 (line 251) as a warning, not as a hard failure (line 252 only warns). If the whole point of the smoke test is to prove the new base image is what's running, this is the load-bearing assertion and probably deserves die rather than warn. (The OS-release line could change between minor bookworm releases — Debian GNU/Linux 12 is stable through point releases, so the literal match is fine.)

  7. Minor: line 117 uses $ENV_FILE.tmp as a sibling temp file. If two runs collide on the same .env they race, but that's already true at the compose-stack level, so this isn't worth fixing on its own.

What looks good

  • Pure-bash upsert_env_var is the right fix for the python3 comment — no escape headaches with base64, no new tool dep.
  • mktemp templates correctly end in XXXXXX for BSD portability (commit 88d9a27); docker compose -f doesn't care about the .yml extension so dropping it was the right trade.
  • Teardown guards ([[ -n "$OVERRIDE_COMPOSE" && -f "$OVERRIDE_COMPOSE" ]], same for $ENV_BACKUP) are exactly right — they only touch what this run created.
  • jq --arg for the task body avoids JSON-injection footguns with the worker id.
  • set -euo pipefail + EXIT trap that preserves the original exit code is solid.

Summary

Ship-shape from a correctness standpoint. The Dockerfile changes are minimal and right. The smoke test is well-built for local use; the CI-readiness items (op dep, hard-fail on the OS-release assertion, worker deregistration) are the main things to tidy before this is the CI gate the PR description says it should become.
· branch: eft/tod/fix/EFT-3311-upgrade-linux-dockerfile-to-debian-12

Comment thread scripts/smoke-test-linux-tentacle.sh Outdated
Comment on lines +250 to +253
# Sanity: the printed log should mention Debian 12.
if ! curl -fsS -H "$H" "$API/tasks/$TASK_ID/raw" | grep -q 'Debian GNU/Linux 12'; then
warn "Task succeeded but the log does NOT mention 'Debian GNU/Linux 12'. Inspect output above."
fi
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The Debian 12 sanity check on line 252 calls warn rather than die, so when the AdHocScript log fails to mention Debian GNU/Linux 12 the script still falls through to the unconditional PASS log and exits 0. Since this is the only assertion that the script actually ran against the new Debian 12 image — the whole point of the EFT-3311 smoke test — a stale Debian 11 worker (or any other non-Debian-12 target the worker selection happens to pick up) would produce a false-green result. Change warn to die so a missing Debian GNU/Linux 12 string hard-fails.

Extended reasoning...

What the bug is

The smoke test's pass criteria has two parts: (a) the AdHocScript reaches State == Success (enforced at line 247 with die), and (b) the script's stdout mentions Debian GNU/Linux 12 (checked at lines 251-253). Only the first is actually enforced. The OS-release check on line 252 calls warn, which only prints to stderr — control falls through to line 255's unconditional log "PASS — Tentacle (Debian 12) registered and executed hello-world." and the script exits 0.

Why this matters for this PR specifically

This PR's sole purpose is to upgrade the Linux Tentacle base image from debian:11-slim to debian:12-slim. The script's header says it is the "End-to-end smoke test for the Linux Tentacle Docker image (EFT-3311)" and the PR description positions it as the intended starting point for a CI gate on that upgrade. The only artifact in the entire script that proves the AdHocScript ran on a Debian 12 container — as opposed to, say, a Debian 11 Tentacle from a previous build that happens to still be alive locally — is the cat /etc/os-release output checked at line 252. Demoting that check to a warning means the gate the script is supposed to provide doesn't actually exist.

The specific code path that triggers a false PASS

Step-by-step:

  1. A previous smoke-test run (on Debian 11, before this PR) registered Workers-N against the local Octopus Server and the container was never docker rm'd, so the worker registration is still present in the server's database.
  2. The current run brings up its own Tentacle container, but if registration is slow or the wrapper exits before re-registering, the new worker may not yet exist when Step 5 runs the /api/workers query.
  3. Worker selection (lines 195-204) picks the worker with the highest Workers-N ordinal across the entire server — i.e. the most recently registered worker globally, not the one this run created. If the stale Debian 11 worker has the highest ordinal at query time, WORKER_ID resolves to it.
  4. The AdHocScript (a one-line echo … ; cat /etc/os-release) is submitted with WorkerIds: [$id]. Bash and /etc/os-release exist on Debian 11 Tentacles, so the task reaches State == Success.
  5. Line 251 fetches the task log and greps for Debian GNU/Linux 12. On a Debian 11 worker, this grep fails.
  6. Because the check uses warn instead of die, the script continues and reaches line 255: PASS — Tentacle (Debian 12) registered and executed hello-world. Exit code 0.

Note that the stale-worker scenario is the cleanest trigger, but it is not the only one — anything that causes the worker selection or the docker build to point at the wrong image (cached smoke-debian12 tag, manual override, registration race) produces the same false-green. The point of an assertion is to catch all such scenarios, not just the ones we can enumerate.

Why existing code doesn't prevent it

set -euo pipefail ensures that unhandled failures abort the script, but warn deliberately swallows the failure (its body is printf … >&2 with no exit). die is the only helper in this script that exits non-zero. The verifier confirmations all note the same thing — the function definitions at lines 39-41 make the distinction explicit: warn returns, die exits 1.

Impact

The smoke test's stated value is gating the Debian 12 upgrade. If the assertion that actually proves Debian 12 was exercised is non-fatal, then a CI invocation of this script can return green while having validated nothing about the new base image. That defeats the gate.

How to fix

One-character change on line 252:

-    warn "Task succeeded but the log does NOT mention 'Debian GNU/Linux 12'. Inspect output above."
+    die  "Task succeeded but the log does NOT mention 'Debian GNU/Linux 12'. Inspect output above."

The error message is already worded as a failure condition, so only the helper call needs to change. This brings the assertion in line with the State != Success check immediately above it (line 246-248), which already uses die.

Comment on lines +233 to +240
for i in {1..120}; do
STATE="$(curl -fsS -H "$H" "$API/tasks/$TASK_ID" | jq -r '.State')"
echo " task=$TASK_ID state=$STATE"
case "$STATE" in
Success|Failed|Canceled|TimedOut) break ;;
esac
sleep 2
done
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The task polling loop at line 234 (STATE="$(curl -fsS … | jq -r '.State')") runs under set -euo pipefail, so a single transient curl failure inside the 120-iteration loop will exit the script mid-poll without printing the structured task=X state=Y diagnostic for the failing iteration. Consider mirroring the defensive pattern already used in the worker-discovery loop at line 198 (e.g. … || echo Unknown) so polling is resilient to brief hiccups and produces consistent per-iteration output. Nit — smoke-test tooling against a local server, and curl -fsS will still emit its own stderr message, so this isn't a correctness issue.

Extended reasoning...

What

In scripts/smoke-test-linux-tentacle.sh, the task-state polling loop at lines 233–240:

for i in {1..120}; do
  STATE="$(curl -fsS -H "$H" "$API/tasks/$TASK_ID" | jq -r '.State')"
  echo "  task=$TASK_ID state=$STATE"
  case "$STATE" in
    Success|Failed|Canceled|TimedOut) break ;;
  esac
  sleep 2
done

runs under set -euo pipefail (line 13). The command substitution wraps a curl-piped-to-jq pipeline. If curl exits non-zero (HTTP 4xx/5xx with -f, transient connection error, container restart, brief 5xx during task execution), pipefail propagates the non-zero status through the pipeline, the command substitution carries it to the assignment, and set -e exits the script immediately — before the echo " task=… state=…" line for that iteration runs.

Step-by-step proof

  1. Iteration 47 begins; curl is invoked against $API/tasks/$TASK_ID.
  2. Octopus Server briefly returns HTTP 502 (or the TCP connection is reset, or the container is mid-restart).
  3. curl -fsS exits with non-zero (e.g. 22 for HTTP errors, 7 for connection refused).
  4. jq runs on empty stdin and exits 0 (or non-zero), but pipefail ensures the pipeline's exit status is curl's non-zero code.
  5. The command substitution propagates that status to the STATE=… assignment.
  6. set -e fires; the script exits.
  7. The EXIT trap runs teardown() — which prints "--- teardown ---" and tears down compose.
  8. The user sees the curl stderr line (e.g. curl: (22) The requested URL returned error: 502) and the teardown trace, but no task=$TASK_ID state=… line for iteration 47, and no indication that the script was specifically in the task-polling loop when it died. They have to scroll up and reason about where in the script the abort happened.

Why existing code doesn't prevent it

The worker-discovery polling loop a few lines above (line 198) handles exactly this case defensively:

WORKERS_JSON="$(curl -fsS -H "$H" "$API/workers?take=1000" 2>/dev/null || echo '{"Items":[]}')"

The || echo '{"Items":[]}' absorbs transient curl failures and feeds valid JSON to jq, so the loop keeps polling. The task-polling loop has no such guard.

The same pattern (curl inside a pipeline whose failure can abort) also appears at line 251 (if ! curl … | grep -q …). Strictly that one won't abort the script — set -e is suppressed inside an if condition — but a curl failure there gets mis-attributed as "log doesn't mention Debian 12" via the warn, which is a separate UX paper-cut.

Addressing the counter-argument

A reasonable counter-argument is that curl -fsS uses the capital -S flag (show errors), so the user does see a curl stderr line like curl: (7) Failed to connect to … — i.e. the failure isn't truly silent. That's correct, and it's the main reason this is a nit, not a normal bug. The remaining gap is diagnostic quality: an operator running the smoke test sees a curl error plus an abrupt teardown trace, but loses the structured task=$TASK_ID state=… line that the loop is designed to emit each iteration, and there's no die message tying the failure back to "task polling" specifically. For a 2-minute polling loop against a server that may legitimately be doing other work, one transient 5xx terminating the whole smoke run is also a slightly steep response.

It's also reasonable to argue the two polling loops have different semantics (worker may legitimately not yet exist; task definitely does), so fail-fast on line 234 is intentional. That's a defensible design choice — but if a hiccup is fatal, it'd be nicer to die "Task polling failed at iteration $i: curl returned $?" than to let set -e exit silently from a command substitution.

Suggested fix

One-line change matching line 198's pattern:

STATE="$(curl -fsS -H "$H" "$API/tasks/$TASK_ID" 2>/dev/null | jq -r '.State' 2>/dev/null || echo Unknown)"

This lets the loop continue polling on transient failures, the echo " task=… state=Unknown" line still prints (so the operator can see which iteration hiccuped), and a persistent failure still falls out of the 120-iteration window with a clean "state never reached Success" die.

Severity rationale

Nit. Smoke-test tooling, not production code; the server is local so transient hiccups are uncommon; curl -fsS still emits stderr; and exit code is preserved so test integrity is intact. Purely a defensive-resilience and diagnostic-quality improvement, worth doing because the fix is one line and mirrors a pattern already in the same file.

Comment on lines +108 to +110
ENV_BACKUP="$(mktemp "${TMPDIR:-/tmp}/octopus-server-env-smoke-tentacle-XXXXXX")"
cp "$ENV_FILE" "$ENV_BACKUP"
log "Backed up .env to $ENV_BACKUP (will be restored on exit)"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The .env backup pattern at scripts/smoke-test-linux-tentacle.sh:108-109 has a small race window: mktemp creates an empty file and assigns ENV_BACKUP before cp populates it. If the script is interrupted (Ctrl+C, SIGTERM) between those two statements, or if cp itself fails under set -e, the EXIT trap fires with ENV_BACKUP non-empty and the empty file present — its guard [[ -n "$ENV_BACKUP" && -f "$ENV_BACKUP" ]] passes and mv "$ENV_BACKUP" "$ENV_FILE" clobbers the real .env with an empty file. This is the same data-loss scenario commit 439be89 was meant to prevent. Trivial fix: only assign ENV_BACKUP after cp succeeds (copy to a local temp path first, then assign).

Extended reasoning...

The race window

Lines 108-109 of scripts/smoke-test-linux-tentacle.sh:

ENV_BACKUP="$(mktemp "${TMPDIR:-/tmp}/octopus-server-env-smoke-tentacle-XXXXXX")"
cp "$ENV_FILE" "$ENV_BACKUP"

mktemp does two things in one shot: it creates an empty file on disk and prints its path. After bash finishes the command substitution, ENV_BACKUP is set and the file exists — but it is still empty. The cp that actually populates it is a separate statement.

The EXIT trap installed at line 60 runs teardown on any exit path, including signals. Its guard at lines 54-55 is:

if [[ -n "$ENV_BACKUP" && -f "$ENV_BACKUP" ]]; then
  mv "$ENV_BACKUP" "$ENV_FILE"

That guard cannot distinguish "backup we already wrote" from "empty file mktemp just made".

Step-by-step proof

  1. Bash evaluates line 108. The subshell runs mktemp, which creates /tmp/octopus-server-env-smoke-tentacle-abc123 (empty, 0 bytes) and prints the path.
  2. Bash assigns that path to ENV_BACKUP. Variable is now non-empty, file exists, file is empty.
  3. Trigger A — signal: User hits Ctrl-C before bash starts line 109. set -e plus signal propagation invokes the EXIT trap.
  4. Trigger B — cp failure under set -e: Disk full, source unreadable mid-read, or any other cp error causes set -e to exit. EXIT trap fires.
  5. teardown runs. Guard passes (path non-empty, file exists). mv "$ENV_BACKUP" "$ENV_FILE" moves the empty file over the user's real .env. Data destroyed.

Why existing code doesn't prevent this

ENV_BACKUP is initialized empty at line 21 specifically so the trap can tell "backup not yet made" from "backup made". That intent is correct but the protection is defeated by mktemp because the assignment and the population happen in two statements. The guard checks the wrong thing — file existence rather than file validity.

This is exactly the failure mode commit 439be89 (and the resolved Copilot comment 3223737974) set out to eliminate. The mktemp pattern protected against stale backups from previous runs, but it introduced a smaller race against the current run's own interrupt or cp failure.

Impact

The target file .env contains the Ultimate license fetched from 1Password. Loss is recoverable (op read again), so this is not catastrophic, and the timing window for a pure signal race is microseconds. The more realistic trigger is cp failure under set -e — narrower-looking on paper but achievable (e.g. ENOSPC on $TMPDIR).

Suggested fix

Assign ENV_BACKUP only after cp succeeds, so the trap's "is the path set?" check is meaningful:

_tmp="$(mktemp "${TMPDIR:-/tmp}/octopus-server-env-smoke-tentacle-XXXXXX")"
cp "$ENV_FILE" "$_tmp"
ENV_BACKUP="$_tmp"
log "Backed up .env to $ENV_BACKUP (will be restored on exit)"

Alternatively, gate the restore in teardown on a BACKUP_READY=1 flag set after cp, or compare sizes ([[ -s "$ENV_BACKUP" ]]) before restoring.

Severity

Nit — developer-only smoke test, window is small, license is recoverable from 1Password. But the bug is real and undermines the safety guarantee the change it lives in was meant to add, so it's worth a line.

🔬 also observed by copilot-pull-request-reviewer

@todthomson
Copy link
Copy Markdown
Contributor Author

Smoke-test feedback #1 — tag worker with per-run name (commit e707cdc)

Before — relied on the "highest Workers-N" heuristic to find the worker we just registered, which is fragile when the Server's DB volume is reused across runs:

WORKER_ID="$(echo "$WORKERS_JSON" \
  | jq -r '[.Items[] | select(.Id | startswith("Workers-"))] | sort_by(.Id | ltrimstr("Workers-") | tonumber) | last | .Id // empty')"

After — generate a unique per-run name and pass it to the Tentacle as TargetName via the compose override, then look it up by name:

WORKER_TARGET_NAME="smoke-tentacle-$(date +%Y%m%d-%H%M%S)-$$"
# ... in the override compose:
#   environment:
#     TargetName: "$WORKER_TARGET_NAME"
WORKERS_JSON="$(curl -fsS -H "$H" --data-urlencode "name=$WORKER_TARGET_NAME" -G "$API/workers" ...)"
WORKER_ID="$(echo "$WORKERS_JSON" | jq -r --arg name "$WORKER_TARGET_NAME" '.Items[] | select(.Name == $name) | .Id' | head -n1)"

Robust against reused DB volumes and any future Workers-N renumbering — the lookup matches the exact worker this run registered.

@todthomson
Copy link
Copy Markdown
Contributor Author

Smoke-test feedback #2 — deregister worker in teardown (commit 4730f64)

Before — teardown stopped containers but left the worker record in the Server's DB, so the workers list grew monotonically across runs sharing a Server DB volume:

teardown() {
  local exit_code=$?
  log "--- teardown ---"
  if [[ -n "$OVERRIDE_COMPOSE" && -f "$OVERRIDE_COMPOSE" ]]; then
    (cd "$SERVER_REPO" && docker compose -f docker-compose.yml -f "$OVERRIDE_COMPOSE" --profile tentacle down 2>/dev/null) || true
    rm -f "$OVERRIDE_COMPOSE"
  fi
  (cd "$SERVER_REPO" && docker compose down 2>/dev/null) || true
  ...
}

After — best-effort DELETE /api/workers/$WORKER_ID first (while the Server is still up), guarded on $WORKER_ID being populated so an early failure before Step 5 is a no-op:

teardown() {
  local exit_code=$?
  log "--- teardown ---"
  if [[ -n "$WORKER_ID" ]]; then
    log "Deregistering worker $WORKER_ID"
    curl -fsS -X DELETE -H "$H" "$API/workers/$WORKER_ID" >/dev/null 2>&1 || true
  fi
  ...
}

WORKER_ID is also pre-declared empty at the top of the script so teardown can reference it safely on early failures.

@todthomson
Copy link
Copy Markdown
Contributor Author

Smoke-test feedback #3grep -qF for literal match (commit 3387e40)

Before — the period in the search string is a regex wildcard under plain grep -q:

if "${COMPOSE[@]}" logs --no-color tentacle 2>/dev/null | grep -q "Configuration successful."; then

After — -F (--fixed-strings) treats the pattern literally, which is what's intended:

if "${COMPOSE[@]}" logs --no-color tentacle 2>/dev/null | grep -qF "Configuration successful."; then

@todthomson
Copy link
Copy Markdown
Contributor Author

Smoke-test feedback #4 — document API_KEY dev sentinel (commit ded20ae)

Before — API_KEY="API-APIKEY01" was hard-coded with no explanation, which is a reviewer-heartbeat moment when discovering an API key in a committed file:

API="http://localhost:8065/api"
API_KEY="API-APIKEY01"
H="X-Octopus-ApiKey: $API_KEY"

After — header comment explains that this is the well-known dev sentinel provisioned by the sibling repo's compose stack, not a real secret:

# Note on $API_KEY below: "API-APIKEY01" is the well-known dev sentinel API key
# provisioned by the sibling OctopusDeploy repo's docker-compose stack for its
# local-only Server instance. It is not a real secret and is safe to commit.

@todthomson
Copy link
Copy Markdown
Contributor Author

Smoke-test feedback #5op dependency for CI (commit 8991552)

Before — hard dependency on op (signed in), which won't work in CI runners that don't have 1Password configured:

require op
...
if ! op account list >/dev/null 2>&1; then
  die "1Password CLI is not signed in. Run: eval \$(op signin)"
fi
LICENSE_BASE64="$(op read "$ONEPASSWORD_LICENSE_REF" 2>/dev/null || true)"

After — CI can inject the license via $OCTOPUS_LICENSE_BASE64; op read is the local-dev fallback only, and require op is gated accordingly:

[[ -n "${OCTOPUS_LICENSE_BASE64:-}" ]] || require op
...
if [[ -n "${OCTOPUS_LICENSE_BASE64:-}" ]]; then
  LICENSE_BASE64="$OCTOPUS_LICENSE_BASE64"
  log "Using license from \$OCTOPUS_LICENSE_BASE64 (${#LICENSE_BASE64} bytes)"
else
  if ! op account list >/dev/null 2>&1; then
    die "1Password CLI is not signed in. Run: eval \$(op signin) — or pre-set \$OCTOPUS_LICENSE_BASE64."
  fi
  LICENSE_BASE64="$(op read "$ONEPASSWORD_LICENSE_REF" 2>/dev/null || true)"
  ...
fi

The header comment also documents the new env var, so the CI integration path is discoverable.

@todthomson
Copy link
Copy Markdown
Contributor Author

Smoke-test feedback #6 — hard-fail the Debian 12 assertion (commit b913c00)

Before — the load-bearing assertion of the smoke test was only a warning, so the script would still PASS even if the Tentacle wasn't actually running Debian 12:

# Sanity: the printed log should mention Debian 12.
if ! curl -fsS -H "$H" "$API/tasks/$TASK_ID/raw" | grep -q 'Debian GNU/Linux 12'; then
  warn "Task succeeded but the log does NOT mention 'Debian GNU/Linux 12'. Inspect output above."
fi

After — promoted to die, since proving the new base image is what's running is the whole point of this test. Also switched the grep to -qF to match #3:

# Load-bearing assertion: the whole point of this smoke test is to prove the
# Debian 12 base image is what's actually running on the Tentacle, so a missing
# os-release line is a hard failure, not a warning.
if ! curl -fsS -H "$H" "$API/tasks/$TASK_ID/raw" | grep -qF 'Debian GNU/Linux 12'; then
  die "Task succeeded but the log does NOT mention 'Debian GNU/Linux 12'. Inspect output above."
fi

@todthomson
Copy link
Copy Markdown
Contributor Author

Smoke-test feedback #7mktemp for upsert_env_var temp file (commit 8d39dda)

Before — sibling $ENV_FILE.tmp could race with a concurrent run on the same .env:

upsert_env_var() {
  local key="$1" value="$2"
  local tmp="$ENV_FILE.tmp" line found=
  : > "$tmp"
  ...
}

After — unique per-call tmp via mktemp, matching the XXXXXX template convention already used elsewhere in the script:

upsert_env_var() {
  local key="$1" value="$2"
  local tmp line found=
  tmp="$(mktemp "${TMPDIR:-/tmp}/octopus-server-env-smoke-upsert-XXXXXX")"
  ...
}

@todthomson
Copy link
Copy Markdown
Contributor Author

Local smoke test: ✅ PASS

Re-ran scripts/smoke-test-linux-tentacle.sh against a local Octopus Server after applying all 7 fixes above. Final output:

[smoke] --- end task log ---
[smoke] PASS — Tentacle (Debian 12) registered and executed hello-world.
[smoke] --- teardown ---
[smoke] Deregistering worker Workers-1
[smoke] Restored /Users/tnt/code/GitHub/OctopusDeploy/OctopusDeploy/.env

Confirms in one run that the new behaviours are wired up:

todthomson and others added 11 commits May 21, 2026 10:00
…age configuration, since apt-utils is not installed
Builds the image from the local .deb, brings up a local Octopus Server, registers
the Tentacle as a worker, and runs a hello-world AdHocScript via the REST API to
verify the Debian 12 upgrade end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tter

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Replace upsert_env_var's Python heredoc with a pure-bash implementation.
The previous version invoked python3 without listing it in the script's
required-tools check, so the failure mode on a machine without Python was
a confusing "python3: command not found" mid-run instead of an upfront
"Missing required tool" message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous version wrote and deleted a fixed-name `docker-compose.smoke.yml`
in the sibling OctopusDeploy repo. If a user already had a file by that name
(now or in the future), this script would silently clobber it and remove it
on teardown.

Switch to `mktemp` in $TMPDIR so the path is unique-per-run, and only delete
it in teardown if this run actually created one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
todthomson and others added 10 commits May 21, 2026 10:00
The previous fixed `.env.smoke-test.bak` path would silently overwrite a
stale backup from a previously-crashed run, losing the original .env.

Switch to `mktemp` in $TMPDIR so the backup path is unique-per-run, log
the path so it's discoverable from script output if anything goes wrong,
and guard the teardown restore on the variable being set so an early
failure (before the backup is made) doesn't try to mv an empty path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BSD mktemp (macOS) silently leaves Xs unsubstituted when they aren't the
final characters of the template — so the previous templates
("...XXXXXX.bak" and "...XXXXXX.yml") actually produced fixed-name files
with literal Xs, defeating the unique-per-run guarantee. Verified by
inspecting the logged backup path.

Move the Xs to the end and drop the .bak/.yml suffixes (docker compose
parses by content not extension, and the suffix was cosmetic for the
backup). Verified end-to-end against a local Octopus Server.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The period in the search string is a regex wildcard with plain grep -q;
-F (--fixed-strings) makes the match literal, which is what's intended.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a sentence to the header comment explaining that API-APIKEY01 is
the well-known dev key provisioned by the sibling OctopusDeploy repo's
compose stack for its local Server, not a real secret.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This is the load-bearing assertion of the smoke test — if the Tentacle
isn't actually running on Debian 12, the test should fail, not warn.
Also switches the grep to -qF so the period is treated literally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Avoids writing a sibling \$ENV_FILE.tmp that two concurrent runs could
race on. Uses the same XXXXXX template convention as the other mktemp
calls in this script.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets CI runners (which can't use op) inject the license via env var,
while keeping op read as the local-dev default. The op tool is only
required when the env var is unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Generates a unique per-run worker name (smoke-tentacle-<timestamp>-<pid>)
and passes it to the Tentacle as TargetName via the override compose.
The Server-side lookup then filters by Name == $WORKER_TARGET_NAME,
which is robust against reused DB volumes growing the workers list and
against any future Workers-N renumbering.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the Server's DB volume is reused across runs (CI especially), the
workers list otherwise grows monotonically. Teardown now issues a
best-effort DELETE /api/workers/\$WORKER_ID before bringing the stack
down, guarded on \$WORKER_ID being populated (so an early failure
before Step 5 is a no-op).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@todthomson todthomson force-pushed the eft/tod/fix/EFT-3311-upgrade-linux-dockerfile-to-debian-12 branch from 4730f64 to dca6d6a Compare May 21, 2026 00:00
@todthomson
Copy link
Copy Markdown
Contributor Author

Local smoke test (re-run): ✅ PASS

Second clean run of scripts/smoke-test-linux-tentacle.sh on top of all 7 fixes. Milestones:

[smoke] Server is up after 94s
[smoke] Tentacle registered after 23s
[smoke] Submitted task: ServerTasks-9
  task=ServerTasks-9 state=Queued
  task=ServerTasks-9 state=Executing
  ...
  task=ServerTasks-9 state=Success
00:07:02   Info     |     Hello from 1a6bfbcb55c7
00:07:02   Info     |     PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
[smoke] PASS — Tentacle (Debian 12) registered and executed hello-world.
[smoke] --- teardown ---
[smoke] Deregistering worker Workers-1
[smoke] Restored /Users/tnt/code/GitHub/OctopusDeploy/OctopusDeploy/.env

Confirms (again) that the AdHocScript ran on Debian GNU/Linux 12 (bookworm), the worker was looked up by per-run TargetName (fix #1), and teardown deregistered the worker (fix #2).

@todthomson todthomson changed the title fix: upgrade linux dockerfile to debian 12 Upgrade Octopus Tentacle Docker image from Debian 11 (Bullseye) to Debian 12 (Bookworm) May 21, 2026
@todthomson todthomson enabled auto-merge (squash) May 21, 2026 07:24
@todthomson todthomson disabled auto-merge May 21, 2026 07:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants