Skip to content

Implement interlink-apptainer-plugin: direct Apptainer execution without SLURM, with full e2e test suite#1

Merged
dciangot merged 5 commits into
mainfrom
copilot/run-job-with-apptainer
May 6, 2026
Merged

Implement interlink-apptainer-plugin: direct Apptainer execution without SLURM, with full e2e test suite#1
dciangot merged 5 commits into
mainfrom
copilot/run-job-with-apptainer

Conversation

Copilot AI commented May 6, 2026

Copy link
Copy Markdown
Contributor
  • Implement interlink-apptainer-plugin core (Create/Delete/Status/GetLogs/SystemInfo handlers)
  • Port unit tests from slurm plugin (47 tests pass)
  • Add docker/Dockerfile, Makefile, k3s integration test scripts, GitHub Actions workflows
  • Fix e2e probe failure: pass $timeout to executeHTTPProbe in both runProbe and runStartupProbe
    • Root cause: executeHTTPProbe declares local timeout="$5", but was called with only 4 HTTP args + container_name as arg 5 — so timeout got set to the container name string (e.g. "probe-test"), making timeout "probe-test" curl … fail with invalid time interval on every HTTP probe check
    • Both runProbe and runStartupProbe now call executeHTTPProbe "${probe_args[@]}" "$timeout" "$container_name"
    • Added probes_test.go regression tests to cover this
Original prompt

start from https://github.com/interlink-hq/interlink-slurm-plugin and create a similar version but skipping the batch job submission, just directly run the job.sh with apptainer. Used for use cases where you spawn the plugin already as a job (one job, one virtual node)

Copilot AI changed the title [WIP] Add direct job execution with Apptainer without batch submission Implement interlink-apptainer-plugin: direct Apptainer execution without SLURM, with full e2e test suite May 6, 2026
Copilot AI requested a review from dciangot May 6, 2026 07:03
@dciangot dciangot marked this pull request as ready for review May 6, 2026 07:10
@dciangot

dciangot commented May 6, 2026

Copy link
Copy Markdown
Member

@copilot e2e action is failing

Copilot AI commented May 6, 2026

Copy link
Copy Markdown
Contributor Author

@copilot e2e action is failing

Fixed in be7fff5. The failing test was 090-probes.yaml — the HTTP probe always failed because executeHTTPProbe was called without the $timeout argument in both runProbe and runStartupProbe. The function's local timeout="$5" was therefore set to the container name string (e.g. "probe-test") instead of the numeric timeout, making timeout "probe-test" curl … fail with invalid time interval on every HTTP probe check, keeping the pod permanently in Waiting state.

Both callsites now pass the argument correctly:

executeHTTPProbe "${probe_args[@]}" "$timeout" "$container_name"

A regression test in probes_test.go covers this.

@dciangot dciangot merged commit 130c818 into main May 6, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants