Add test cases for init container failures, mesh network opt-out, and wstunnel port-forward connectivity with CI setup #19

Open
Copilot wants to merge 5 commits into main from copilot/add-test-for-init-container-failures

Copilot AI commented Apr 24, 2026

Contributor

Adds E2E coverage for three behaviors: failing init containers, per-pod mesh networking opt-out (interlink-hq/interLink#522), and wstunnel port-forward connectivity. Also adds a self-contained CI/local environment to run the mesh network test against a real k3s + Traefik + interLink stack.

New templates

  • 120-failing-init-container.yaml — pod with a failing init container; asserts pod reaches Failed and init container logs are accessible. Confirms the main container is never started.

  • 125-mesh-network-opt-out.yaml — pod annotated with interlink.eu/mesh-network: disabled; asserts the pod still completes successfully, confirming the opt-out path doesn't break execution.

  • 130-mesh-network.yaml — end-to-end wstunnel port-forward test. Creates three resources:

    1. ClusterIP Service selecting the server pod
    2. Server Pod on the VK with containerPort: 9999 declared (triggers interlink.eu/wstunnel-client-commands annotation setup). Port 9999 is used deliberately — port 8080 is wstunnel's own WebSocket control port and cannot be reused as a reverse-tunnel port on the same process.
    3. Client Pod (no VK nodeSelector) that retries connecting to the Service until it receives a PONG response

    Validates both pods reach terminal/running state and the client receives a response from the server, proving the full tunnel path is functional.
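The client's retry-until-PONG behaviour in 130 can be sketched as below. This is a minimal illustration, not the command the template actually runs: it assumes a plain TCP exchange in which the server answers a probe with `PONG`, and the names `PongHandler` and `wait_for_pong` are hypothetical.

```python
import socket
import socketserver
import time


class PongHandler(socketserver.BaseRequestHandler):
    """Stand-in for the server pod (assumption): answers each probe with PONG."""

    def handle(self):
        self.request.recv(16)            # read the client's probe, e.g. b"PING\n"
        self.request.sendall(b"PONG\n")


def wait_for_pong(host, port, attempts=30, delay=1.0, timeout=2.0):
    """Retry a TCP connection until the peer answers PONG or attempts run out."""
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout) as conn:
                conn.sendall(b"PING\n")
                if conn.recv(16).strip() == b"PONG":
                    return True
        except OSError:
            pass  # Service or tunnel not reachable yet; retry after a pause
        time.sleep(delay)
    return False
```

In the test the client would point `wait_for_pong` at the ClusterIP Service name and port 9999. Retrying matters because the tunnel may not be established when the client pod starts.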

CI / local environment setup

  • scripts/setup-mesh-env.sh — fully self-contained setup script that:

    • Installs k3s with Traefik enabled (no --disable=traefik) so the Traefik IngressRoute CRD is available
    • Detects the Docker bridge gateway IP (default 172.17.0.1) and constructs WildcardDNS = <ip>.nip.io — Docker containers reach k3s Traefik via that address on port 80
    • Downloads the pre-built virtual-kubelet_Linux_x86_64 binary from the latest interLink release
    • Starts the interLink API and SLURM-plugin Docker containers
    • Configures the VK with Network.EnableTunnel: true, Network.WildcardDNS, and a custom WstunnelTemplatePath
  • scripts/wstunnel-traefik-template.yaml — custom wstunnel infrastructure template replacing the default nginx Ingress with a Traefik IngressRoute CRD. Routes Host({{.Name}}-{{.Namespace}}.{{.WildcardDNS}}), matching the hostname the VK encodes in the interlink.eu/wstunnel-client-commands annotation.

  • scripts/run-mesh-tests.sh — generates vktest_config_mesh.yaml, installs vk-test-set in a venv, and runs pytest.

  • scripts/cleanup-mesh-env.sh — stops the VK process, removes Docker containers, and uninstalls k3s.

  • .github/workflows/integration-tests.yaml — GitHub Actions workflow (path-filtered to vktestset/**, scripts/**, .github/workflows/**) that runs setup → test → cleanup with artifact upload on failure.
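The hostname scheme described above can be sketched as follows. nip.io resolves any name of the form `<anything>.<ip>.nip.io` to `<ip>`, so every per-pod host lands on the Docker bridge gateway where Traefik listens. The helper names `wildcard_dns` and `tunnel_host` are hypothetical and are not taken from the scripts.

```python
import ipaddress


def wildcard_dns(bridge_ip="172.17.0.1"):
    """Build the nip.io wildcard domain for the Docker bridge gateway IP."""
    ipaddress.IPv4Address(bridge_ip)  # raises ValueError on a malformed address
    return f"{bridge_ip}.nip.io"


def tunnel_host(name, namespace, wildcard):
    """Mirror the template's Host rule: {{.Name}}-{{.Namespace}}.{{.WildcardDNS}}."""
    return f"{name}-{namespace}.{wildcard}"
```

For example, `tunnel_host("server", "default", wildcard_dns())` yields `server-default.172.17.0.1.nip.io`, assuming a pod named `server` in the `default` namespace.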

Notes

  • 125 and 130 together cover both branches of the mesh networking decision: opt-out (no tunnel) vs. active wstunnel path.
  • Client pod uses a bare Pod rather than a Job to stay compatible with the framework's exact-name pod lookup in CheckPod.execute.
  • The 125 template merges the cluster-provided annotations dict with the opt-out annotation using Jinja2's dict() + update() side-effect pattern (the do extension is unavailable in the plain Environment used by the test runner).
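The `dict()` + `update()` pattern from the last note can be sketched as below. The template fragment is a hypothetical reconstruction, not the contents of 125-mesh-network-opt-out.yaml; the variable names `annotations`, `merged`, and `ignored` are assumptions.

```python
import json

from jinja2 import Environment

# Hypothetical fragment: copy the incoming dict, then call update() inside a
# throwaway `set`, since a plain Environment has no `do` statement.
TEMPLATE = """\
{%- set merged = dict(annotations) -%}
{%- set ignored = merged.update({"interlink.eu/mesh-network": "disabled"}) -%}
{{ merged | tojson }}"""


def render_annotations(annotations):
    """Render the merged annotations with a plain Environment (no extensions)."""
    rendered = Environment().from_string(TEMPLATE).render(annotations=annotations)
    return json.loads(rendered)
```

Copying with `dict()` first keeps the cluster-provided dict unmodified, so other templates rendered from the same context are not affected by the opt-out annotation.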

@dciangot

Member

@copilot try to set up a local environment with the network mesh. Use the Traefik ingress that ships natively with k3s. For the wildcard DNS, consider using the Docker host IP together with a nip.io wildcard.

Copilot AI commented Apr 24, 2026

Contributor Author

@copilot try to set up a local environment with the network mesh. Use the Traefik ingress that ships natively with k3s. For the wildcard DNS, consider using the Docker host IP together with a nip.io wildcard.

Done in commit 1332bd4. Here's what was set up:

scripts/setup-mesh-env.sh — self-contained local/CI setup:

  • Installs k3s with Traefik enabled (no --disable=traefik), so the Traefik IngressRoute CRD is available
  • Detects the Docker bridge gateway IP (default 172.17.0.1) and uses <ip>.nip.io as the WildcardDNS — containers in the Docker network can reach k3s Traefik on that address on port 80
  • Downloads the pre-built virtual-kubelet_Linux_x86_64 binary from the latest interLink release
  • Configures the VK with Network.EnableTunnel: true, Network.WildcardDNS: <dockerBridgeIP>.nip.io, and a custom WstunnelTemplatePath

scripts/wstunnel-traefik-template.yaml — custom wstunnel template that replaces the default nginx Ingress with a Traefik IngressRoute:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
spec:
  entryPoints: [web]
  routes:
    - match: Host(`{{.Name}}-{{.Namespace}}.{{.WildcardDNS}}`)

The hostname pattern matches exactly what the VK writes into the interlink.eu/wstunnel-client-commands annotation.

130-mesh-network.yaml — uses port 9999 (not 8080, which is wstunnel's own WebSocket control port).

.github/workflows/integration-tests.yaml — wires setup → test → cleanup with path filters on vktestset/**, scripts/**, and .github/workflows/**.

Warning

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/repos/interlink-hq/interLink/contents/plugins
    • Triggering command: /usr/bin/curl curl -s REDACTED (http block)
  • https://api.github.com/repos/interlink-hq/interLink/contents/plugins/slurm
    • Triggering command: /usr/bin/curl curl -s REDACTED (http block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI changed the title from "Add test cases for init container failures, mesh network opt-out, and wstunnel port-forward connectivity" to "Add test cases for init container failures, mesh network opt-out, and wstunnel port-forward connectivity with CI setup" Apr 24, 2026
Copilot AI requested a review from dciangot April 24, 2026 07:50
dciangot marked this pull request as ready for review April 24, 2026 15:12

2 participants