74 changes: 74 additions & 0 deletions .github/scripts/generate-values-overrides.sh
@@ -0,0 +1,74 @@
#!/usr/bin/env bash

# Copyright NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -euo pipefail

# Usage: generate-values-overrides.sh OUTPUT_FILE TOOLKIT_IMAGE DEVICE_PLUGIN_IMAGE MIG_MANAGER_IMAGE
#
# Generates a Helm values override file for GPU Operator component images.
# This file can be used with `helm install -f values-overrides.yaml` to
# override default component image versions.

if [[ $# -ne 4 ]]; then
    echo "Usage: $0 OUTPUT_FILE TOOLKIT_IMAGE DEVICE_PLUGIN_IMAGE MIG_MANAGER_IMAGE" >&2
    echo "" >&2
    echo "Example:" >&2
    echo " $0 values.yaml \\" >&2
    echo " ghcr.io/nvidia/container-toolkit:v1.18.0-ubuntu20.04 \\" >&2
    echo " ghcr.io/nvidia/k8s-device-plugin:v0.17.0-ubi8 \\" >&2
    echo " ghcr.io/nvidia/k8s-mig-manager:v0.10.0-ubuntu20.04" >&2
    exit 1
fi

OUTPUT_FILE="$1"
TOOLKIT_IMAGE="$2"
DEVICE_PLUGIN_IMAGE="$3"
MIG_MANAGER_IMAGE="$4"

# Generate values override file
cat > "${OUTPUT_FILE}" <<EOF
# Generated by generate-values-overrides.sh
# Date: $(date -u +"%Y-%m-%d %H:%M:%S UTC")
#
# This file overrides default GPU Operator component images with
# specific versions for forward compatibility testing.

toolkit:
  repository: ""
  version: ""
  image: "${TOOLKIT_IMAGE}"

devicePlugin:
  repository: ""
  version: ""
  image: "${DEVICE_PLUGIN_IMAGE}"

migManager:
  repository: ""
  version: ""
  image: "${MIG_MANAGER_IMAGE}"
EOF

echo "Generated values override file: ${OUTPUT_FILE}"
echo ""
echo "=== Component Images ==="
echo "Container Toolkit: ${TOOLKIT_IMAGE}"
echo "Device Plugin: ${DEVICE_PLUGIN_IMAGE}"
echo "MIG Manager: ${MIG_MANAGER_IMAGE}"
echo ""
echo "=== File Contents ==="
cat "${OUTPUT_FILE}"

103 changes: 103 additions & 0 deletions .github/scripts/get-latest-images.sh
@@ -0,0 +1,103 @@
#!/bin/bash
# Copyright NVIDIA CORPORATION
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -euo pipefail

COMPONENT=${1:-}

if [[ -z "${COMPONENT}" ]]; then
    echo "Usage: $0 <toolkit|device-plugin|mig-manager>" >&2
    exit 1
fi

# Verify regctl is available
if ! command -v regctl &> /dev/null; then
    echo "Error: regctl not found. Please install regctl first." >&2
    exit 1
fi

# Map component names to GHCR image repositories and GitHub source repositories
case "${COMPONENT}" in
    toolkit)
        IMAGE_REPO="ghcr.io/nvidia/container-toolkit"
        GITHUB_REPO="NVIDIA/container-toolkit"
        ;;
    device-plugin)
        IMAGE_REPO="ghcr.io/nvidia/k8s-device-plugin"
        GITHUB_REPO="NVIDIA/k8s-device-plugin"
        ;;
    mig-manager)
        IMAGE_REPO="ghcr.io/nvidia/k8s-mig-manager"
        GITHUB_REPO="NVIDIA/k8s-mig-manager"
        ;;
    *)
        echo "Error: Unknown component '${COMPONENT}'" >&2
        echo "Valid components: toolkit, device-plugin, mig-manager" >&2
        exit 1
        ;;
esac

echo "Fetching latest commit from ${GITHUB_REPO}..." >&2

# Get the latest commit SHA from the main branch using GitHub API
GITHUB_API_URL="https://api.github.com/repos/${GITHUB_REPO}/commits/main"

# Use GITHUB_TOKEN if available for authentication (higher rate limits)
if [[ -n "${GITHUB_TOKEN:-}" ]]; then
    LATEST_COMMIT=$(curl -sSL \
        -H "Authorization: Bearer ${GITHUB_TOKEN}" \
        -H "Accept: application/vnd.github.v3+json" \
        "${GITHUB_API_URL}" | \
        jq -r '.sha[0:8]')
Comment on lines +59 to +63

Copilot AI Feb 6, 2026

The script relies on jq to parse JSON responses from the GitHub API but doesn't verify that jq is installed before attempting to use it. If jq is not available, the script will fail with a cryptic error rather than a clear message.

Consider adding a check similar to the regctl verification on lines 26-29 to ensure jq is available before proceeding, providing a helpful error message if it's missing.
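
A minimal sketch of such a check, assuming a small shared helper. The name `require_tool` is hypothetical and not part of this PR; it mirrors the existing regctl verification and uses the same error wording:

```shell
# Sketch only: fail early with a clear message when a required tool is missing.
# `require_tool` is a hypothetical helper name introduced for illustration.
require_tool() {
    local tool="$1"
    if ! command -v "${tool}" > /dev/null 2>&1; then
        echo "Error: ${tool} not found. Please install ${tool} first." >&2
        return 1
    fi
}
```

In the script this could be called as `require_tool regctl || exit 1` followed by `require_tool jq || exit 1`, before the first use of either tool.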

else
    LATEST_COMMIT=$(curl -sSL \
        -H "Accept: application/vnd.github.v3+json" \
        "${GITHUB_API_URL}" | \
        jq -r '.sha[0:8]')
fi

if [[ -z "${LATEST_COMMIT}" || "${LATEST_COMMIT}" == "null" ]]; then
    echo "Error: Failed to fetch latest commit from ${GITHUB_REPO}" >&2
    exit 1
fi

echo "Latest commit SHA: ${LATEST_COMMIT}" >&2

# Construct full image path with commit tag
FULL_IMAGE="${IMAGE_REPO}:${LATEST_COMMIT}"

echo "Verifying image exists: ${FULL_IMAGE}" >&2

# Verify the image exists using regctl with retry
MAX_RETRIES=5
RETRY_DELAY=30
for i in $(seq 1 ${MAX_RETRIES}); do
    if regctl manifest head "${FULL_IMAGE}" &> /dev/null; then
        echo "Verified ${COMPONENT} image: ${FULL_IMAGE}" >&2
        echo "${FULL_IMAGE}"
        exit 0
    fi

    if [[ $i -lt ${MAX_RETRIES} ]]; then
        echo "Image not found (attempt $i/${MAX_RETRIES}), waiting ${RETRY_DELAY}s for CI to build..." >&2
        sleep ${RETRY_DELAY}
        # Exponential backoff: 30s, 60s, 120s, 240s

Copilot AI Feb 6, 2026

The comment on line 96 lists the delays as "30s, 60s, 120s, 240s". That sequence is correct: with MAX_RETRIES=5 there are 5 attempts and 4 waits between them, and the delay doubles after each wait. The comment is only slightly misleading because it does not say that the values are waits between attempts, or that the fifth attempt has no wait after it.

Consider updating the comment to clarify that these are the delays between attempts, or adjust to show the complete sequence including that the 5th attempt has no subsequent delay.

Suggested change
# Exponential backoff: 30s, 60s, 120s, 240s
# Exponential backoff between attempts: 30s, 60s, 120s, 240s (5 attempts, 4 waits)
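
To double-check the sequence named in the comment, the loop's arithmetic can be reproduced without sleeping. This is a sketch under the assumption that the delay doubles after every wait and the final attempt is not followed by one; `backoff_delays` is a hypothetical helper, not part of the PR:

```shell
# Sketch only: print the wait (in seconds) that follows each failed attempt.
# Arguments: number of attempts, initial delay. Nothing here actually sleeps.
backoff_delays() {
    local max_retries="$1"
    local delay="$2"
    local attempt=1
    local out=""
    while [ "${attempt}" -lt "${max_retries}" ]; do
        # Append the current delay, space-separated.
        out="${out}${out:+ }${delay}"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
    echo "${out}"
}

backoff_delays 5 30   # prints: 30 60 120 240
```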

        RETRY_DELAY=$((RETRY_DELAY * 2))
    fi
done
Comment on lines +84 to +99

Copilot AI Feb 6, 2026

The forward compatibility workflow fetches the latest commit SHA from component repositories and immediately attempts to verify the corresponding image exists. However, there's a potential race condition: the commit may be very recent, and the CI pipeline in the component repository might not have finished building and publishing the image yet.

While the retry logic with exponential backoff (lines 84-99) helps mitigate this, the maximum wait time is approximately 7.5 minutes (30 + 60 + 120 + 240 seconds). For some repositories with slower CI pipelines, this might not be sufficient. Consider either increasing MAX_RETRIES, adjusting the backoff strategy, or adding a configurable delay before the first attempt to allow CI pipelines more time to complete.
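
One way to make the limits adjustable is sketched below. It assumes the script would read `MAX_RETRIES` and `RETRY_DELAY` from the environment and fall back to the current values; that override mechanism and the helper `total_backoff_wait` are hypothetical, not part of the PR. The helper sums the waits so the worst case for a given setting is visible up front:

```shell
# Sketch only: sum the waits of a doubling backoff for a given number of
# attempts and initial delay. The final attempt has no wait after it.
total_backoff_wait() {
    local max_retries="$1"
    local delay="$2"
    local total=0
    local attempt=1
    while [ "${attempt}" -lt "${max_retries}" ]; do
        total=$((total + delay))
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
    echo "${total}"
}

# Hypothetical environment overrides, defaulting to the values in the script.
MAX_RETRIES="${MAX_RETRIES:-5}"
RETRY_DELAY="${RETRY_DELAY:-30}"
echo "Worst-case wait: $(total_backoff_wait "${MAX_RETRIES}" "${RETRY_DELAY}")s" >&2
```

With 5 attempts and a 30s initial delay the total is 450s (7.5 minutes); raising the attempt count to 6 adds one more wait of 480s, for 930s in total.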


echo "Error: Image ${FULL_IMAGE} does not exist after ${MAX_RETRIES} attempts" >&2
echo "The image may not have been built yet for commit ${LATEST_COMMIT}" >&2
exit 1