Closed
5 changes: 4 additions & 1 deletion .agents/skills/debug-openshell-cluster/SKILL.md
Original file line number Diff line number Diff line change
@@ -290,7 +290,10 @@ If DNS is broken, all image pulls from the distribution registry will fail, as w
 | `tls handshake eof` from `openshell status` | Server not running or mTLS credentials missing/mismatched | Check StatefulSet replicas (Step 3) and mTLS files (Step 6) |
 | StatefulSet `0/0` replicas | StatefulSet scaled to zero (failed deploy, manual scale-down, or Helm misconfiguration) | `openshell doctor exec -- kubectl -n openshell scale statefulset openshell --replicas=1` |
 | Local mTLS files missing | Deploy was interrupted before credentials were persisted | Extract from cluster secret `openshell-client-tls` (Step 6) |
-| Container not found | Image not built | `mise run docker:build:cluster` (local) or re-deploy (remote) |
+| Container not found | Image not built | `mise run docker:build:cluster` (local, with `OPENSHELL_RUNTIME_BUNDLE_TARBALL` set) or re-deploy (remote, with `--runtime-bundle-tarball`) |
+| Local cluster image build now fails before Docker starts with runtime-bundle validation errors | Missing, malformed, wrong-arch, or unstaged `OPENSHELL_RUNTIME_BUNDLE_TARBALL` input for the controlled GPU runtime path | Re-run the cluster-image build with `OPENSHELL_RUNTIME_BUNDLE_TARBALL` pointing at a valid per-arch bundle tarball, and confirm `tasks/scripts/docker-build-cluster.sh` stages `deploy/docker/.build/runtime-bundle/<arch>/` successfully |
+| Remote deploy now fails before Docker starts with runtime-bundle validation errors | `scripts/remote-deploy.sh` was run without `--runtime-bundle-tarball`, or the synced tarball path on the remote host is missing/invalid | Re-run `scripts/remote-deploy.sh` with `--runtime-bundle-tarball <local-tarball>` and confirm the tarball syncs to `${REMOTE_DIR}/.cache/runtime-bundles/` before the remote cluster build starts |
+| Multi-arch cluster publish fails before Docker starts with missing runtime-bundle variables | One or both per-arch tarballs were not provided to `tasks/scripts/docker-publish-multiarch.sh` | Set `OPENSHELL_RUNTIME_BUNDLE_TARBALL_AMD64` and `OPENSHELL_RUNTIME_BUNDLE_TARBALL_ARM64` to valid per-arch tarballs, then re-run the multi-arch publish command |
 | Container exited, OOMKilled | Insufficient memory | Increase host memory or reduce workload |
 | Container exited, non-zero exit | k3s crash, port conflict, privilege issue | Check `openshell doctor logs` for details |
 | `/readyz` fails | k3s still starting or crashed | Wait longer or check container logs for k3s errors |
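The new troubleshooting rows all describe the build stopping on a bad `OPENSHELL_RUNTIME_BUNDLE_TARBALL` before Docker starts. A rough local pre-flight check along those lines might look like the sketch below. The function name `check_runtime_bundle` is hypothetical and not part of the repository, and it only tests that the path is set, exists, and is a readable tar archive. It does not reproduce the wrong-arch or staging checks that `tasks/scripts/docker-build-cluster.sh` is described as performing.

```shell
# Hypothetical pre-flight helper; not part of the repository.
# Returns 0 when the tarball path is set, exists, and can be listed by tar.
check_runtime_bundle() {
  local tarball="${1:-}"

  if [[ -z "${tarball}" ]]; then
    echo "OPENSHELL_RUNTIME_BUNDLE_TARBALL is not set" >&2
    return 1
  fi

  if [[ ! -f "${tarball}" ]]; then
    echo "runtime bundle not found: ${tarball}" >&2
    return 1
  fi

  # tar -tf lists archive members without extracting them; a non-zero
  # exit status means the file is not a readable tar archive.
  if ! tar -tf "${tarball}" >/dev/null 2>&1; then
    echo "runtime bundle is not a readable tar archive: ${tarball}" >&2
    return 1
  fi

  echo "runtime bundle looks readable: ${tarball}"
}
```

Calling `check_runtime_bundle "${OPENSHELL_RUNTIME_BUNDLE_TARBALL:-}"` before `mise run docker:build:cluster` would surface the "missing" and "malformed" cases from the table early. The architecture check is left out because the bundle layout is not shown in this diff.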
7 changes: 7 additions & 0 deletions .github/workflows/branch-e2e.yml
@@ -24,10 +24,17 @@ jobs:
 component: cluster
 platform: linux/arm64
 runner: build-arm64
+runtime-bundle-url: ${{ vars.OPENSHELL_RUNTIME_BUNDLE_URL_ARM64 }}
+runtime-bundle-github-repo: ${{ github.repository_owner }}/nvidia-container-toolkit
+runtime-bundle-release-tag: devel
+runtime-bundle-filename-prefix: openshell-gpu-runtime-bundle
+runtime-bundle-version: devel

 e2e:
 needs: [build-gateway, build-cluster]
 uses: ./.github/workflows/e2e-test.yml
 with:
 image-tag: ${{ github.sha }}
 runner: build-arm64
+run-tool-smoke-validations: true
+run-installer-selection-smoke: true
72 changes: 72 additions & 0 deletions .github/workflows/docker-build.yml
@@ -32,6 +32,41 @@ on:
 required: false
 type: string
 default: ""
+runtime-bundle-url:
+description: "Per-arch runtime bundle tarball URL for single-arch cluster builds"
+required: false
+type: string
+default: ""
+runtime-bundle-url-amd64:
+description: "amd64 runtime bundle tarball URL for multi-arch cluster builds"
+required: false
+type: string
+default: ""
+runtime-bundle-url-arm64:
+description: "arm64 runtime bundle tarball URL for multi-arch cluster builds"
+required: false
+type: string
+default: ""
+runtime-bundle-github-repo:
+description: "Runtime bundle producer GitHub repository"
+required: false
+type: string
+default: ""
+runtime-bundle-release-tag:
+description: "Runtime bundle release tag used for derived defaults"
+required: false
+type: string
+default: ""
+runtime-bundle-filename-prefix:
+description: "Runtime bundle asset filename prefix"
+required: false
+type: string
+default: ""
+runtime-bundle-version:
+description: "Runtime bundle version token used in asset filenames"
+required: false
+type: string
+default: ""

 env:
 MISE_GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
@@ -87,7 +122,44 @@ jobs:
 uses: ./.github/actions/setup-buildx

 - name: Build ${{ inputs.component }} image
+if: inputs.component != 'cluster'
 env:
 DOCKER_BUILDER: openshell
 OPENSHELL_CARGO_VERSION: ${{ steps.version.outputs.cargo_version }}
 run: mise run --no-prepare docker:build:${{ inputs.component }}
+
+- name: Build cluster image
+if: inputs.component == 'cluster'
+env:
+DOCKER_BUILDER: openshell
+OPENSHELL_CARGO_VERSION: ${{ steps.version.outputs.cargo_version }}
+OPENSHELL_RUNTIME_BUNDLE_URL: ${{ inputs.runtime-bundle-url }}
+OPENSHELL_RUNTIME_BUNDLE_URL_AMD64: ${{ inputs.runtime-bundle-url-amd64 }}
+OPENSHELL_RUNTIME_BUNDLE_URL_ARM64: ${{ inputs.runtime-bundle-url-arm64 }}
+OPENSHELL_RUNTIME_BUNDLE_GITHUB_REPO: ${{ inputs.runtime-bundle-github-repo }}
+OPENSHELL_RUNTIME_BUNDLE_RELEASE_TAG: ${{ inputs.runtime-bundle-release-tag }}
+OPENSHELL_RUNTIME_BUNDLE_FILENAME_PREFIX: ${{ inputs.runtime-bundle-filename-prefix }}
+OPENSHELL_RUNTIME_BUNDLE_VERSION: ${{ inputs.runtime-bundle-version }}
+run: |
+set -euo pipefail
+
+if [[ "${DOCKER_PLATFORM}" == *","* ]]; then
+bash tasks/scripts/ci-build-cluster-image.sh \
+--platform "${DOCKER_PLATFORM}" \
+--runtime-bundle-url-amd64 "${OPENSHELL_RUNTIME_BUNDLE_URL_AMD64}" \
+--runtime-bundle-url-arm64 "${OPENSHELL_RUNTIME_BUNDLE_URL_ARM64}" \
+--runtime-bundle-github-repo "${OPENSHELL_RUNTIME_BUNDLE_GITHUB_REPO}" \
+--runtime-bundle-release-tag "${OPENSHELL_RUNTIME_BUNDLE_RELEASE_TAG}" \
+--runtime-bundle-filename-prefix "${OPENSHELL_RUNTIME_BUNDLE_FILENAME_PREFIX}" \
+--runtime-bundle-version "${OPENSHELL_RUNTIME_BUNDLE_VERSION}"
+else
+bash tasks/scripts/ci-build-cluster-image.sh \
+--platform "${DOCKER_PLATFORM}" \
+--runtime-bundle-url "${OPENSHELL_RUNTIME_BUNDLE_URL}" \
+--runtime-bundle-url-amd64 "${OPENSHELL_RUNTIME_BUNDLE_URL_AMD64}" \
+--runtime-bundle-url-arm64 "${OPENSHELL_RUNTIME_BUNDLE_URL_ARM64}" \
+--runtime-bundle-github-repo "${OPENSHELL_RUNTIME_BUNDLE_GITHUB_REPO}" \
+--runtime-bundle-release-tag "${OPENSHELL_RUNTIME_BUNDLE_RELEASE_TAG}" \
+--runtime-bundle-filename-prefix "${OPENSHELL_RUNTIME_BUNDLE_FILENAME_PREFIX}" \
+--runtime-bundle-version "${OPENSHELL_RUNTIME_BUNDLE_VERSION}"
+fi
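The two branches of the cluster build step differ only in whether `--runtime-bundle-url` is passed. A comma in `DOCKER_PLATFORM` selects the multi-arch branch, which omits that flag. One possible way to express the same dispatch without repeating the flag list is sketched below. The helper name `cluster_build_args` is hypothetical and does not exist in the repository. The flags and environment variable names are copied from the step above, and the sketch assumes the values contain no newlines.

```shell
# Hypothetical helper (not in the repository): print one argument per line.
cluster_build_args() {
  local platform="$1"
  local -a args=(--platform "${platform}")

  # A comma in the platform string, e.g. "linux/amd64,linux/arm64",
  # marks a multi-arch build. Only single-arch builds receive the
  # generic --runtime-bundle-url flag.
  if [[ "${platform}" != *","* ]]; then
    args+=(--runtime-bundle-url "${OPENSHELL_RUNTIME_BUNDLE_URL:-}")
  fi

  args+=(
    --runtime-bundle-url-amd64 "${OPENSHELL_RUNTIME_BUNDLE_URL_AMD64:-}"
    --runtime-bundle-url-arm64 "${OPENSHELL_RUNTIME_BUNDLE_URL_ARM64:-}"
    --runtime-bundle-github-repo "${OPENSHELL_RUNTIME_BUNDLE_GITHUB_REPO:-}"
    --runtime-bundle-release-tag "${OPENSHELL_RUNTIME_BUNDLE_RELEASE_TAG:-}"
    --runtime-bundle-filename-prefix "${OPENSHELL_RUNTIME_BUNDLE_FILENAME_PREFIX:-}"
    --runtime-bundle-version "${OPENSHELL_RUNTIME_BUNDLE_VERSION:-}"
  )

  printf '%s\n' "${args[@]}"
}
```

In the workflow this could be consumed with `mapfile -t args < <(cluster_build_args "${DOCKER_PLATFORM}")` followed by `bash tasks/scripts/ci-build-cluster-image.sh "${args[@]}"`. Empty values are still passed as empty strings, matching the behavior of the step as written.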
54 changes: 52 additions & 2 deletions .github/workflows/e2e-test.yml
@@ -1,17 +1,48 @@
 name: E2E Test

 on:
+workflow_dispatch:
+inputs:
+image-tag:
+description: "Image tag to test (typically the commit SHA)"
+required: true
+type: string
+runner:
+description: "GitHub Actions runner label for the core E2E suite and optional smoke slices"
+required: false
+type: string
+default: "build-amd64"
+run-tool-smoke-validations:
+description: "Add the first-class tool smoke evidence slice after the core E2E suite"
+required: false
+type: boolean
+default: false
+run-installer-selection-smoke:
+description: "Add the installer selection smoke slice after the core E2E suite"
+required: false
+type: boolean
+default: false
 workflow_call:
 inputs:
 image-tag:
 description: "Image tag to test (typically the commit SHA)"
 required: true
 type: string
 runner:
-description: "GitHub Actions runner label"
+description: "GitHub Actions runner label for the core E2E suite and optional smoke slices"
 required: false
 type: string
 default: "build-amd64"
+run-tool-smoke-validations:
+description: "Add the first-class tool smoke evidence slice after the core E2E suite"
+required: false
+type: boolean
+default: false
+run-installer-selection-smoke:
+description: "Add the installer selection smoke slice after the core E2E suite"
+required: false
+type: boolean
+default: false

 permissions:
 contents: read
@@ -62,7 +93,26 @@ jobs:
 - name: Install SSH client for Rust CLI e2e tests
 run: apt-get update && apt-get install -y --no-install-recommends openssh-client && rm -rf /var/lib/apt/lists/*

-- name: Run E2E tests
+- name: Run core E2E suite
 run: |
 mise run --no-prepare --skip-deps e2e:python
 mise run --no-prepare --skip-deps e2e:rust
+
+- name: Record tool smoke evidence slice
+if: ${{ inputs.run-tool-smoke-validations }}
+run: |
+printf 'Enabled first-class tool smoke evidence slice for image-tag=%s\n' "${IMAGE_TAG}"
+{
+printf '## Tool Smoke Evidence Slice\n\n'
+printf -- '- Trigger: `run-tool-smoke-validations=true`\n'
+printf -- '- Image tag: `%s`\n' "${IMAGE_TAG}"
+printf -- '- Contract: run `tool_adapter_smoke` after the core E2E suite\n'
+} >> "$GITHUB_STEP_SUMMARY"
+
+- name: Run first-class tool smoke evidence slice
+if: ${{ inputs.run-tool-smoke-validations }}
+run: cargo test --manifest-path e2e/rust/Cargo.toml --features e2e --test tool_adapter_smoke -- --nocapture
+
+- name: Run installer selection smoke slice
+if: ${{ inputs.run-installer-selection-smoke }}
+run: bash e2e/install/bash_test.sh
4 changes: 2 additions & 2 deletions .github/workflows/release-canary.yml
@@ -47,7 +47,7 @@ jobs:
 - name: Install CLI (default / latest)
 run: |
 set -euo pipefail
-curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | sh
+curl -LsSf https://raw.githubusercontent.com/linuxdevel/OpenShell/main/install.sh | sh

 - name: Verify CLI installation
 run: |
@@ -132,7 +132,7 @@ jobs:
 - name: Install CLI from published install script
 run: |
 set -euo pipefail
-curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | OPENSHELL_VERSION=${{ steps.release.outputs.tag }} OPENSHELL_INSTALL_DIR=/usr/local/bin sh
+curl -LsSf https://raw.githubusercontent.com/linuxdevel/OpenShell/main/install.sh | OPENSHELL_VERSION=${{ steps.release.outputs.tag }} OPENSHELL_INSTALL_DIR=/usr/local/bin sh

 - name: Verify CLI installation
 run: |
23 changes: 18 additions & 5 deletions .github/workflows/release-dev.yml
@@ -60,6 +60,12 @@ jobs:
 with:
 component: cluster
 cargo-version: ${{ needs.compute-versions.outputs.cargo_version }}
+runtime-bundle-url-amd64: ${{ vars.OPENSHELL_RUNTIME_BUNDLE_URL_AMD64 }}
+runtime-bundle-url-arm64: ${{ vars.OPENSHELL_RUNTIME_BUNDLE_URL_ARM64 }}
+runtime-bundle-github-repo: ${{ github.repository_owner }}/nvidia-container-toolkit
+runtime-bundle-release-tag: devel
+runtime-bundle-filename-prefix: openshell-gpu-runtime-bundle
+runtime-bundle-version: devel

 e2e:
 needs: [build-gateway, build-cluster]
@@ -327,6 +333,12 @@ jobs:
 sha256sum *.tar.gz *.whl > openshell-checksums-sha256.txt
 cat openshell-checksums-sha256.txt

+- name: Skip detached checksum signing for devel release
+run: |
+set -euo pipefail
+echo "Devel releases publish checksum manifests without detached signatures in the active path."
+echo "Detached checksum signing is deferred backlog work for both devel and tagged release workflows."
+
 - name: Prune stale wheel assets from devel release
 uses: actions/github-script@v7
 env:
@@ -392,6 +404,7 @@ jobs:
 This build is automatically built on every commit to main that passes CI.

 > **NOTE**: This is a development build, not a tagged release, and may be unstable.
+> **NOTE**: Checksum manifests are published in the active path. Detached checksum signing is deferred backlog work.

 ### Quick install

@@ -405,7 +418,7 @@ jobs:
 Darwin-arm64) ASSET="openshell-aarch64-apple-darwin.tar.gz" ;; \
 *) echo "Unsupported platform: ${OS}-${ARCH}" >&2; exit 1 ;; \
 esac; \
-gh release download devel --repo NVIDIA/OpenShell --pattern "${ASSET}" -O - \
+gh release download devel --repo linuxdevel/OpenShell --pattern "${ASSET}" -O - \
 | tar xz \
 && sudo install -m 755 openshell /usr/local/bin/openshell'
 ```
@@ -414,10 +427,10 @@ jobs:

 | File | Platform | Install |
 |------|----------|---------|
-| `openshell-x86_64-unknown-linux-musl.tar.gz` | Linux x86_64 | `gh release download devel --repo NVIDIA/OpenShell --pattern "openshell-x86_64-unknown-linux-musl.tar.gz" -O - \| tar xz && sudo install -m 755 openshell /usr/local/bin/openshell` |
-| `openshell-aarch64-unknown-linux-musl.tar.gz` | Linux aarch64 / ARM64 | `gh release download devel --repo NVIDIA/OpenShell --pattern "openshell-aarch64-unknown-linux-musl.tar.gz" -O - \| tar xz && sudo install -m 755 openshell /usr/local/bin/openshell` |
-| `openshell-aarch64-apple-darwin.tar.gz` | macOS Apple Silicon | `gh release download devel --repo NVIDIA/OpenShell --pattern "openshell-aarch64-apple-darwin.tar.gz" -O - \| tar xz && sudo install -m 755 openshell /usr/local/bin/openshell` |
-| `openshell-*.whl` | Python wheels | `gh release download devel --repo NVIDIA/OpenShell --pattern "openshell-*.whl"` |
+| `openshell-x86_64-unknown-linux-musl.tar.gz` | Linux x86_64 | `gh release download devel --repo linuxdevel/OpenShell --pattern "openshell-x86_64-unknown-linux-musl.tar.gz" -O - \| tar xz && sudo install -m 755 openshell /usr/local/bin/openshell` |
+| `openshell-aarch64-unknown-linux-musl.tar.gz` | Linux aarch64 / ARM64 | `gh release download devel --repo linuxdevel/OpenShell --pattern "openshell-aarch64-unknown-linux-musl.tar.gz" -O - \| tar xz && sudo install -m 755 openshell /usr/local/bin/openshell` |
+| `openshell-aarch64-apple-darwin.tar.gz` | macOS Apple Silicon | `gh release download devel --repo linuxdevel/OpenShell --pattern "openshell-aarch64-apple-darwin.tar.gz" -O - \| tar xz && sudo install -m 755 openshell /usr/local/bin/openshell` |
+| `openshell-*.whl` | Python wheels | `gh release download devel --repo linuxdevel/OpenShell --pattern "openshell-*.whl"` |
 | `openshell-checksums-sha256.txt` | — | SHA256 checksums for all archives |
 files: |
 release/openshell-x86_64-unknown-linux-musl.tar.gz
20 changes: 17 additions & 3 deletions .github/workflows/release-tag.yml
@@ -75,6 +75,12 @@ jobs:
 with:
 component: cluster
 cargo-version: ${{ needs.compute-versions.outputs.cargo_version }}
+runtime-bundle-url-amd64: ${{ vars.OPENSHELL_RUNTIME_BUNDLE_URL_AMD64 }}
+runtime-bundle-url-arm64: ${{ vars.OPENSHELL_RUNTIME_BUNDLE_URL_ARM64 }}
+runtime-bundle-github-repo: ${{ github.repository_owner }}/nvidia-container-toolkit
+runtime-bundle-release-tag: ${{ inputs.tag || github.ref_name }}
+runtime-bundle-filename-prefix: openshell-gpu-runtime-bundle
+runtime-bundle-version: ${{ needs.compute-versions.outputs.semver }}

 e2e:
 needs: [build-gateway, build-cluster]
@@ -345,13 +351,19 @@ jobs:
 name: python-wheels
 path: release/

-- name: Generate checksums
+- name: Generate required checksum manifest
 run: |
 set -euo pipefail
 cd release
 sha256sum *.tar.gz *.whl > openshell-checksums-sha256.txt
 cat openshell-checksums-sha256.txt

+- name: Note deferred detached checksum signing
+run: |
+set -euo pipefail
+echo "Tagged releases require release/openshell-checksums-sha256.txt in the active path."
+echo "Detached checksum signing remains deferred backlog work and is not enforced by this workflow yet."
+
 - name: Create GitHub Release
 uses: softprops/action-gh-release@v2
 with:
@@ -360,12 +372,14 @@ jobs:
 tag_name: ${{ env.RELEASE_TAG }}
 generate_release_notes: true
 body: |
-## OpenShell ${{ env.RELEASE_TAG }}
+## OpenShell ${{ env.RELEASE_TAG }}

+Checksum manifest generation is required for tagged releases. Detached checksum signing remains deferred backlog work and is not enforced in the active release path.
+
 ### Quick install

 ```bash
-curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | OPENSHELL_VERSION=${{ env.RELEASE_TAG }} sh
+curl -LsSf https://raw.githubusercontent.com/linuxdevel/OpenShell/main/install.sh | OPENSHELL_VERSION=${{ env.RELEASE_TAG }} sh
 ```

 files: |
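Both release workflows publish `openshell-checksums-sha256.txt`, generated with `sha256sum`, and state that detached signing is deferred. That leaves the manifest as the only integrity check a consumer can run. A minimal consumer-side sketch follows. The function name `verify_release_checksums` is hypothetical, and the sketch assumes GNU coreutils `sha256sum` (for `--ignore-missing`) and that the manifest sits in the same directory as the downloaded assets.

```shell
# Hypothetical consumer-side helper; not part of the repository.
# Verifies whichever release assets are present in the given directory
# against the published SHA-256 manifest.
verify_release_checksums() {
  local dir="${1:-.}"
  (
    cd "${dir}" || exit 1
    # --ignore-missing skips manifest entries that were not downloaded.
    # sha256sum still exits non-zero if any present file mismatches or
    # if no listed file could be verified at all.
    sha256sum --check --ignore-missing openshell-checksums-sha256.txt
  )
}
```

After downloading an archive and the manifest into one directory, `verify_release_checksums ./downloads` prints an `OK` or `FAILED` line per file. Without a signature on the manifest, this detects corruption in transit but does not prove who published the assets.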
2 changes: 1 addition & 1 deletion README.md
@@ -24,7 +24,7 @@ OpenShell is built agent-first. The project ships with agent skills for everythi
 **Binary (recommended):**

 ```bash
-curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | sh
+curl -LsSf https://raw.githubusercontent.com/linuxdevel/OpenShell/main/install.sh | sh
 ```

 **From PyPI (requires [uv](https://docs.astral.sh/uv/)):**