
fix(bootstrap): surface Helm install failure on namespace timeout (#211)#460

Closed
Manoj-engineer wants to merge 1 commit into NVIDIA:main from Manoj-engineer:fix/helm-error-diagnosis-211


@Manoj-engineer

Summary

When gateway start times out waiting for the openshell namespace, the bootstrap now
checks for failed helm-install-* jobs in kube-system and surfaces the actual Helm error
and the last 30 log lines instead of the generic "namespace not ready" message.
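The timeout behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in openshell-bootstrap/src/lib.rs: tail_lines and namespace_timeout_message are hypothetical names, and the message wording is assumed. Only the 30-line log tail and the idea of including a Helm hint in the timeout message come from this PR. The PR does not say whether the real message replaces or extends the generic text, so this sketch appends the hint.

```rust
// Hypothetical helper (not from the PR): keep only the last `n` lines of a log.
fn tail_lines(log: &str, n: usize) -> String {
    let lines: Vec<&str> = log.lines().collect();
    // saturating_sub avoids underflow when the log has fewer than `n` lines.
    let start = lines.len().saturating_sub(n);
    lines[start..].join("\n")
}

// Hypothetical sketch of the timeout message: the generic text always, with the
// Helm diagnosis appended only when one was found. The wording is assumed.
fn namespace_timeout_message(namespace: &str, helm_hint: Option<&str>) -> String {
    let base = format!("timed out waiting for namespace '{namespace}' to be ready");
    match helm_hint {
        Some(hint) => format!("{base}\nHelm install failure detected:\n{hint}"),
        None => base,
    }
}
```

In a real timeout branch, the hint would presumably be built from the job conditions followed by tail_lines(&pod_log, 30).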

Related Issue

Fixes #211

Changes

  • Add diagnose_helm_failure() in openshell-bootstrap/src/lib.rs, which queries
    helm-install-* jobs in kube-system for failed pods and returns the job
    conditions plus the last 30 log lines
  • Wire it into the final timeout branch of wait_for_namespace()
  • Fix the awk filter: status.failed stays <none> during the backoff retry window,
    so filter on != "0" instead of != "<none>" && != "0" to catch actively failing jobs
  • Add unit test helm_failure_hint_is_included_in_namespace_timeout_message
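The awk filter change can be illustrated in Rust. This sketch assumes the job list arrives as two whitespace-separated columns, NAME and FAILED (for example from kubectl get jobs -n kube-system -o custom-columns=NAME:.metadata.name,FAILED:.status.failed --no-headers). The actual command and awk script used by the PR are not shown here, and old_filter, new_filter and candidate_helm_jobs are hypothetical names.

```rust
// Old predicate: keep only jobs with a numeric, non-zero failure count.
fn old_filter(failed: &str) -> bool {
    failed != "<none>" && failed != "0"
}

// New predicate described in the PR: keep anything except "0", so jobs whose
// status.failed is still <none> during the backoff retry window are kept.
fn new_filter(failed: &str) -> bool {
    failed != "0"
}

// Hypothetical: select helm-install-* job names from "NAME FAILED" rows,
// skipping any row that does not have both columns.
fn candidate_helm_jobs(table: &str, keep: fn(&str) -> bool) -> Vec<String> {
    table
        .lines()
        .filter_map(|line| {
            let mut cols = line.split_whitespace();
            let name = cols.next()?;
            let failed = cols.next()?;
            (name.starts_with("helm-install-") && keep(failed)).then(|| name.to_string())
        })
        .collect()
}
```

Kubernetes omits zero-valued counters from Job status, so <none> can also mean a job has not failed at all. The wider filter therefore admits more candidates, and presumably the later failed-pod and log check is what tells them apart.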

Testing

  • mise run pre-commit passes
  • Unit tests added/updated: 78/78 pass
  • E2E tests added/updated: not applicable (error path only)
  • Live-tested end to end: built a gateway image with a corrupted serviceaccount.yaml
    and confirmed the Helm error appears in the terminal output on timeout

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (not applicable)


Signed-off-by: Manoj-engineer <194872717+Manoj-engineer@users.noreply.github.com>
@github-actions

Thank you for your interest in contributing to OpenShell, @Manoj-engineer.

This project uses a vouch system for first-time contributors. Before submitting a pull request, you need to be vouched by a maintainer.

To get vouched:

  1. Open a Vouch Request discussion.
  2. Describe what you want to change and why.
  3. Write in your own words — do not have an AI generate the request.
  4. A maintainer will comment /vouch if approved.
  5. Once vouched, open a new PR (preferred) or reopen this one after a few minutes.

See CONTRIBUTING.md for details.

@github-actions

Thank you for your submission! We ask that you sign our Developer Certificate of Origin before we can accept your contribution. You can sign the DCO by adding a comment below using this text:


I have read the DCO document and I hereby sign the DCO.


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the DCO Assistant Lite bot.

github-actions bot closed this on Mar 18, 2026


Development

Successfully merging this pull request may close these issues.

Improve error message when Helm chart has malformed YAML
