@calvinp0 calvinp0 commented Jan 7, 2026

Due to recent changes on Zeus, users who submit ARC from any host other than n170 are not informed when a job submission error occurs. ARC instead keeps reporting that it cannot find the output files, even though the ESS jobs were never actually submitted. This PR therefore makes ARC raise an error when it receives such a job submission error message.


Copilot AI left a comment

Pull request overview

This PR adds error detection for PBS job submissions attempted from compute nodes on the Zeus cluster. Previously, when jobs were submitted from a compute node instead of the login server, ARC would silently fail and report missing output files without clearly indicating the underlying submission error.

Key Changes:

  • Added error detection in submit_job() to recognize Zeus PBS compute node submission errors
  • Added comprehensive test coverage for the new error handling

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File | Description
arc/job/local.py | Added error handling for PBS submissions from compute nodes, raising a clear ValueError when the specific error messages are detected
arc/job/local_test.py | Added unit test to verify the compute node error detection works correctly, including proper mocking of the retry behavior
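
A minimal sketch of the kind of check described above, not the code merged in arc/job/local.py. The function name `check_submission_output`, the `SUBMISSION_ERROR_MARKERS` tuple, and the marker text are all hypothetical, since the actual Zeus PBS error messages are not quoted in this conversation. The sketch assumes the submit command's stdout and stderr are available as lists of lines.

```python
# Hypothetical placeholder: the real Zeus/PBS messages are not quoted in this PR,
# so substitute the substrings your scheduler actually prints.
SUBMISSION_ERROR_MARKERS = ("example submission-host error",)


def check_submission_output(stdout, stderr):
    """Raise ValueError if the scheduler output signals a submission-host error.

    Both arguments are lists of output lines (an assumed convention).
    Returns None when no known marker is found.
    """
    # Join both streams and lowercase them so the match is case-insensitive.
    text = "\n".join(list(stdout) + list(stderr)).lower()
    for marker in SUBMISSION_ERROR_MARKERS:
        if marker in text:
            # Fail loudly instead of letting the caller wait for output
            # files that will never be written.
            raise ValueError(
                f"Job submission failed, scheduler reported {marker!r}. "
                "Submit from a host that is allowed to queue jobs."
            )
```

Raising at submission time, rather than returning a status flag, means a caller that polls for output files stops at once with the underlying cause instead of repeatedly reporting missing files.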

@JintaoWu98 JintaoWu98 left a comment

LGTM, Thanks!

@calvinp0 calvinp0 merged commit cfebdfb into main Jan 8, 2026
12 checks passed
@calvinp0 calvinp0 deleted the submit_host_error branch January 8, 2026 08:28