Add executor for NEC’s job scheduler (NQSV) #522
Thank you again @tat-ohmura. This seems like a pretty straightforward addition. We will review the code in the following days. I have a couple of general comments:
Thank you for your prompt reply. We are considering using PSI/J as a workflow framework for Urgent Computing, where time constraints are particularly strict. We want to detect job completion as quickly as possible, but shortening the polling interval would put a heavy load on the job scheduler, so we decided to use the qwait command instead.
We would like to explore the possibility of preparing an NQSV machine. If there are any documents outlining the steps for setting up a machine in the https://testing.psij.io/ environment, could you please share them with us?
That is fine. I am glad that a good solution exists for NQSV.
This should get you started: https://github.com/ExaWorks/psij-python/blob/main/README-testing.md
Since this is a new scheduler, we may want to add some code around line 447 in f8b54ff (for example, qsub -v | grep xyz) that can be used to tell if NQSV is installed.
If you run into difficulties, it may help to chat on Slack. You can use the link at the top-right on https://psij.io, or I can send you an invite if that doesn't work well.
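As a rough illustration of the installation check suggested above, the following Python sketch reports which scheduler commands are missing from PATH. It is not the code at the referenced line in the test suite; the helper name `missing_commands` and the choice of `qsub`, `qstat`, and `qwait` as the commands to look for are assumptions made for illustration, and a real check may need something more specific to tell NQSV apart from other schedulers that also provide `qsub`.

```python
import shutil
from typing import Iterable, List


def missing_commands(names: Iterable[str]) -> List[str]:
    """Return the command names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Hypothetical command set: an assumption, not taken from the PSI/J test suite.
NQSV_COMMANDS = ("qsub", "qstat", "qwait")


def nqsv_seems_installed() -> bool:
    """Heuristic only: True when every assumed NQSV command is on PATH."""
    return not missing_commands(NQSV_COMMANDS)


if __name__ == "__main__":
    print("NQSV detected" if nqsv_seems_installed() else "NQSV not detected")
```

Checking for `qwait` in addition to `qsub` is meant to reduce false positives on PBS-like systems, though that is a guess rather than a verified property of those schedulers.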
hategan
left a comment
This looks good.
Codespell flagged a misspelled word in a comment. It's not important, but I won't merge right away in case you want to fix it.
This pull request adds an executor that supports NQSV (https://www.nec.com/en/global/solutions/hpc/articles/tech08.html). This is a proprietary job scheduler developed by NEC and is used in multiple large-scale HPC systems in Japan.
To wait for the completion of a job, PSI/J currently polls the status of the job. Since NQSV offers a dedicated command (qwait) to wait for a job, we have implemented job completion detection using qwait instead of polling.
We hope to discuss and refine the implementation based on your feedback. Once the PR is reviewed and approved, we will update the documentation accordingly.
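To illustrate the idea of waiting on a blocking command rather than polling, here is a minimal Python sketch. It is not the code in this pull request: the function names `wait_for_job` and `watch_job` are hypothetical, and the invocation `qwait <job_id>` along with the meaning of its exit code are assumptions that should be checked against the NQSV documentation. The wait command is a parameter so the sketch can be run on a machine without NQSV.

```python
import subprocess
import sys
import threading
from typing import Callable, Sequence


def wait_for_job(job_id: str, wait_cmd: Sequence[str] = ("qwait",)) -> int:
    """Run the wait command with the job ID appended and block until it exits.

    Assumption: the command blocks until the job finishes. The exit code is
    returned as-is because its meaning is scheduler-specific.
    """
    result = subprocess.run([*wait_cmd, job_id], capture_output=True, text=True)
    return result.returncode


def watch_job(
    job_id: str,
    on_done: Callable[[str, int], None],
    wait_cmd: Sequence[str] = ("qwait",),
) -> threading.Thread:
    """Wait in a background thread and call on_done(job_id, exit_code)."""

    def _run() -> None:
        on_done(job_id, wait_for_job(job_id, wait_cmd))

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    # Stand-in for qwait so the sketch runs anywhere: a process that exits at once.
    stand_in = [sys.executable, "-c", "import sys; sys.exit(0)"]
    watcher = watch_job("123.example", lambda j, rc: print(j, rc), wait_cmd=stand_in)
    watcher.join()
```

With this shape, each submitted job costs one blocked thread and one waiting process, and the scheduler is not queried repeatedly, which is the trade-off motivating the use of qwait here.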