Use default values for resource requests #496
Merged
having one node as a default and specify that in the spec. In the meantime, add one node as default for Slurm.
…True`) and add a test (for slurm only at this time) to ensure that the corresponding parameter is not generated in the submit script.
Merging for now because all tests pass. When @andre-merzky is back, we can discuss more (see #497).
Some scheduler configurations (e.g., Frontier@ORNL) require a node count. We do not mandate one, on the assumption that schedulers will have defaults. Where they do not, leaving the value out at least maintains the (possibly misguided) spirit of PSI/J as a pass-through device that lets the scheduler decide how to handle a missing node count specification.
The problem arises both when a resource spec is missing and when the resource spec specifies only a process count (because the Slurm template does not use the computed counts).
This breaks our tests on Frontier. That, in itself, could be a statement that our tests are broken and should always specify a node count. More importantly, however, this breaks abstraction.
The point is that if we avoid defining defaults in PSI/J under the assumption that the scheduler will do the right thing with a missing value, we do not get uniform behavior, as evidenced above. Furthermore, the purpose of PSI/J is to clearly (and somewhat uniformly) define what a particular combination of values in the job spec means, which, in this particular case, it fails to do. I would, therefore, argue that the meaning of a missing resource spec should be understood as "one process on one compute node".
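To make the proposed semantics concrete, here is a minimal sketch of how the defaults could be resolved before a Slurm submit script is rendered. It is an illustration only, not the code in this PR: `ResourceRequest`, `effective_counts`, and `slurm_header` are hypothetical names, and treating a node-only request as one process per node is an assumption of the sketch rather than something decided here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ResourceRequest:
    """Hypothetical stand-in for a job's resource spec (not the PSI/J class)."""
    node_count: Optional[int] = None
    process_count: Optional[int] = None


def effective_counts(spec: Optional[ResourceRequest]) -> Tuple[int, int]:
    """Return (nodes, processes) with the proposed defaults filled in."""
    if spec is None:
        # Missing resource spec: one process on one compute node.
        return 1, 1
    # Only a process count given: default to a single node.
    nodes = spec.node_count if spec.node_count is not None else 1
    # Assumption of this sketch: with no process count, run one process per node.
    processes = spec.process_count if spec.process_count is not None else nodes
    return nodes, processes


def slurm_header(spec: Optional[ResourceRequest]) -> List[str]:
    """Render the directives so that a node count is always present."""
    nodes, processes = effective_counts(spec)
    return [f"#SBATCH --nodes={nodes}", f"#SBATCH --ntasks={processes}"]


print("\n".join(slurm_header(None)))
print("\n".join(slurm_header(ResourceRequest(process_count=8))))
```

With this reading, a job with no resource spec and a job that asks only for eight processes both produce an explicit `--nodes=1`, which is what a configuration that requires a node count needs.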
The one potential negative implication that I can think of involves a hypothetical scheduler configured to allocate fractional compute nodes when only a process count is specified: with a default node count, such jobs could no longer be specified on such schedulers. In the scenarios of this kind that I have seen, however, the scheduler tends to repurpose the notion of a node to mean the smallest fractional unit of a physical node.
This PR does two things: