The tutorial for the nohup-based method does not solve all issues #13

@mbaudin47

Description

In #12, the nohup method was documented. However, this method does not solve the problem in all of our cases, for two reasons.

  • The Python process runs continuously on the frontend (login) node of the cluster. According to the cluster's managers, this should generally be avoided, because the frontend nodes are shared by all of the cluster's users. We therefore want to use the frontend node as little as possible, which cannot be achieved with nohup.
  • On the cluster, the frontend nodes are rebooted periodically. If this happens in the middle of a simulation, the process is lost. The nohup method does not help here: the process is dead and there is no way to bring it back. If the second part of the main Python script saves the computed data to a CSV file, that part is never executed. The user then has to find a way to gather the data without the help of the othpc module, which does not provide this service.

Neither limitation occurs with a submit-then-gather method.

  • Once the job is submitted, the submitting process terminates. No process keeps running on the frontend node, and the cluster's managers are as happy as possible.
  • The gather step can be done before or after the frontend is rebooted. The user therefore need not fear losing data after a reboot of the frontend: the data is still available in the various sub-directories of the study, so the gather() script can succeed without problem.
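
To make the gather step concrete, here is a minimal sketch of what such a script could look like. It is not the othpc API: the function name `gather`, the per-evaluation file name `output.json`, and the layout with one sub-directory per evaluation are all assumptions made for illustration.

```python
import csv
import json
from pathlib import Path


def gather(study_dir, csv_path, result_name="output.json"):
    """Collect finished results from the study sub-directories into one CSV.

    Hypothetical layout (an assumption, not the othpc layout): each
    sub-directory of `study_dir` holds one evaluation, and a finished
    evaluation has written a flat JSON object to `result_name`.

    Returns the names of the sub-directories that have no result file yet.
    """
    rows, missing = [], []
    sub_dirs = sorted(p for p in Path(study_dir).iterdir() if p.is_dir())
    for sub_dir in sub_dirs:
        result_file = sub_dir / result_name
        if not result_file.is_file():
            # Job still pending, running, or failed: report it, do not crash.
            missing.append(sub_dir.name)
            continue
        with result_file.open() as f:
            record = json.load(f)
        rows.append({"evaluation": sub_dir.name, **record})

    # Use the union of all keys, in first-seen order, as CSV columns.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return missing
```

Because the script only reads files that the jobs have already written to disk, it can be run at any time, including after a reboot of the frontend, and re-run later if some evaluations are still reported as missing.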

Hence, either the tutorial or the API should be extended to cover this specific need.
