Dear All,
I have access to an HPC cluster managed with SLURM, with nodes of 32, 80, or 144 CPUs and up to 1 TB of RAM. Our floating license allows up to 5 simultaneous GAUSS jobs, each heavily parallelized internally. When I launch a script that starts these 5 jobs, they are not dispatched to 5 different nodes but are instead crammed onto one or two nodes, even while other nodes sit idle.
My question is whether it is possible to bind each individual job to its own node, so that the number of parallel threads in each GAUSS program can be calibrated to the number of processors on that specific node and the computing resources used efficiently. Without such binding, running more than one job on the same node slows execution down significantly, wasting the capacity of both the node and the cluster as a whole.
Thanks in advance for your help in these matters.
1 Answer
If you are trying to launch 5 different GAUSS sessions at once but they all land on the same node, you will need to ask your IT department how to tell SLURM to send each GAUSS job to a different node.
The parallelization within each GAUSS job uses a shared-memory model, so I don't think a single job can be spread across several nodes. That said, it would still be worth asking your IT department whether a shared-memory job can span multiple nodes on your cluster.
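As a starting point for that conversation with IT, here is a sketch of a SLURM batch script that requests one whole node per job. This assumes a standard SLURM setup; the job name, the `--cpus-per-task` value, and the `tgauss myjob.gss` command line are placeholders you would adapt to your cluster and your own GAUSS program.

```shell
#!/bin/bash
# run_gauss.sbatch -- submit once per GAUSS run: sbatch run_gauss.sbatch
#SBATCH --job-name=gauss-run
#SBATCH --nodes=1          # each job gets exactly one node
#SBATCH --exclusive        # no other jobs may share that node
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32 # placeholder: match the target partition

# SLURM exports the CPU count it actually allocated, so the GAUSS
# thread count can be calibrated to whichever node the job landed on.
echo "Running on $(hostname) with ${SLURM_CPUS_PER_TASK} CPUs"

# Placeholder command: replace with your GAUSS binary and program file.
tgauss myjob.gss
```

Submitting this script five times should give you five jobs, each holding a node exclusively; inside the GAUSS program you could then read `SLURM_CPUS_PER_TASK` (or `SLURM_CPUS_ON_NODE`) from the environment to set the number of threads.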