Latest state from User Documentation

parent e97e4832
......@@ -12,33 +12,107 @@ Overview
.. _het_modular_jobs_slurm:
Slurm Support for Heterogeneous Jobs
------------------------------------
For detailed information about Slurm, please take a look at the :ref:`Quick Introduction <quickintro>` and :ref:`Batch system <batchsystem>` pages.
Heterogeneous Jobs
------------------
Support for heterogeneous jobs was introduced with Slurm 17.11. This allows a job to be spawned across multiple partitions of a cluster, and across different modules of our supercomputers. See the official Slurm documentation (SlurmHetJob_) for additional information on this feature.
With Slurm 17.11, support for heterogeneous jobs was introduced.
A heterogeneous job consists of several job components, each of which can have individual job options.
In particular, the different components can request nodes from different partitions.
That way, a heterogeneous job can, for example, be spawned across multiple modules of our supercomputers.
.. _SlurmHetJob: https://slurm.schedmd.com/heterogeneous_jobs.html
Specifying Individual Job Options
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
**salloc/srun**
The syntax of the interactive and non-interactive submission mechanisms -- ``salloc`` and ``sbatch`` -- has been extended to allow the user to specify individual options for the different job components.
For ``salloc``, the sequence of command line arguments is partitioned into several blocks with the colon ``:`` acting as the separator.
The resulting heterogeneous job will have as many job components as there were blocks of command line arguments.
The first block of arguments contains the job options of the first job component as well as common job options that will apply to all other components.
The second block contains options for the second job component and so on.
The abstract syntax is as follows:
.. code-block:: none
salloc -A <budget account> -p <batch, ...> : -p <booster, ...> [ : -p <booster, ...> ]
srun ./prog1 : ./prog2 [ : ./progN ]
$ salloc <options 0 + common> : <options 1> [ : <options 2>... ]
**sbatch**
The following invocation of ``salloc`` submits an interactive heterogeneous job that consists of two components, the first requesting one node from the ``mem512`` partition, the second requesting 16 nodes from the ``batch`` partition.
.. code-block:: none
$ salloc -A budget -p mem512 -N 1 : -p batch -N 16
Submitting non-interactive heterogeneous jobs through ``sbatch`` works similarly, but the syntax for separating blocks of options in a batch script is slightly different.
Instead of the colon ``:``, batch scripts use the usual directive ``#SBATCH`` followed by the word ``packjob`` as a separator:
.. code-block:: none
#!/bin/bash
#SBATCH -A <budget account>
#SBATCH -p <batch, ...>
#SBATCH <options 0 + common>
#SBATCH packjob
#SBATCH -p <booster, ...>
#SBATCH <options 1>
[
#SBATCH packjob
#SBATCH <options 2>...
]
To submit a non-interactive heterogeneous job with the same setup as the interactive job above, the job script would read:
.. code-block:: none
#!/bin/bash
#SBATCH -A budget -p mem512 -N 1
#SBATCH packjob
#SBATCH -p batch -N 16
...
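The layout of such a batch script header can also be sketched programmatically. The following snippet is only an illustration: ``write_het_header`` is a hypothetical helper, not part of Slurm, that takes one block of options per job component and prints the corresponding ``#SBATCH`` lines with ``#SBATCH packjob`` between the blocks.

```shell
# Hypothetical helper (not part of Slurm): print the header of a batch
# script for a heterogeneous job. Each argument is the option block of
# one job component; blocks are separated by "#SBATCH packjob".
write_het_header() {
    echo '#!/bin/bash'
    first=1
    for block in "$@"; do
        # Every block except the first is preceded by the separator line.
        [ "$first" -eq 1 ] || echo '#SBATCH packjob'
        echo "#SBATCH $block"
        first=0
    done
}

# Reproduces the header of the two-component example above.
write_het_header "-A budget -p mem512 -N 1" "-p batch -N 16"
```

The helper only prints text; the commands to be run inside the job still have to be appended to the generated script before it is submitted.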
As always, one can also specify job options on the ``sbatch`` command line and even mix options specified on the command line and in the batch script.
Again, the colon ``:`` acts as the separator of blocks of command line arguments.
For example, to ensure that particular job components always run on certain partitions, the partitions can be specified in the job script, while the number of nodes is left to be specified on the command line.
The following batch script, submitted via ``sbatch -N 1 : -N 16 <batch script>``, results in the same heterogeneous job as the previous two examples.
.. code-block:: none
#!/bin/bash
#SBATCH -A budget -p mem512
#SBATCH packjob
#SBATCH -p batch
...
Running Job Components Side by Side
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
As with homogeneous jobs, applications are launched inside a heterogeneous job using ``srun``.
Like ``salloc`` and ``sbatch``, ``srun`` can be used to specify different options and also commands to run for different components through blocks of command line arguments separated by the colon ``:``.
.. code-block:: none
$ srun <options and command 0> : <options and command 1> [ : <options and command 2>... ]
For example, in a heterogeneous job with two components, ``srun`` accepts up to two blocks of arguments and commands:
.. code-block:: none
$ srun --ntasks-per-node 24 ./prog1 : --ntasks-per-node 1 ./prog2
The first block applies to the first component, the second block to the second component and so on.
If there are fewer blocks than job components, the resources of the remaining job components go unused as no application is launched there.
The option ``--pack-group=<expr>`` can be used to explicitly assign a block of command line arguments to a job component.
It takes as its argument ``<expr>`` either a single job component index in the range ``0 ... n - 1``, where ``n`` is the number of job components, or a range of indices like ``1-3``, or a comma-separated list of both indices and ranges like ``1,3-5``.
The following invocation of ``srun`` runs the same application ``./prog`` in components ``0`` and ``2`` of a three-component heterogeneous job, leaving component ``1`` idle:
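To make the expression syntax concrete, the following sketch expands an expression into the individual component indices it selects. ``expand_pack_group`` is a hypothetical helper written for illustration only; it is not part of Slurm and does not check the indices against the actual number of job components.

```shell
# Hypothetical helper (not part of Slurm): expand an expression such as
# "1,3-5" into one job component index per line.
expand_pack_group() {
    # Split the expression at commas; each part is an index or a range.
    printf '%s\n' "$1" | tr ',' '\n' | while read -r part; do
        case "$part" in
            *-*) start=${part%%-*}; end=${part#*-} ;;
            *)   start=$part; end=$part ;;
        esac
        # Both bounds must be non-empty and consist of digits only.
        case "$start" in
            ''|*[!0-9]*) echo "invalid expression: $part" >&2; exit 1 ;;
        esac
        case "$end" in
            ''|*[!0-9]*) echo "invalid expression: $part" >&2; exit 1 ;;
        esac
        i=$start
        while [ "$i" -le "$end" ]; do
            echo "$i"
            i=$((i + 1))
        done
    done
}

# Prints the indices 1, 3, 4 and 5, one per line.
expand_pack_group "1,3-5"
```

A block of ``srun`` arguments carrying this expression would therefore apply to the components with these indices.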
.. code-block:: none
$ srun --pack-group=0,2 ./prog
The same application ``./prog`` can be run in all three job components using:
.. code-block:: none
$ srun --pack-group=0-2 ./prog
srun ./prog1 : ./prog2
For detailed information about Slurm, please take a look at the :ref:`Quick Introduction <quickintro>` and :ref:`Batch system <batchsystem>` pages. The official `Slurm documentation on heterogeneous jobs`_ provides additional information on this feature.
.. _Slurm documentation on heterogeneous jobs: https://slurm.schedmd.com/heterogeneous_jobs.html
.. _het_modular_jobs_software:
......