Slurm Support for Heterogeneous Jobs
------------------------------------
For detailed information about Slurm, please take a look at the :ref:`Quick Introduction <quickintro>` and :ref:`Batch system <batchsystem>` pages.
With Slurm 17.11, support for heterogeneous jobs was introduced. This allows spawning a job across multiple partitions of a cluster, and across different Modules of our Supercomputers. See the official Slurm documentation (SlurmHetJob_) for additional information on this feature.

.. _SlurmHetJob: https://slurm.schedmd.com/heterogeneous_jobs.html
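
As an illustration, a heterogeneous (pack) job is launched by separating the per-partition components of a single ``srun`` call with a colon. The following is only a sketch; the partition names, node counts and binaries are placeholders:

.. code-block:: none

    srun -A <budget account> -p <cluster, ...> -N 2 ./prog_cluster : -p <booster, ...> -N 4 ./prog_booster

Each component before and after the ``:`` carries its own resource specification, and Slurm schedules all components together as one job.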
MPI Traffic Across Modules
--------------------------
When the nodes of a job belong to different interconnects and MPI communication is used, bridging has to take place. To support this workflow, e.g. running a job on a Cluster with InfiniBand and a Booster with Omni-Path, a Gateway Daemon (psgwd, ParaStation Gateway Daemon) was implemented that takes care of moving packets across fabrics.
To request gateway nodes for a job, the mandatory option ``--gw_num`` has to be specified at submit/allocation time. In addition, communication with the psgwd has to be ensured by loading the software module **pscom-gateway**, either via ``xenv`` or the ``module`` command.

**April 2019** For the time being, prefixing binaries via ``msa_fix_ld`` is necessary, due to a libmpi version that does not support the psgwd. We hope this will go away soon.

To start an interactive pack job using two gateway nodes, the following command must be used:

.. code-block:: none

    srun -A <budget account> -p <batch, ...> --gw_num=2 xenv -L pscom-gateway msa_fix_ld ./prog1 : -p <booster, ...> xenv -L pscom-gateway msa_fix_ld ./prog2
When submitting a job that will run later, you have to specify the number of gateways at submit time:
.. code-block:: none

    sbatch --gw_num=2 ./submit-script.sbatch
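
The submit script itself has to describe each component of the pack job; with Slurm 17.11, the components of a heterogeneous batch script are separated by a ``#SBATCH packjob`` directive. The script below is only a sketch: the partition names, node counts and binaries are placeholders, assumed for illustration:

.. code-block:: none

    #!/bin/bash
    #SBATCH -A <budget account>
    #SBATCH -p <cluster, ...>
    #SBATCH -N 1
    #SBATCH packjob
    #SBATCH -p <booster, ...>
    #SBATCH -N 2

    srun xenv -L pscom-gateway msa_fix_ld ./prog1 : xenv -L pscom-gateway msa_fix_ld ./prog2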
Debugging
~~~~~~~~~
For debugging purposes, and to make sure the gateways are used, you might use

.. code-block:: none

    ...

You should see output like

.. code-block:: none

    <PSP:r0000003:CONNECT (192.168.12.34,26708,0x2,r0000003) to (192.168.12.41,29538,0x2,r0000004) via gw>
    <PSP:r0000004:ACCEPT (192.168.12.34,26708,0x2,r0000003) to (192.168.12.41,29538,0x2,r0000004) via gw>
JuRoPA3
~~~~~~~
Because JuRoPA3 has only one high-speed interconnect, using the ``psgwd`` is only possible with ``PSP_GATEWAY=2``. By exporting this variable, the gateway protocol's priority is boosted over the default interconnect.
.. code-block:: none

    export PSP_GATEWAY=2
    srun -A <budget account> -p <cluster, ...> --gw_num=2 xenv -L pscom-gateway ./prog1 : -p <booster, ...> xenv -L pscom-gateway ./prog2
PSGWD
~~~~~
The psgw plugin for the ParaStation management daemon extends the Slurm commands ``salloc``, ``srun`` and ``sbatch`` with the following options: