Reorganizing/adding some psgwd blocks.

parent a3baed3e
......@@ -73,7 +73,7 @@ MPI Traffic Across Modules
--------------------------
When the nodes of a job belong to different interconnects and MPI communication is used, bridging has to take place. To support this workflow, e.g. running a job on a Cluster with InfiniBand and a Booster with Omni-Path, a gateway daemon (psgwd, the ParaStation Gateway Daemon) was implemented that takes care of moving packets across fabrics.
To request gateway nodes for a job, the mandatory option ``--gw_num`` has to be specified at submit/allocation time. In addition, the software module **pscom-gateway** has to be loaded, either via ``xenv`` or the ``module`` command, so that the application can communicate with the psgwd.
In total, 198 gateways are available.
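As a rough illustration of the request described above, the following sketch assembles an ``srun`` command line with ``--gw_num`` and prints it instead of running it. The helper name ``gw_launch_cmd`` and the binaries ``./app_cluster`` and ``./app_booster`` are hypothetical and only serve the example. The module **pscom-gateway** still has to be loaded before the printed command is actually executed.

```shell
# Sketch only: print the launch command instead of running it (dry run).
# gw_launch_cmd is a hypothetical helper, not part of ParaStation or Slurm.
gw_launch_cmd() {
    gw_num=$1   # number of gateway nodes to request
    shift       # remaining arguments are passed through as srun arguments
    printf 'srun --gw_num=%s %s\n' "$gw_num" "$*"
}

# Before running the printed command, load the gateway module, e.g.:
#   module load pscom-gateway
# ./app_cluster and ./app_booster are placeholder binaries.
gw_launch_cmd 2 -N 1 ./app_cluster : -N 2 ./app_booster
```

Printing the command first makes it easy to check the gateway count and the colon-separated job parts before an allocation is requested.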
......@@ -114,6 +114,10 @@ For the time being, prefixing binaries via ``msa_fix_ld`` is necessary. This is
PSGWD
~~~~~
PSGWD Slurm Extension
+++++++++++++++++++++
The psgw plugin for the ParaStation management daemon extends the Slurm commands salloc, srun and sbatch with the following options:
.. code-block:: none
......@@ -122,23 +126,27 @@ The psgw plugin for the ParaStation management daemon extends the Slurm commands
   --gw_plugin=string   Name of the route plugin
   --gw_num=number      Number of gateway nodes
A routing file will be generated in ``$HOME/psgw-route-$JOBID``. The routing file is automatically removed when the allocation is revoked.
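The default location can be derived from the job ID. The sketch below is a minimal example under the assumption that it runs inside a Slurm allocation, where Slurm sets ``SLURM_JOB_ID``. The function name ``psgw_route_file`` is hypothetical and not part of ParaStation.

```shell
# Hypothetical helper (not part of ParaStation): build the default routing
# file path described above from a job ID.
psgw_route_file() {
    # Use the given job ID, else fall back to Slurm's SLURM_JOB_ID variable.
    printf '%s/psgw-route-%s\n' "$HOME" "${1:-${SLURM_JOB_ID:-}}"
}

# Inside an allocation: report whether the routing file exists.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    route=$(psgw_route_file)
    if [ -f "$route" ]; then
        echo "routing file present: $route"
    else
        echo "no routing file at $route"
    fi
fi
```

Since the file is removed when the allocation is revoked, such a check is only meaningful while the job is still running.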
PSGWD Routing Plugins
+++++++++++++++++++++
With the option ``--gw_file``, an alternative location for the routing file can be specified using an absolute path:
.. code-block:: none

   srun --gw_file=/home-fs/rauh/route-file --gw_num=2 -N 1 hostname : -N 2 hostname

The route plugin can be changed using the ``--gw_plugin`` option. Currently only the default plugin ``plugin01`` is available.
.. code-block:: none

   srun --gw_plugin=plugin01 --gw_num=2 -N 1 hostname : -N 2 hostname

PSGWD Gateway Assignment
++++++++++++++++++++++++
If more gateways are requested than are available, the slurmctld prologue will fail for interactive jobs:
.. code-block:: none
......@@ -150,8 +158,7 @@ interactive jobs
   srun: Force Terminated job 158553
   srun: error: Job allocation 158553 has been revoked
If batch jobs run out of gateway resources, they will be re-queued and have to wait for 10 minutes before becoming eligible to start again.
Debugging
~~~~~~~~~
......