Update -L pscom-gateway/msa_fix_ld things, try to make it more understandable while those ugly workarounds are still... not that easy.
parent 367af959
@@ -73,17 +73,42 @@ MPI Traffic Across Modules
--------------------------
When the nodes of a job belong to different interconnects and MPI communication is used, bridging has to take place. To support this workflow, e.g. running a job on a Cluster with InfiniBand and a Booster with Omni-Path, a Gateway Daemon (psgwd, ParaStation Gateway Daemon) was implemented that takes care of moving packets across fabrics.

Loading MPI
~~~~~~~~~~~

**JURECA Cluster**

To communicate with the psgwd, load the software module **pscom-gateway**, either via ``xenv`` or the ``module`` command.

**JURECA Booster, Current MPI Workaround (April/May/... 2019)**

For the time being, JURECA **Booster** binaries have to be prefixed with ``msa_fix_ld``, because the installed libmpi version does not support the psgwd. We hope this will go away soon.

``msa_fix_ld`` modifies the environment, so it might influence the modules you load.

.. code-block:: none

   #!/bin/bash
   export PSP_PSM=1
   export LD_LIBRARY_PATH="/usr/local/jsc/msa_parastation_mpi/lib:/usr/local/jsc/msa_parastation_mpi/lib/mpi-hpl-gcc/:${LD_LIBRARY_PATH}"
   $*

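
The wrapper shown above hands its arguments on through an unquoted ``$*``, which re-splits any argument that contains whitespace. The following is a minimal sketch of a quoting-safe variant, not the installed script: the function name ``msa_fix_ld_sketch`` is hypothetical, and the environment settings are copied unchanged from the listing above.

```shell
# Hypothetical, illustrative variant of msa_fix_ld written as a shell function.
# It is NOT the script installed on the system.
msa_fix_ld_sketch() {
    (
        # Run in a subshell so the modified environment does not leak
        # into the calling shell.
        export PSP_PSM=1
        export LD_LIBRARY_PATH="/usr/local/jsc/msa_parastation_mpi/lib:/usr/local/jsc/msa_parastation_mpi/lib/mpi-hpl-gcc/:${LD_LIBRARY_PATH}"
        # "$@" keeps every argument as a separate word, including
        # arguments that contain spaces.
        "$@"
    )
}
```

Such a function would be used like the wrapper itself, e.g. ``msa_fix_ld_sketch ./prog2 "input file.dat"``, where the program and file names are placeholders.
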
Requesting Gateways
~~~~~~~~~~~~~~~~~~~
To request gateway nodes for a job, the mandatory option ``gw_num`` has to be specified at submit/allocation time.

- There are 198 Gateways available in total.
- The Gateways are exclusive resources, they are not shared across user jobs. This may change in the future.
- There is currently no enforced maximum on the number of Gateways per job, besides the total number of Gateways. This may change in the future.

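
Since the only limit stated above is the total number of Gateways, a submit script could sanity-check the requested value before calling ``srun`` or ``sbatch``. The following is a minimal sketch under assumptions: the names ``TOTAL_GATEWAYS`` and ``check_gw_num`` are hypothetical, the lower bound of 1 is an assumption, and the value 198 is taken from the list above and may change.

```shell
# Hypothetical helper, not part of Slurm or ParaStation.
# 198 is the total number of Gateways stated in this documentation.
TOTAL_GATEWAYS=198

# Succeeds if $1 is a positive integer not larger than TOTAL_GATEWAYS.
check_gw_num() {
    n=$1
    # Reject empty strings and anything containing a non-digit.
    case "$n" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Assumed lower bound of 1; upper bound is the total number of Gateways.
    [ "$n" -ge 1 ] && [ "$n" -le "$TOTAL_GATEWAYS" ]
}
```

A call such as ``check_gw_num 2 || echo "invalid gw_num"`` could then guard the submission.
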
Submitting Jobs
~~~~~~~~~~~~~~~

To start an interactive pack job using two gateway nodes, the following command must be used:

.. code-block:: none

   srun -A <budget account> -p <batch, ...> --gw_num=2 xenv [-L ...] -L pscom-gateway ./prog1 : -p <booster, ...> xenv [-L ...] msa_fix_ld ./prog2

When submitting a job that will run later, you have to specify the number of gateways at submit time:

@@ -99,20 +124,8 @@ When submitting a job that will run later, you have to specify the number of gat

   #SBATCH packjob
   #SBATCH -p <booster, ...>
   srun xenv [-L ...] -L pscom-gateway ./prog1 : xenv [-L ...] msa_fix_ld ./prog2

PSGWD
~~~~~