... | ... | @@ -6,36 +6,38 @@ |
|
|
|
|
|
Fahad Khalid (@khalid1): Created the [Getting started with ML/DL on Supercomputers](https://gitlab.version.fz-juelich.de/hpc4ns/dl_on_supercomputers#getting-started-with-deep-learning-on-supercomputers) tutorial, which has been tested on JUWELS, JURECA, and JURON.
|
|
|
|
|
|
The workshop ["Intro to Scalable Deep Learning"](https://gitlab.version.fz-juelich.de/MLDL_FZJ/juhaicu/jsc_public/sharedspace/teaching/intro_scalable_dl_2021/course-material), created by Mehdi Cherti, Jan Ebert, Alex Strube, Roshni Kamath, Stefan Kesselheim, and Jenia Jitsev, features a number of lectures and tutorials on distributed deep learning on supercomputers, including Horovod usage.
|
|
|
|
|
|
# Workflows
|
|
|
|
|
|
## Jupyter Notebooks for HPC
|
|
|
|
|
|
tbd: Jens Henrik Goebbert @goebbert1
|
|
|
|
|
|
You can register with either a JSC user account (https://judoor.fz-juelich.de/register) or an account from any university or research center (https://hdf-cloud.fz-juelich.de/).
|
|
|
|
|
|
|
|
|
The JSC JupyterLab is available at http://jupyter-jsc.fz-juelich.de
|
|
|
|
|
|
|
|
|
Before you start, decide what you want to run and where. ML and DL workloads should run on the GPU nodes of the supercomputers, so select the "gpus" or "develgpus" partition when choosing the parameters for a run. If you are not using a parallel environment (such as Horovod), request only one node of the "develgpus" partition: you will get an allocation faster, and the remaining nodes stay available to other users.
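The partition advice above can be written down as a small decision helper. This is only a sketch: `JobRequest` and `suggest_request` are hypothetical names invented for illustration, not part of any JSC tool. Sending distributed runs to "gpus" is an assumption, since the text only says to pick "gpus" or "develgpus".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical container for the two settings discussed above."""
    partition: str
    nodes: int


def suggest_request(distributed: bool, nodes: int = 1) -> JobRequest:
    """Suggest a partition and node count for an ML/DL run.

    distributed: True if the run uses a parallel environment such as Horovod.
    nodes: number of nodes wanted for a distributed run.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not distributed:
        # Rule from the text: without a parallel environment,
        # request a single node on the develgpus partition.
        return JobRequest(partition="develgpus", nodes=1)
    # Assumption: distributed runs go to the "gpus" partition.
    return JobRequest(partition="gpus", nodes=nodes)


print(suggest_request(distributed=False))
print(suggest_request(distributed=True, nodes=4))
```

A non-distributed request always comes back as one node, even if more were asked for. This mirrors the recommendation to leave the remaining nodes to other users.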
|
|
|
|
|
|
|
|
|
Always choose the maximum number of GPUs allowed.
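Once a session is running, it can be useful to confirm how many GPUs the notebook actually sees. The sketch below reads the standard CUDA environment variable `CUDA_VISIBLE_DEVICES`. Whether the JupyterLab session sets this variable is an assumption, and `visible_gpu_ids` is a hypothetical helper name.

```python
import os
from typing import List, Mapping, Optional


def visible_gpu_ids(env: Optional[Mapping[str, str]] = None) -> Optional[List[str]]:
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (visibility unknown),
    and an empty list when it is set but empty (no GPUs visible).
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # The variable holds a comma-separated list of device identifiers.
    return [item.strip() for item in raw.split(",") if item.strip()]


gpus = visible_gpu_ids()
if gpus is None:
    print("CUDA_VISIBLE_DEVICES is not set; GPU visibility is unknown.")
else:
    print(f"{len(gpus)} GPU(s) visible: {gpus}")
```

If the count is lower than the number of GPUs you selected, the allocation was probably made with different parameters than intended.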
|
|
|
|
|
|
|
|
|
Because this is a shared environment, it may take a while until you get a reservation. A new option therefore lets you receive an email when your reservation is granted.
|
|
|
|
|
|
|
|
|
## Neuro4HPC
|
|
|
|
|
|
[Deep Learning in neuroscience using HPC systems (work in progress)](Neuro4HPC)
|
|
|
|
|
|
## Setup of common tools for HPC
|
|
|
|
|
|
### TensorFlow and Horovod for ImageNet
|
|
|
|
|
|
tbd: Jenia Jitsev @jitsev1
|
|
|
|
|
|
### JURECA and JUWELS
|
|
|
|
... | ... | @@ -62,8 +64,8 @@ Fahad Khalid (@khalid1): The following Deep Learning related modules are availab |
|
|
```
|
|
|
|
|
|
All thanks to Andreas Herten (@herten1) for installing these modules and the many dependencies.
|
|
|
|
|
|
### PyTorch & HEAT
|
|
|
|
|
|
tbd: Bjoern Hagemeier @hagemeier2
|
|
|
|
... | ... | @@ -72,4 +74,4 @@ tbd: Bjoern Hagemeier @hagemeier2 |
|
|
tbd: @all
|
|
|
|
|
|
---
|
|
|
[[Home](Home)]