MLAir issues
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues
All issues below were reported by Ghost User.

Issue #35 (2020-01-30): fasten histogram calculation in plot_conditional_quantiles
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/35
The local function `segment_data()` in `plot_conditional_quantiles()` is quite slow. The conversion from xarray to pandas is fast, but the subsequent bin calculation `.apply(pd.cut, bins=bins, labels=bins[1:]).T.values` is very slow. Think about a better design. After more than two hours of searching I could not find any suitable built-in function in xarray: all the group functions work on the coordinate level and cannot be applied to the data itself.

Issue #75 (2020-03-10): implement tests for ols model
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/75
Currently there are no tests for the ordinary least squares linear model. Implement them.

Issue #83 (2022-06-07): parallel and scalable bootstrap prediction
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/83
Currently, bootstrapping takes some time due to its sequential implementation. Recent developments (see #164) show a huge gain in computation time when using parallel computing. Therefore, introduce parallel bootstrap calculations in the same manner as applied in the preprocessing.

Issue #84 (2020-03-20): parameter collections
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/84
Implement different combinations of frequently used `experiment_setup` input parameters, e.g. `default_plot_settings` or similar.

Issue #89 (2020-09-21): Update tests for check_valid_stations
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/89
One test is missing to achieve 100% coverage.

Issue #92 (2022-06-07): log print statements from keras
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/92
Catch all print statements from keras (e.g. during training) and log them. Check to prevent duplicated entries in the logging.

Issue #96 (2020-04-02): pickle vs. hickle
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/96
Try to replace all pickle commands with hickle commands. Check whether this works for customised objects and whether it is faster. Related to #97.

Issue #97 (2020-06-25): tf records
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/97
Think about storing the training data as TFRecords. Would that speed up the training stage? Related to #96.
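One way to prepare the swap proposed in #96 is to route all (de)serialisation through a single helper, so the backend can be switched in one place. A minimal sketch under the assumption that hickle may not be installed (hickle's `dump`/`load` take an object and a file path, mirroring a file-based pickle API; the helper names here are hypothetical):

```python
import os
import pickle
import tempfile

try:
    import hickle  # optional HDF5-based serialiser with a pickle-like API
except ImportError:
    hickle = None

def dump(obj, path, use_hickle=False):
    """Write obj to path, via hickle if requested and available."""
    if use_hickle and hickle is not None:
        hickle.dump(obj, path)
    else:
        with open(path, "wb") as f:
            pickle.dump(obj, f)

def load(path, use_hickle=False):
    """Read an object back from path with the matching backend."""
    if use_hickle and hickle is not None:
        return hickle.load(path)
    with open(path, "rb") as f:
        return pickle.load(f)

# round trip with the pickle backend
tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "data_store.pkl")
dump({"o3": [1.0, 2.0]}, path)
restored = load(path)
```

Centralising the call sites first would also make the timing comparison asked for in #96 a one-flag experiment.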
Issue #98 (2020-04-01): record data prep steps
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/98
Create a method that tracks all settings that are applied to the data, something like:
1) download o3 in the range year start to year end from join
1) transform
1) interpolate with **settings**
1) create hist, label, ..
1) do upsampling
1) remove nans
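The bookkeeping sketched in the step list above could start as a simple append-only log attached to the data object. A minimal sketch (class and method names are hypothetical, not part of mlair):

```python
import datetime
import json

class PrepTracker:
    """Append-only record of data preparation steps (sketch for this issue)."""

    def __init__(self):
        self.steps = []

    def record(self, step, **settings):
        # store the step name, its settings, and when it was applied
        self.steps.append({
            "step": step,
            "settings": settings,
            "time": datetime.datetime.now().isoformat(timespec="seconds"),
        })

    def to_json(self):
        # human-readable dump: whoever wants to know about the data reads this
        return json.dumps(self.steps, indent=2)

tracker = PrepTracker()
tracker.record("download", variable="o3", source="join")
tracker.record("interpolate", method="linear", limit=1)
```

Since each entry records its predecessor implicitly via order, exporting this log as a directed graph later would be straightforward.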
If one wants to know what happened to the data, just read this file. Could this feature perhaps be implemented as a graph?

Issue #101 (2020-04-03): Customise order of tails
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/101
Right now the main tail has to be the last tail listed in the loss etc.
Instead, we should use a flag `tail_format='main_last'`, similar to how keras uses `data_format` to select channels first/last for conv layers.
The default in mlt should be 'main_last'.

Issue #102 (2022-08-31): save progress of experiment steps
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/102
This applies only to `RunEnvironment` and its inheritances.
* save datastore as pickle (on `__del__`): naming like `checkpoint_<class_name>.pkl`
* query whether the class was already executed (on `__init__`)
* skip if already executed (go directly from `__init__` to `__del__`)
* force option to rerun (regardless of whether a checkpoint is available)
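The mechanism described in the bullets above could look roughly like this. A minimal sketch, assuming free functions instead of `RunEnvironment` hooks; `PreProcessing` here is a hypothetical stand-in for a run stage:

```python
import os
import pickle
import tempfile

def checkpoint_path(cls, directory):
    # naming scheme from the issue: checkpoint_<class_name>.pkl
    return os.path.join(directory, f"checkpoint_{cls.__name__}.pkl")

def save_checkpoint(cls, datastore, directory):
    """Persist the datastore when a run stage finishes (cf. __del__)."""
    with open(checkpoint_path(cls, directory), "wb") as f:
        pickle.dump(datastore, f)

def load_checkpoint(cls, directory, force=False):
    """Return the stored datastore, or None if absent or force-rerun."""
    path = checkpoint_path(cls, directory)
    if force or not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)

class PreProcessing:  # hypothetical stand-in for a RunEnvironment subclass
    pass

tmp = tempfile.mkdtemp()
save_checkpoint(PreProcessing, {"stations": ["DEBW107"]}, tmp)
restored = load_checkpoint(PreProcessing, tmp)            # found: skip stage
forced = load_checkpoint(PreProcessing, tmp, force=True)  # force rerun
```

In the real class the `__init__` would call `load_checkpoint` and short-circuit to `__del__` when it returns a datastore.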
*Background*: This issue is required when mlt runs on HPC systems with different partitions. E.g. experiment setup and preprocessing shall run on CPU nodes (or on login nodes, because an internet connection is required), but the training step should be performed on the GPU partition. The post-processing (not yet evaluated whether a GPU is required for the bootstrap prediction and whether it is actually faster there) can afterwards be performed on CPU again.
* [ ] implement checkpoint saving on local disk
* [ ] implement loading of checkpoints
* [ ] implement skipping of execution if checkpoint was loaded
* [ ] implement `RunEnvironment` behaviour (when not called as inheritance): add force button, clean-up button
* [ ] check speed of postprocessing depending on partition (not really related, but interesting for the final setup: whether postprocessing runs on CPU or GPU)

Label: HPC

Issue #104 (2020-04-03): Run single experiment steps on different partitions
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/104
Depends on #102, #105.
* [ ] wait for #102
* [ ] wait for #105
* [ ] create a more customised shell script to execute different steps on different partitions (not in parallel!): pre on CPU, train on GPU, ...
Label: HPC

Issue #105 (2020-04-03): Force custom program termination
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/105
On the run call it is possible to specify which steps should be executed. Because every step depends on its previous steps, it should not be possible to call training only. On the other hand, it should be possible to stop after preprocessing or after any other step. This is required to run different parts of an experiment on different partitions (see #104). If a termination step is given, only all precursory steps and the step itself are executed. Progress is saved locally (in any case, because of #102).

Label: HPC

Issue #117 (2020-04-30): set real path
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/117
When using `ln -s`, some links might not be found. Therefore, integrate `os.path.realpath()`.
See
https://unix.stackexchange.com/questions/196656/when-i-cd-through-a-symlink-why-does-pwd-show-the-symlink-instead-of-the-real-p/196753
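A small self-contained illustration of what `os.path.realpath()` does with a symlink (a throwaway link is built in a temp directory; the directory names are arbitrary):

```python
import os
import tempfile

base = tempfile.mkdtemp()
real_dir = os.path.join(base, "experiment_data")
os.mkdir(real_dir)
link = os.path.join(base, "data_link")
os.symlink(real_dir, link)  # like `ln -s experiment_data data_link`

# realpath resolves the symlink to the underlying directory,
# so paths derived from it stay valid even if the link changes
assert os.path.realpath(link) == os.path.realpath(real_dir)
```

Note that both sides are passed through `realpath` in the comparison, since the temp directory itself may contain symlinked components on some systems.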
Issue #120 (2020-09-21): test_build_model not generic enough?
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/120
`test_build_model()` uses the imported model `MyModel`. Wouldn't it be better to use a dummy test model for the setup and instead create custom tests for all classes in `model_class.py`?
Issue #128 (2020-08-24): Padding2D: allow to add other user defined paddings
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/128
We should add a method which allows adding new padding types to `allowed_paddings`.
Maybe something like
```python
def add_custom_padding(self, names, padding_layer):
    self.allowed_paddings.update(dict.fromkeys(names, padding_layer))
```
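A self-contained sketch of how such a registration could behave; `Padding2D` below is a reduced stand-in for the real class, with only the registry attribute, and the entries are placeholders:

```python
class Padding2D:
    """Reduced stand-in for mlair's Padding2D: only the padding registry."""

    def __init__(self):
        self.allowed_paddings = {"zero": "ZeroPadding2D"}  # placeholder entry

    def add_custom_padding(self, names, padding_layer):
        # register one padding layer under several alias names at once
        self.allowed_paddings.update(dict.fromkeys(names, padding_layer))

pad = Padding2D()
pad.add_custom_padding(["sym", "symmetric"], "SymmetricPadding2D")
```

`dict.fromkeys(names, padding_layer)` maps every alias to the same layer, which is why a single call can register several spellings.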
Issue #132 (2020-12-10): Tracking of units
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/132
We should track the units to automatically create correct axis labels on plots.

Issue #148 (2020-07-15): Implement ROC Curves
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/148
It might be of interest to generate ROC curves for threshold exceedance predictions.
See Wilks (2006, Ch. 7.4.6) for detailed information.
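For orientation, the points of a ROC curve for binary exceedance forecasts can be computed without any plotting library: hit rate (POD) against false alarm rate (POFD) per decision threshold, in the spirit of Wilks. A minimal sketch with toy data (all names hypothetical):

```python
def roc_points(obs, scores, thresholds):
    """Hit rate (POD) and false alarm rate (POFD) per threshold."""
    points = []
    for t in thresholds:
        hits = sum(1 for o, s in zip(obs, scores) if o == 1 and s >= t)
        misses = sum(1 for o, s in zip(obs, scores) if o == 1 and s < t)
        false_alarms = sum(1 for o, s in zip(obs, scores) if o == 0 and s >= t)
        correct_neg = sum(1 for o, s in zip(obs, scores) if o == 0 and s < t)
        pod = hits / (hits + misses) if hits + misses else 0.0
        alarms = false_alarms + correct_neg
        pofd = false_alarms / alarms if alarms else 0.0
        points.append((pofd, pod))
    return points

# toy exceedance observations (0/1) and forecast scores
obs = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]
pts = roc_points(obs, scores, [0.0, 0.5, 1.0])
```

Plotting `pts` as (x, y) pairs and adding the diagonal no-skill line then gives the ROC curve for the chosen thresholds.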
Issue #150 (2020-07-21): seed for tf
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/150
Is it possible to set the seed?

Issue #155 (2020-12-10): BUG: empty monthly plot using hourly data
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/155
The monthly summary plot seems to be empty when using data with hourly temporal resolution. Investigate the origin of this behaviour.

Label: Hourly data resolution