MLAir issues
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/89
Update tests for check_valid_stations (2020-09-21)
One test is missing to achieve 100% coverage.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/433
Contingency Analysis (2023-06-27)
Implement some kind of contingency analysis:
* [ ] add hit rate and the other metrics
* [ ] for dma8eu ozone: calculate metrics for legal limits
* [ ] calculate percent hits of the top 2, 5, 10%: Does a model/competitor exceed its own X% threshold if this is the case for the observation?

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/429
use model display name for latex reports (2023-06-19)
Use the model display name for LaTeX reports instead of the model indicator (nn).

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/446
improve prediction speed (2023-06-19)
Improve the prediction speed of the DL model by passing an explicit batch size in the `.predict` call.
https://www.tensorflow.org/api_docs/python/tf/keras/Model#predict
```python
    def make_prediction(self, subset):
        """
        Create predictions for NN, OLS, and persistence and add true observation as reference.

        Predictions are filled in an array with full index range. Therefore, predictions can have missing values. All
        predictions for a single station are stored locally under `<forecast/forecast_norm>_<station>_test.nc` and can
        be found inside `forecast_path`.
        """
        subset_type = subset.name
        logging.info(f"start make_prediction for {subset_type}")
        time_dimension = self.data_store.get("time_dim")
        window_dim = self.data_store.get("window_dim")
        for i, data in enumerate(subset):
            input_data = data.get_X()
            target_data = data.get_Y(as_numpy=False)
            observation_data = data.get_observation()
            # get scaling parameters
            transformation_func = data.apply_transformation
            nn_output = self.model.predict(input_data)
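            # Proposed change from this issue (a sketch): pass an explicit batch
            # size to `.predict` to speed up inference, e.g.
            # nn_output = self.model.predict(input_data, batch_size=512)
            # (512 is an illustrative value, not taken from the source.)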
```

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/97
tf records (2020-06-25)
Think about storing the training data as TFRecords. Will it speed up the training stage? Related to #96.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/327
Histograms in not normalized space (2022-08-31)
# Issue
For some applications, it would be great to have a look at the true data distribution, not only at the transformed data.
# ToDos
* [ ] add plot that creates and draws histograms on raw data

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/83
parallel and scalable bootstrap prediction (2022-06-07)
Currently, bootstrapping takes some time due to its sequential implementation. Recent developments (see #164) show a huge gain in computation time when using parallel computing. Therefore, introduce parallel bootstrap calculations in the same manner as applied in the preprocessing.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/102
save progress of experiment steps (2022-08-31)
This applies only to `RunEnvironment` and its inheritances.
* save datastore as pickle (on `__del__`): naming like `checkpoint_<class_name>.pkl`
* query if class is already executed (on `__init__`)
* skip, if already executed (directly go from `__init__` to `__del__`)
* force button for rerun (independently if checkpoint is available or not)
*Background*: This issue is required if mlt is running on HPC systems across different partitions. E.g. experiment setup and preprocessing shall run on CPU nodes (and on login nodes because of the required internet connection), but the training step should be performed on the GPU partition. The post-processing (not yet evaluated whether a GPU is required for bootstrap prediction and whether it is actually faster) can be performed afterwards on CPU again.
* [ ] implement checkpoint saving on local disk
* [ ] implement loading of checkpoints
* [ ] implement skipping of execution if checkpoint was loaded
* [ ] implement `RunEnvironment` behaviour (when not called as inheritance): add force button, clean-up button
* [ ] check speed of postprocessing depending on partition (not really related, but interesting for the final setup: is postprocessing run on CPU or GPU?)

Label: HPC

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/230
Boost decoupling of MLAir and data handlers (2022-08-31)
Some parameters that are related to our data handlers could be removed from MLAir's workflow and run modules to serve a more general workflow. This refers, for example, to the parameters `window_history_length` and `window_lead_time`, which may have no influence for custom data handlers. These parameters are only used to build the data handler, but this could also be done by default values inside the data handler. If these parameters should be adjusted, it would still be possible to add the key-value pair in the workflow's init call. All kwargs are also stored in the data store and are therefore available to the data handler.
Create a list of parameters that are **candidates for removal**:
* [ ] `window_history_length`
* [ ] `window_lead_time`
* [ ] `interpolation_method`
* [ ] `interpolation_limit`
* [ ] `data_origin`
* [ ] `variables`
* [ ] `statistics_per_var`
* [ ] `extreme_values`
* [ ] `extremes_on_right_tail_only`
* [ ] `neighbors`
* [ ] `overwrite_local_data` (<- discussion on this parameter)
* [ ] `sampling` (<- discussion on this parameter, used in postprocessing - for what?)
* [ ] `store_data_locally`
* [ ] `store_processed_data` (<- discussion on this parameter, actually I forgot the meaning of this)
* [x] `target_dim` (<- discussion on this parameter, this could be required for all postprocessing routines, but is currently not used in fact! - is there still too much hardcoded? - solved in #272)
* [ ] `target_var` (<- discussion on this parameter)
It is not easy to remove parameters that change for subsets, like `start` and `min_length`. **How to deal with this?**
Next, indicate which parameters are going to be removed!
**Further tasks**
* [x] decouple make prediction in postprocessing (solved in #272)

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/166
error on failure (2022-08-31)
If MLAir raises an error, it currently cannot find the logging file and raises another error. Correct this!

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/352
include clim forecasts into error estimate and skill score (2022-08-31)
# include clim forecasts
Do not only investigate the climatological skill scores, but also add these forecasts to all other comparisons (like the persistence).
## tasks
* [ ] think about which CASEs to use first (or just make all 4 cases possible?)
* [ ] calculate clim forecasts
* [ ] add as competitor
* [ ] be able to include a clim forecast via the competitor call (e.g. as "CASE1")
* [ ] check if the clim forecast is included in all relevant metrics/tables/plots

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/163
(Too) big data (2022-08-31)
Data stored locally can grow up to 1.5 GB per file (station).
* [ ] check if storing the total data is required or not (it could be that these data are not used - only the subsets)
* [ ] think about another storing strategy to replace `.pickle`, e.g. by xarray's dataset storage

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/35
fasten histogram calculation in plot_conditional_quantiles (2020-01-30)
The local function `segment_data()` from `plot_conditional_quantiles()` is quite slow. The step to transform from xarray to pandas is fast, but the bin calculation `.apply(pd.cut, bins=bins, labels=bins[1:]).T.values` is then very slow. Think about a better design. After more than 2 hours of searching, I couldn't find any built-in function in xarray; all the group functions work on the coordinate level and cannot be applied to the data itself.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/392
REFAC: Rename lr to learning rate (2022-08-31)
Rename `lr` to `learning_rate` as `lr` is deprecated.
```
/usr/lib64/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:375: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
"The `lr` argument is deprecated, use `learning_rate` instead.")
```

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/92
log print statements from keras (2022-06-07)
Catch all print statements from keras (e.g. during training) and log them. Check to prevent duplications in logging.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/386
ClimateFIR customize agg for apriori (2023-06-19)
# Customize Apriori
Current status: when applying ClimateFIR, the apriori is always calculated from means (e.g. the monthly mean).
IDEA: Especially for chemicals, the span of values can spread a lot, so one could consider using the more robust median instead of the average. As can be seen in the example, the current estimate is not that good.
![example_mean_median](/uploads/ac85c390eb1bee2de5e808be5cd95759/example_mean_median.png)
In the next example, one can see that in some cases the median is much lower than the mean and very close to the lower percentile. This indicates that a few "bigger" values increase the mean, while in most cases the concentration is low.
JANUARY
![example_percentile](/uploads/087a3ce129ba323863133fb63cf5d0ee/example_percentile.png)
In summer, this issue is less pronounced:
AUGUST
![example_percentile_aug](/uploads/88b02cd4c5eb6f68a8890e1e8eef49cc/example_percentile_aug.png)
## TODOs
* [ ] be able to use a given statistic (`mean`, `median`, maybe think about quantiles)

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/265
REFAC: remove defaults from default data handler (2022-06-07)
Currently, the `DataHandlerSingleStation` sets a lot of defaults. This leads to the problem that some settings are set by the data handler and not in an experiment setup (e.g. `station_type` is not given in the experiment setup, but is set by the data handler). It should be clearer what defaults are used. Therefore, I suggest removing some defaults and using `None` instead.
* [ ] investigate which parameters shouldn't be filled with a default value by the data handler
* [ ] create a list with these parameters
* [ ] apply refactoring

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/75
implement tests for ols model (2020-03-10)
Currently, there are no tests for the ordinary least squares linear model. Implement them.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/84
parameter collections (2020-03-20)
Implement different combinations of experiment_setup input parameters that are frequently used, like default_plot_settings or so.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/96
pickle vs. hickle (2020-04-02)
Try to replace all pickle commands by hickle commands. Check whether this works for customised objects and whether it is faster. Related to #97.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/98
record data prep steps (2020-04-01)
Create a method that tracks all settings that are used for data, something like:
1) download o3 in range year start - end from join
1) transform
1) interpolate with **settings**
1) create hist, label, ..
1) do upsampling
1) remove nans
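The steps above could be recorded as a simple ordered log. A minimal sketch follows; all names (`PrepTracker`, `record`, the example settings) are hypothetical, not MLAir API:

```python
import json


class PrepTracker:
    """Hypothetical recorder that tracks every data preparation step."""

    def __init__(self):
        self.steps = []

    def record(self, step, **settings):
        # append each step together with the settings it was run with
        self.steps.append({"step": step, "settings": settings})

    def save(self, path):
        # persist the history; to learn about the data, just read this file
        with open(path, "w") as f:
            json.dump(self.steps, f, indent=2)


tracker = PrepTracker()
tracker.record("download", variable="o3", source="join")
tracker.record("interpolate", method="linear", limit=1)
tracker.record("remove_nans")
```

The resulting JSON file is the "just read this file" artifact; a graph representation could later be built from the same ordered records.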
If one wants to know about the data, just read this file. Could this feature somehow be implemented as a graph?

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/104
Run single experiment steps on different partitions (2020-04-03)
Depends on #102, #105
* [ ] wait for #102
* [ ] wait for #105
* [ ] create a more customised shell script to execute different steps on different partitions (not in parallel!): preprocessing on CPU, training on GPU, ...

Label: HPC

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/105
Force custom program termination (2020-04-03)
On the run call, it is possible to specify which steps should be executed. Because every step depends on its previous steps, it shouldn't be possible to call training only. On the other hand, it should be possible to stop after preprocessing or any other step. This is required to run different parts of an experiment on different partitions (see #104). If a termination step is given, only its precursory steps and the step itself are executed. Progress is saved locally (anyway, because of #102).

Label: HPC

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/117
set real path (2020-04-30)
When using `ln -s`, some links might not be found. Therefore, integrate `os.path.realpath()`.
See
https://unix.stackexchange.com/questions/196656/when-i-cd-through-a-symlink-why-does-pwd-show-the-symlink-instead-of-the-real-p/196753
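A small demonstration of what `os.path.realpath()` does (a POSIX-only sketch; the temporary directory layout is made up for the example):

```python
import os
import tempfile

base = tempfile.mkdtemp()
real_dir = os.path.join(base, "real")
os.mkdir(real_dir)
link = os.path.join(base, "link")
os.symlink(real_dir, link)

# os.path.realpath resolves the symlink, so later path operations
# use the true location instead of the link
resolved = os.path.realpath(link)
```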
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/266
Restructure IntelliO3 reference download (2021-02-04)
In #131 we made IntelliO3-ts v1.0 available as a reference forecast (see `mlair/reference_data_handler/`). Currently, we use `os.system(tar ...)` to extract forecasts from the tar.gz file published on https://b2share.eudat.eu/records/5042cda41a4c49769cc4010d231ecdec
We should replace the `os.system` calls with the tarfile package.

Label: Competitors

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/132
Tracking of units (2020-12-10)
We should track the units to automatically create correct labels on plots.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/128
Padding2D: allow to add other user defined paddings (2020-08-24)
We should add a method which allows adding new padding types to `allowed_paddings`.
Maybe something like
```python
def add_custom_padding(self, names, padding_layer):
self.allowed_paddings.update(**dict.fromkeys(names, padding_layer))
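# hypothetical usage (both names below are assumptions, not existing MLAir API):
# padding.add_custom_padding(["my-pad"], MyCustomPaddingLayer)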
```

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/148
Implement ROC Curves (2020-07-15)
It might be of interest to generate ROC curves for threshold exceedance predictions.
See Wilks (2006, Ch. 7.4.6) for detailed info.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/150
seed for tf (2020-07-21)
Is it possible to set the seed?

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/158
Climatological Skill Scores (hourly resolution) (2020-12-10)
Is the concept of climatological skill scores applicable to hourly resolved data? Is it required to subdivide the mean state by daytime in addition to the monthly separation?

Label: Hourly data resolution

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/155
BUG: empty monthly plot using hourly data (2020-12-10)
The monthly summary plot seems to be empty when using data with hourly temporal resolution. Investigate the origin of this behaviour.

Label: Hourly data resolution

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/156
Persistence forecast (hourly resolution) (2020-12-10)
Create a new heuristic for a persistence forecast when investigating hourly resolved data.

Label: Hourly data resolution

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/161
Custom data split ratio (2020-09-21)
Currently, the data split ratio between training and validation data is hardcoded to 80-20.
Add the possibility to change this.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/167
TF light (2020-09-16)
It might be useful to convert the network to a "production" state.
See https://www.tensorflow.org/lite/convert

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/179
REFAC: unify default experiment name (2020-12-10)
Currently, there are two ways to name an experiment if it is not provided.
* If using the default workflow, the name will be set to `testrun`.
* Using the setup run module standalone or in a custom workflow, the name will become `TestExperiment`.
Choose one of the names above and apply it in both cases, or think about a new default experiment name.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/182
REFAC: batch size minimum (2020-09-29)
The keras iterator creates a distinct file for each batch. If the batch size becomes very small, this results in the creation of many files containing only a very small amount of data (or only a single data point if the batch size is 1). Therefore, create a threshold to combine batches in one file if the batch size is below it.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/186
log scale for feature importance (2020-10-02)
If a network highly depends on a single variable, the influence of the remaining variables is hard to determine graphically because of the linear scale.
Set log scale for axis: https://seaborn.pydata.org/examples/horizontal_boxplot.html
Use a symmetric log scale because skill scores can be both positive and negative: https://matplotlib.org/3.1.1/api/scale_api.html#matplotlib.scale.SymmetricalLogScale

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/181
Model visualisation (2020-09-29)
We might integrate better visualisations of the model.
Here is a repo which summarises different tools (we have to evaluate which ones can read Keras/TF model structures).
https://github.com/ashishpatel26/Tools-to-Design-or-Visualize-Architecture-of-Neural-Network

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/178
REFAC: experiment date vs name (2021-07-23)
Refactor the behaviour of experiment date and experiment name. It is not clear why it is only possible to set the experiment date from outside, but not the experiment name.
* [ ] Remove the experiment date parameter!
Furthermore, adjust the naming with the appended sampling rate. Since the sampling rate is allowed to be a tuple, the experiment name can become quite ugly!
* [ ] Either check if the sampling rate is a tuple (and use the 2nd entry in this case) or remove the code that adds the sampling to the experiment name.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/217
split init and run of all run_modules (2021-07-23)
Split init and run of all run_modules. Also add the init and the run call in the workflow run method.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/120
test_build_model not generic enough? (2020-09-21)
`test_build_model()` uses the imported model MyModel. Wouldn't it be better to use a dummy test model for the setup and instead create custom tests for all classes in model_class.py?

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/101
Customise order of tails (2020-04-03)
Right now, the main tail has to be the last tail listed in loss etc.
Instead, we should use a flag `tail_format='main_last'`, like keras is using `data_format` for channels first/ last for conv layers.
The default in mlt should be 'main_last'.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/205
stationwise batch-normalisation not supported anymore (2020-11-04)
Due to the developments of #202, the stationwise normalisation (each station's train data has mean=0, std=1), formerly called `scope="station"` (in contrast to `scope="data"`), is not supported on the preprocessing side (but still on the postprocessing side, as far as I can say). If this "feature" is of interest again, the code, namely the preprocessing and data handler parts, must be adjusted.
First idea: it could be sufficient to skip the general transformation step. But how to deal with the remaining subsets? Is there an attribute that is stored for each station separately (something like an additional subscope in the data store, e.g. `general.train.DEBW107`)? This could be much easier if #204 is solved such that the remaining subsets are some kind of copy. Then the parameters wouldn't be reloaded, and the transformation already estimated on the train subset could be used.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/208
REFAC: chunks replacing loop (2020-11-06)
There was a memory error when calculating the rolling means in the kz filter data handler. The quick fix was to loop over all variables, extract a subset, calculate the rolling mean, and finally merge all variables. Try to go back to a dask approach. Maybe it works if the chunk size is 1 for the variables dimension.

Label: Temporal Decomposed Input Data

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/209
create test methods to verify a custom class implementation (2020-11-11)
Create a test suite that can be used for default testing during CI as well as by a user of MLAir.
By using this test suite, a custom class, e.g. a new data handler or model module, can be tested to verify that it fulfils all requirements.
Tests to implement:
* [ ] data handler
* [ ] get_X
* [ ] get_Y
* [ ] build
* [ ] requirements
* [ ] model class
* [ ] model
* [ ] loss
* [ ] run module
* [ ] init (maybe split init and _run, and always call init and run of a stage -> refac)
* [ ] workflow
* [ ] can run?

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/214
check bootstrap behaviour on separation of scales (2023-06-19)
Check bootstrap behaviour on separation of scales.
Does it work properly? The plot looks promising, but is there a valid shuffling for each variable? Is it more important to shuffle only a single filter dim to estimate the influence of variable and filter? Or should only the low-pass terms be shuffled (but for all variables at once)?

Label: Temporal Decomposed Input Data

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/218
hyperparameters for model class (2020-11-20)
Currently, all hyperparameters regarding the model class (e.g. initial learning rate) have to be implemented inside the model class. Change this so that a model can request hyperparameters. Try to use a similar scheme as used for the data handler (working with a requirements method that is called by MLAir).

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/219
bootstrapped skill scores on denormalised data (2020-11-24)
Currently, normalised forecasts are only stored for the purpose of being usable for the bootstrap analysis. But this could also be performed in the original value space if the retransformation is applied to the bootstrapped prediction. Maybe it is worth saving the computation time of normalised forecasts and applying the transformation to the bootstrap predictions instead.
Before deciding, first check whether the normalised forecast is used only by the bootstrap methods or also somewhere else.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/234New Map Plot using Folium2022-08-10T15:48:52+02:00Ghost UserNew Map Plot using FoliumThere is an interesting Python package called [Folium](https://python-visualization.github.io/folium/index.html) that can create different maps using leaflet.
![image](/uploads/e9fb2feeac76374e77d7e652693c5728/image.png)
![image](/uplo...There is an interesting python package called [Folium](https://python-visualization.github.io/folium/index.html) that can create different maps using leaflet.
![image](/uploads/e9fb2feeac76374e77d7e652693c5728/image.png)
![image](/uploads/aa9cf6429c205f202c83b3e53b263e32/image.png)
![image](/uploads/3dd107b6df62f0ea5b88f499289071d5/image.png)https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/246new tests for experiment setup2020-12-17T10:39:29+01:00Ghost Usernew tests for experiment setup*placeholder for now*
implement tests for run module `experiment_setup`*placeholder for now*
implement tests for run module `experiment_setup`Run Module Testinghttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/247new tests for model setup2020-12-17T10:39:50+01:00Ghost Usernew tests for model setup*placeholder for now*
implement tests for run module `model_setup`*placeholder for now*
implement tests for run module `model_setup`Run Module Testinghttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/248new tests for post processing2020-12-17T10:40:13+01:00Ghost Usernew tests for post processing*placeholder for now*
implement tests for run module `post_processing`*placeholder for now*
implement tests for run module `post_processing`Run Module Testinghttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/249new tests for preprocessing2020-12-17T10:40:31+01:00Ghost Usernew tests for preprocessing*placeholder for now*
implement tests for run module `pre_processing`*placeholder for now*
implement tests for run module `pre_processing`Run Module Testinghttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/250new tests for run environment2020-12-17T10:40:58+01:00Ghost Usernew tests for run environment*placeholder for now*
implement tests for run module `run_environment`*placeholder for now*
implement tests for run module `run_environment`Run Module Testinghttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/251new tests for training2020-12-17T10:41:15+01:00Ghost Usernew tests for training*placeholder for now*
implement tests for run module `training`*placeholder for now*
implement tests for run module `training`Run Module Testinghttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/252tests for abstract data handler2020-12-17T10:43:07+01:00Ghost Usertests for abstract data handler*placeholder for now*
implement tests for `data_handler/abstract_data_handler`*placeholder for now*
implement tests for `data_handler/abstract_data_handler`Data Handler Testinghttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/253tests for data handler kz filter2020-12-17T10:43:46+01:00Ghost Usertests for data handler kz filter*placeholder for now*
implement tests for `data_handler/data_handler_kz_filter`*placeholder for now*
implement tests for `data_handler/data_handler_kz_filter`Data Handler Testinghttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/254tests for data handler mixed sampling2020-12-17T10:44:12+01:00Ghost Usertests for data handler mixed sampling*placeholder for now*
implement tests for `data_handler/data_handler_mixed_sampling`*placeholder for now*
implement tests for `data_handler/data_handler_mixed_sampling`Data Handler Testinghttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/255test for data handler single station2020-12-17T10:44:35+01:00Ghost Usertest for data handler single station*placeholder for now*
implement tests for `data_handler/data_handler_single_station`*placeholder for now*
implement tests for `data_handler/data_handler_single_station`Data Handler Testinghttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/256tests for default data handler2020-12-17T10:45:04+01:00Ghost Usertests for default data handler*placeholder for now*
implement tests for `data_handler/default_data_handler`*placeholder for now*
implement tests for `data_handler/default_data_handler`Data Handler Testinghttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/267Create WRF-Chem data handler2021-03-15T11:45:19+01:00Ghost UserCreate WRF-Chem data handlerCreate a Default WRF-Chem data handler. The data handler should be able to
- [x] Read WRF-Chem data
- [x] Extract the column (height and time) for a given lat/lon coordinates
- [ ] Create labels (e.g. concentrations on surface cell at time $t_i...Create a Default WRF-Chem data handler. The data handler should be able to
- [x] Read WRF-Chem data
- [x] Extract the column (height and time) for a given lat/lon coordinates
- [ ] Create labels (e.g. concentrations on surface cell at time $t_i$, where $i > 0$)
- [ ] Create inputs (e.g. variables from column for times $t_{-J}$ to $t_{0}$, where $J$ is the total number of previous time steps).Include WRF-Chem data handler and run pipelinehttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/279TECH: check removed packages2021-02-19T08:06:48+01:00Ghost UserTECH: check removed packagesDuring #262 some Python packages have been removed. Check if MLAir still works without them. Otherwise add the required packages again.
* [ ] `Cython`
* [ ] `atomicwrites`
* [ ] `cloudpickle`
* [ ] `more-itertools`
* [ ] `pyproj`
* [ ] `wcw...During #262 some python packages have been removed. Check if MLAir still works without them. Otherwise add required packages again.
* [ ] `Cython`
* [ ] `atomicwrites`
* [ ] `cloudpickle`
* [ ] `more-itertools`
* [ ] `pyproj`
* [ ] `wcwidth`
* [ ] `setuptools` (should be installed automatically when creating venv)https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/282TECH: ensure doc copies2021-02-23T11:12:19+01:00Ghost UserTECH: ensure doc copiesIn case the docs step is failing, no docs are published to pages. Therefore try to have some error handling to ensure that all existing docs are still available even after a failed docs creation stage.In case the docs step is failing, no docs are published to pages. Therefore try to have some error handling to ensure that all existing docs are still available even after a failed docs creation stage.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/271add CDC database DataHandler2021-03-08T15:38:05+01:00Ghost Useradd CDC database DataHandlerThe goal is to add a database access script inside the DataHandler structure for the usage in the master thesis project of Falco.The goal is to add a database access script inside the DataHandler structure for the usage in the master thesis project of Falco.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/287WRF-Datahandler should inherit from SingleStationDatahandler2021-03-03T11:40:33+01:00Ghost UserWRF-Datahandler should inherit from SingleStationDatahandler The wrf datahandler should inherit from SingleStationDataHandler to ensure that transform methods etc are available. 
The wrf datahandler should inherit from SingleStationDataHandler to ensure that transform methods etc are available.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/291add links to old docs2021-07-23T18:19:04+02:00Ghost Useradd links to old docs* [ ] add links to all old versions either in the main page of docs or on the left as sections* [ ] add links to all old versions either in the main page of docs or on the left as sectionshttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/303WRF DH taking major and tow minor sectors as input2021-04-18T11:53:51+02:00Ghost UserWRF DH taking major and tow minor sectors as inputCreate new dh for wrf chem which takes the two "minor" (left and right) sectors from major sector into account.Create new dh for wrf chem which takes the two "minor" (left and right) sectors from major sector into account.Include WRC-Chem data handler and run pipelinehttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/309Class-based Oversampling technique2021-06-23T16:24:31+02:00Ghost UserClass-based Oversampling technique# Target
Implement a class-based oversampling technique. Classes are defined by fixed ppb intervals, and the oversampling then (fully) balances the class frequencies. The method is added in pre-processing.
# Tasks
* [x] add met...# Target
Implement a class-based oversampling technique. Classes are defined by fixed ppb intervals, and the oversampling then (fully) balances the class frequencies. The method is added in pre-processing.
# Tasks
* [x] add method `apply_oversampling` in `PreProcessing`
* [x] store results of `apply_oversampling` in data store
* [x] make all hardcoded parameters (e.g. `bins` or `rates_cap`) more flexible
* [x] add parameter to experiment setup (init, run)
* [x] load information from data store within apply_oversampling by using `data_store.get_default(...)`
* [x] defaults could be either set in the experiment setup (by using the defaults file) or just in the get_default call
The following steps are not yet specified: the DataHandler should be able to use the oversampling informationhttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/314too fat boxes in bootstrap focus plot2021-07-15T12:13:51+02:00Ghost Usertoo fat boxes in bootstrap focus plot<!-- Use this template for a bug in MLAir. -->
# Bug
## Error description
<!-- Provide a context when the bug / error arises -->
The boxes in the "separate variables" version of the `PlotBootstrapSkillScore` plot are far too wide.
## Error...<!-- Use this template for a bug in MLAir. -->
# Bug
## Error description
<!-- Provide a context when the bug / error arises -->
The boxes in the "separate variables" version of the `PlotBootstrapSkillScore` plot are far too wide.
## Error message
<!-- Provide the error log if available -->
![skill_score_bootstrap_nn_separated](/uploads/32cf0df87fd9f426db6d44932ae27ab9/skill_score_bootstrap_nn_separated.png)
## First guess on error origin
<!-- Add first ideas where the error could come from -->
## Error origin
<!-- Fill this up when the bug / error origin has been found -->
## Solution
<!-- Short description how to solve the error -->https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/316mean bootstrapping with specific mean2021-07-21T16:26:03+02:00Ghost Usermean bootstrapping with specific mean# motivation
Currently, mean bootstrapping can only use a zero mean (see #300). To be able to use another mean, a structure is required that can deal with different means (e.g. for different variables).
## status
This is just a remind...# motivation
Currently, mean bootstrapping can only use a zero mean (see #300). To be able to use another mean, a structure is required that can deal with different means (e.g. for different variables).
## status
This is just a reminder of what has to be implemented to use a value that deviates from zero.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/318Workaround to get results2021-08-09T11:46:54+02:00Ghost UserWorkaround to get resultsWorkaround for the NoneType error in climatological_skill_scores.
Error was introduced by commit a843aaeb
This workaround reverts that commit.Workaround for the NoneType error in climatological_skill_scores.
Error was introduced by commit a843aaeb
This workaround reverts that commit.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/319Apply TOAR Statistics on WRF-data handler2021-09-07T11:25:02+02:00Ghost UserApply TOAR Statistics on WRF-data handlerInclude calculation of [TOAR statistics](https://gitlab.jsc.fz-juelich.de/esde/toar-public/toarstats/) into the WRF data handler class.
- [x] Include time zone information
- [x] Convert to local time zone
- [x] Apply statistics (ensure c...Include calculation of [TOAR statistics](https://gitlab.jsc.fz-juelich.de/esde/toar-public/toarstats/) into the WRF data hander class.
- [x] Include time zone information
- [x] Convert to local time zone
- [x] Apply statistics (ensure correct time zones!)https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/351new plot: ws vs perf2021-12-21T14:52:51+01:00Ghost Usernew plot: ws vs perf# IDEA
If advection is of major importance, there should be a relation between wind speed and the occurring error. Try to visualize this relation in a plot.
## implementation
@leufen1 already implemented some snippets. But it is not really...# IDEA
If advection is of major importance, there should be a relation between wind speed and the occurring error. Try to visualize this relation in a plot.
## implementation
@leufen1 already implemented some snippets. But it is not really stable, as it depends strongly on the available input variables as well as their representation and the data handler used. Maybe this existing work is a good starting point for another try.
## current status
* Just noted down as an idea (with some initial implementation tries)
* Reconsider this plot at another stage
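As a much smaller sketch than the snippet below, the basic relation check could look like this on synthetic data (all numbers are made up for illustration):

```python
import numpy as np

# synthetic data: errors that grow with wind speed, plus noise
rng = np.random.default_rng(2)
wind_speed = rng.rayleigh(scale=3.0, size=500)
error = 0.5 * wind_speed + rng.normal(scale=1.0, size=500)

# correlation between wind speed and forecast error
r = np.corrcoef(wind_speed, error)[0, 1]
assert r > 0.2  # advection-related errors should correlate with wind speed
```

A scatter plot of `wind_speed` against `error` (or the correlation per lead time) would be the plot version of this check.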
## code snippets
```python
import logging
import os
from functools import reduce

import numpy as np
import xarray as xr


@TimeTrackingWrapper
class PlotWindToErrorRelation(AbstractPlotClass):  # pragma: no cover

    def __init__(self, data: xr.DataArray, plot_folder: str = ".", model_type_dim: str = "type",
                 time_dim="datetime", variables_dim="variables", index_dim="index", wind_var="v",
                 model_name: str = "NN", window_dim="window", ahead_dim="ahead",
                 model_indicator: str = "nn", obs_indicator: str = "obs", forecast_path=".",
                 forecast_file=r"forecasts_%s_test.nc"):
        super().__init__(plot_folder, "wind_to_error_relation")
        self._forecast_path = forecast_path
        self._forecast_file = forecast_file
        self._model_name = model_indicator
        self._obs_name = obs_indicator
        self._model_type_dim = model_type_dim
        self._wind_var = wind_var
        self._time_dim = time_dim
        self._index_dim = index_dim
        self._ahead_dim = ahead_dim
        self._var_dim = variables_dim
        self._window_dim = window_dim
        data_coll = self._prepare_data(data)
        self._plot(data_coll)

    def _plot(self, data):
        # note: only the data extraction is implemented, the actual plotting is still missing
        for plot_var in data.keys():
            raw_data = data[plot_var]
            errors = raw_data.sel(variables="error").to_pandas().to_numpy().flatten()
            wind = raw_data.sel(variables=plot_var).to_pandas().to_numpy().flatten()
        return

    def _prepare_data(self, data):
        data_coll = {}
        for station in data:
            logging.debug(f"... preprocess station {station}")
            station_forecast = self._load_data(station)
            station_error = self._calc_error(station_forecast)
            station_wind = self._load_wind(station, station_error)
            station_combined = xr.concat([station_error, station_wind], dim=self._var_dim)
            for var in station_wind.coords[self._var_dim].values:
                if var not in data_coll.keys():
                    data_coll[var] = []
                data_coll[var].append(station_combined.sel({self._var_dim: [var, "error"]})
                                      .squeeze(drop=True).dropna(self._time_dim))
        # concatenate all stations along a continuous time index
        for var in data_coll.keys():
            n_start = 0
            raw_data = data_coll[var]
            d_coll = []
            for d in raw_data:
                n_steps = len(d.coords[self._time_dim])
                new_vals = range(n_start, n_start + n_steps)
                d.coords[self._time_dim] = new_vals
                d_coll.append(d)
                n_start = d.coords[self._time_dim].values.max() + 1
            d_coll = xr.concat(d_coll, dim=self._time_dim)
            data_coll[var] = d_coll
        return data_coll

    def _load_data(self, station):
        file_name = os.path.join(self._forecast_path, self._forecast_file % station)
        with xr.open_dataarray(file_name) as d:
            return d.sel({self._model_type_dim: [self._model_name, self._obs_name]}) \
                .rename({self._index_dim: self._time_dim})

    def _harmonize_data(self, wind, error):
        intersect = reduce(np.intersect1d, map(lambda x: x.coords[self._time_dim].values, [wind, error]))
        return wind.sel({self._time_dim: intersect}), error.sel({self._time_dim: intersect})

    def _calc_error(self, data):
        error = data.sel({self._model_type_dim: self._model_name}) \
            - data.sel({self._model_type_dim: self._obs_name})
        error = error.expand_dims({self._var_dim: ["error"]})
        return error

    def _load_wind(self, station, error_data):
        # station is expected to be a data handler providing get_X
        d = station.get_X(as_numpy=False)
        d0 = d[0]
        wind_vars = list(set(self._wind_var).intersection(d0.coords[self._var_dim].values))
        d0wind = d0.sel({self._var_dim: wind_vars, self._window_dim: 0}, drop=True)
        if len({"u", "v"}.intersection(wind_vars)) == 2:
            wind_abs = np.sqrt(d0wind.sel({self._var_dim: "u"}) ** 2 + d0wind.sel({self._var_dim: "v"}) ** 2)
            wind_abs = wind_abs.expand_dims({self._var_dim: ["ws"]})
            d0wind = xr.concat([d0wind, wind_abs], dim=self._var_dim)
        expand_dim_vals = error_data.coords[self._ahead_dim]
        d0wind = d0wind.expand_dims({self._ahead_dim: expand_dim_vals})
        return d0wind
```https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/354Allow fixed variable list for feature importance boots2022-01-19T12:35:41+01:00Ghost UserAllow fixed variable list for feature importance bootsAllow fixed variable list for feature importance bootsAllow fixed variable list for feature importance bootshttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/363calculate toar metrics on hourly forecasts2022-02-22T14:17:51+01:00Ghost Usercalculate toar metrics on hourly forecasts# TOAR metrics on hourly data
* [ ] forecast e.g. 4x24h by model
* [ ] be able to apply toar metrics on this data to return data at daily resolution
* [ ] use this data as a forecast for comparison with other models# TOAR metrics on hourly data
* [ ] forecast e.g. 4x24h by model
* [ ] be able to apply toar metrics on this data to return data at daily resolution
* [ ] use this data as a forecast for comparison with other modelshttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/377display shuffle count in separate box2022-07-19T12:16:51+02:00Ghost Userdisplay shuffle count in separate boxHighlight the shuffle count of bootstraps in the feature importance analysis in a separate box, as done for the uncertainty estimate. Currently, this information is added in brackets in the plot title.Highlight the shuffle count of bootstraps in the feature importance analysis in a separate box, as done for the uncertainty estimate. Currently, this information is added in brackets in the plot title.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/372Report OLS results2022-03-14T10:28:14+01:00Ghost UserReport OLS resultsExport `results.summary()` from ols model fit.Export `results.summary()` from ols model fit.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/373Extraction of model class should account for inheritance.2022-03-24T10:03:58+01:00Ghost UserExtraction of model class should account for inheritance.
During reporting of an experiment, the `model class` extraction should take inheritance into account to ensure that the extracted code fully describes the implemented neural network.
During reporting of an experiment, the `model class` extraction should take inheritance into account to ensure that the extracted code fully describes the implemented neural network.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/383feature importance along time axis2023-06-19T12:28:45+02:00Ghost Userfeature importance along time axisMaybe it would be great to have another feature importance approach by bootstrapping along the temporal axis. This could give some insights which time step has which influence. Only applicable if either timesteps are same for all inputs ...Maybe it would be great to have another feature importance approach by bootstrapping along the temporal axis. This could give some insights which time step has which influence. Only applicable if either timesteps are same for all inputs or if labels are available to indicate same time steps (e.g. if using L-shape input data). Think more elaborated about this idea and how to implement.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/384AQW data handler2022-07-11T10:53:07+02:00Ghost UserAQW data handler# Data Handler for AQWatch
*data handler designed for work of @li40*
## Structure of Data
**Inputs**: forecasts from CTMs
* root folder: `mod`
* structured per region of interest (each will be a separate experiment with a different NN...# Data Handler for AQWatch
*data handler designed for work of @li40*
## Structure of Data
**Inputs**: forecasts from CTMs
* root folder: `mod`
* structured per region of interest (each will be a separate experiment with a different NN), e.g. `/colorado`
* depending on region: different number and names of CTMs and always mean of ensemble *(not used as input for now)*, e.g. `/lotos_tno`
* data are already interpolated on station level
* data are stored per forecast date `/$YYYYMMDD`
* single file per species, e.g. `$nvar_TNO_inp.nc`
* data files are structured as follows: first dimension is time (model time UTC), second dimension is station (named by station long name).
**Targets**: observations from measurement stations
* root folder: `/obs/obs_download_scripts`
* structured per region of interest (each will be a separate experiment with a different NN), e.g. `/Colorado`
* data are inside data folder `/Data`
* single file per species and date (including all stations and 24h), e.g. `obs_$nvar_$YYYYMMDD.nc`
* data files are structured as follows: index is date_utc, columns are stations indicated by id (size is timesteps x stations).
**Competitor**: ensemble mean calculated over all CTM forecasts
* root folder: `mod`
* structured per region of interest (each will be a separate experiment with a different NN), e.g. `/colorado`
* stored in directory `/ens`
* structure same as for inputs
```
|-- mod
| `-- colorado
| |-- ens (MEAN of CTM#1-3)
| | `-- $YYYYMMDD (%forecast_date)
| | `-- interpolated
| | `-- $nvar_ensmean_inp.nc # time series of ensemble mean (competitor)
| |-- lotos_tno (CTM#1)
| | |-- $YYYYMMDD (%forecast_date)
| | | `-- interpolated
| | | `-- $nvar_TNO_inp.nc # time series of CTM#1 (input feature)
| | `-- stations_colo.csv # Station info
| |-- silam_fmi (CTM#2)
| | |-- $YYYYMMDD (%forecast_date)
| | | `-- interpolated
| | | `-- $nvar_FMI_inp.nc # time series of CTM#2 (input feature)
| | `-- stations_colo.csv
| `-- wrf_ucar (CTM#3)
| |-- $YYYYMMDD (%forecast_date)
| | `-- interpolated
| | `-- $nvar_UCAR_inp.nc # time series of CTM#3 (input feature)
| `-- stations_colo.csv
`-- obs
`-- obs_download_scripts
`-- Colorado
|-- stations_colo.csv
`-- Data
`--obs_$nvar_$YYYYMMDD.nc (%obs_date) # time series of observation (target)
$nvar = ['co','so2','no2','o3','pm10','pm25'] # pollutant species available
%forecast_date = current day
%obs_date = one day before current day
```
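The layout above could be resolved into concrete file paths roughly like this; the helper function and its arguments are hypothetical and not part of MLAir:

```python
import os
from datetime import date, timedelta


def aqwatch_paths(root, region, ctm, ctm_tag, nvar, forecast_date):
    """Build model input and observation target paths for the sketched layout."""
    ymd = forecast_date.strftime("%Y%m%d")
    mod_file = os.path.join(root, "mod", region, ctm, ymd, "interpolated",
                            f"{nvar}_{ctm_tag}_inp.nc")
    # observations are from one day before the current (forecast) day
    obs_ymd = (forecast_date - timedelta(days=1)).strftime("%Y%m%d")
    obs_file = os.path.join(root, "obs", "obs_download_scripts", region.capitalize(),
                            "Data", f"obs_{nvar}_{obs_ymd}.nc")
    return mod_file, obs_file


mod, obs = aqwatch_paths(".", "colorado", "lotos_tno", "TNO", "no2", date(2022, 5, 18))
```

Note that the CTM tag (`TNO`, `FMI`, `UCAR`) differs per model directory, so a real implementation would need a mapping from CTM name to tag.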
## Start Script
This is a basic script that could be used for the AQWatch data handler. The script does not set up the NN explicitly but can be used to check if the workflow passes.
```python
__author__ = "Lukas Leufen"
__date__ = '2022-05-18'

import argparse
import os
import sys

sys.path.append("<abs_path_to_mlair>")

from mlair.workflows import DefaultWorkflow
from mlair.data_handler.data_handler_aqwatch import DataHandlerAQWatch, DataHandlerAQWatchSingleStation


def main(parser_args):
    args = dict(data_handler=DataHandlerAQWatch,
                interpolation_limit=3, overwrite_local_data=False,
                overwrite_lazy_data=True,
                lazy_preprocessing=True,
                train_min_length=0,  # just replace defaults which are 90
                val_min_length=0,  # just replace defaults which are 90
                test_min_length=0,  # just replace defaults which are 90
                window_history_size=0,  # has to be 0 to indicate t0
                window_lead_time=0,  # has to be 0 to indicate t0
                start="2022-05-01",  # start and train_start should be the same
                train_start="2022-05-01",
                train_end="2022-05-02",
                val_start="2022-05-02",
                val_end="2022-05-10",
                test_start="2022-05-10",
                test_end="2022-05-20",
                end="2022-05-20",  # end and test_end should be the same
                region="colorado",  # specify the region
                variables=["no2"],  # this sets your variable, currently it is not possible to use more than one
                target_var=["no2"],  # this sets your variable, currently it is not possible to use more than one
                ctm_list=["test_ctm", "test2_ctm"],  # name models to use
                competitors=["aqw_ens_mean"],
                sampling="hourly",
                # stations=['80050006', '80131001', '80130014', '80350004', '80410017', '80410015', '80410013',
                #           '80830006', '80310002', '80310013', '80691004', '80690011', '80690009', '80310028',
                #           '80590011', '80519991', '80770017', '81230009', '81230006', '80050002', '80310027',
                #           '80310026', '80130003', '80410016', '80830101', '80770020', '81030006', '80450012',
                #           '80450007', '80590006', '80690007', '80699991', '80677001', '80677003', '80013001',
                #           '80970008'],
                stations=["Aurora East", "Boulder Reservoir","Chatfield Park - 11500 N. Roxborough Park Rd.","Colorado Springs - USAF Academy","Cortez Ozone","Denver - CAMP - 2105 Broadway","Fort Collins - CSU - 708 S. Mason St.","Fort Collins - West - Laporte Ave. & Overland Tr.","Golden - NREL - 2054 Quaker St.","Gothic","Greeley - Weld Co. Tower - 3101 35th Ave.","Highland Reservoir - 8100 S. University Blvd.","La Casa NCORE - 4545 Navajo St.","Manitou Springs","Mesa Verde NP","Palisade Ozone","Rangely, CO","Rifle Ozone","Rocky Flats - N - 16600 W. Colo. Hwy. 128","Rocky Mountain NP","Ute 1","Ute 3","Welby - 78th Ave. & Steele St."],
                transformation={
                    "o3": {"method": "log"},
                    "no": {"method": "log"},
                    "no2": {"method": "log"}},
                data_path=os.path.join(".", "data", "aqw_data"),  # <- root folder of data containing obs and mod
                **parser_args.__dict__,
                )
    workflow = DefaultWorkflow(**args, start_script=__file__)
    workflow.run()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--experiment_date', metavar='--exp_date', type=str, default=None,
                        help="set experiment date as string")
    args = parser.parse_args(["--experiment_date", "testrun"])
    main(args)
```
## TODOs
* [ ] include competitor: maybe write a class that can load the AQWatch data (and store it), similar to the IntelliO3 competitor
* [ ] check if workflow works from begin to end
* [ ] be able to load metadata such as lon/lat from meta files such as `station_colo.csv`https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/411Include postprocessing for BNNs2022-08-10T13:31:21+02:00Ghost UserInclude postprocessing for BNNsExtend post-processing module to work with BNNs
- [x] create ens. predictions #412
- [x] extract learned distribution parameters #412
- [ ] Include CRPS analysis
- [ ] extend time-series plots with uncertainties etc.Extend post-processing module to work with BNNs
- [x] create ens. predictions #412
- [x] extract learned distribution parameters #412
- [ ] Include CRPS analysis
- [ ] extend time-series plots with uncertainties etc.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/412Create ens. predictions for BNNs2022-08-09T14:45:29+02:00Ghost UserCreate ens. predictions for BNNspreparation for #411preparation for #411https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/414Include CRPS analysis and other ens verif methods or plots2022-08-10T13:34:17+02:00Ghost UserInclude CRPS analysis and other ens verif methods or plotsInclude CRPS analysis and other ens verif methods or plots.
Make use of https://pypi.org/project/ensverif/ or a related package.
Contributes to #411Include CRPS analysis and other ens verif methods or plots.
Make use of https://pypi.org/project/ensverif/ or a related package.
Contributes to #411https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/420DataHandler with multiple stats per variable2022-08-24T15:16:12+02:00Ghost UserDataHandler with multiple stats per variableImplement a new strategy to handle one variable with multiple statistics. For example: `stats_per_var = {'o3': ['dma8eu', 'perc95', 'perc05'], 'relhum': 'average_values'}`
- [x] upload downloads
- [x] reset names with statistic name...Implement a new strategy to handle one variable with multiple statistics. For example: `stats_per_var = {'o3': ['dma8eu', 'perc95', 'perc05'], 'relhum': 'average_values'}`
- [x] upload downloads
- [x] reset names with statistic name (e.g. o3_dma8eu, o3_perc95, o3_perc05)
- [x] update variable list in DataHandler with updated names
- [ ] implement statistics selector for target variablehttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/437combined plot of ME, MSE and R2023-06-19T12:28:50+02:00Ghost Usercombined plot of ME, MSE and RMaybe create another plot version that combines ME, MSE, and R in a single graph. Such a plot can only display a single value (e.g. overall average MSE, ...) and not the entire distribution
![grafik](/uploads/a85c337feedc44353ea1b5d186e545c4/grafik.png)Maybe create another plot version that combines ME, MSE, and R in a single graph. Such a plot can only display a single value (e.g. overall average MSE, ...) and not the entire distribution
![grafik](/uploads/a85c337feedc44353ea1b5d186e545c4/grafik.png)https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/438error plot per month2023-06-19T12:28:55+02:00Ghost Usererror plot per monthCreate plot in similar manner to line plot version of `PlotTimeEvolutionMetric` that highlights error for each month summarized on all years in test set.Create plot in similar manner to line plot version of `PlotTimeEvolutionMetric` that highlights error for each month summarized on all years in test set.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/439other error metrics line plot2023-06-19T12:28:59+02:00Ghost Userother error metrics line plotCreate plot in similar manner to line plot version of `PlotTimeEvolutionMetric` but for other metrics than MSE. In particular, bias/mean error would be interesting.Create plot in similar manner to line plot version of `PlotTimeEvolutionMetric` but for other metrics than MSE. In particular, bias/mean error would be interesting.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/443Time Series Plot for all Competitors2023-02-06T10:15:48+01:00Ghost UserTime Series Plot for all Competitors* [ ] Add separate plot for each competitor
* [ ] also plot all models in single plot (with different shaded colors)* [ ] Add separate plot for each competitor
* [ ] also plot all models in a single plot (with different shaded colors)https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/455Skip harmonize history and target on demand2023-06-30T11:43:02+02:00Ghost UserSkip harmonize history and target on demandTo issue a real-time forecast, it is required not to harmonize history and target data, as target data is not available at that time.
* [ ] add a parameter that stores unharmonized history data in a separate variable `self.full_history`.
*...To issue a real-time forecast, it is required not to harmonize history and target data, as target data is not available at that time.
* [ ] add a parameter that stores unharmonized history data in a separate variable `self.full_history`.
* [ ] add a method that also forecasts on the full_history parameter and stores the forecasts as `forecast_full.nc`https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/456add different example run scripts2023-06-30T11:42:47+02:00Ghost Useradd different example run scriptsAdd a number of different example run scripts.
* [ ] run climate fir
* [ ] run IFS forecast
* [ ] ?Add a number of different example run scripts.
* [ ] run climate fir
* [ ] run IFS forecast
* [ ] ?https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/459Preprocessing German stations2023-11-30T11:35:20+01:00Michael LangguthPreprocessing German stationsPreprocess data (i.e. generation of transformation and apriori data) for all German stations (rural, suburban **and** urban stations) for the DestinE-AQ use case.
For this purpose, a revised list of stations is parsed and the filtering of ...Preprocess data (i.e. generation of transformation- and apriori-data) for all German stations (rural, suburban **and** urban stations) for DestinE-AQ use case.
For this purpose, a revised list of stations is parsed and the filtering of NOx data is deactivated.
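The transformation data mentioned above (per-variable standardisation parameters computed during preprocessing) could be sketched as follows; the series and its distribution are synthetic, not actual station data:

```python
import numpy as np

# synthetic hourly concentration-like series for one station
rng = np.random.default_rng(1)
series = rng.gamma(shape=2.0, scale=15.0, size=24 * 365)

# transformation data: mean and std used later to standardise inputs
transformation = {"mean": float(series.mean()), "std": float(series.std())}

# applying the transformation yields zero mean and unit variance by construction
standardised = (series - transformation["mean"]) / transformation["std"]
assert abs(standardised.mean()) < 1e-9 and abs(standardised.std() - 1.0) < 1e-9
```

Precomputing and storing these parameters per station avoids recomputing them for every real-time forecast run.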