MLAir issues
https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/459
Preprocessing German stations (2023-11-30, Michael Langguth)

Preprocess data (i.e. generation of transformation and apriori data) for all German stations (rural, suburban **and** urban stations) for the DestinE-AQ use case.
For this purpose, a revised list of stations is parsed, and the filtering of NOx data is deactivated.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/458
release v2.4.0 (2023-06-30, Ghost User)

<!-- Use this template for a new release of MLAir. -->
# Release
<!-- add your release version here -->
v2.4.0
## checklist
* [x] Create Release Issue
* [x] Create merge request: branch `release_v2.4.0` into `master`
* [x] Merge `develop` into `release_v2.4.0`
* [x] Checkout `release_v2.4.0`
* [x] Adjust `changelog.md` (see template for changelog)
* [x] Update version number in `mlair/__init__.py`
* [x] Create new dist file: `python3 setup.py sdist bdist_wheel`
* [ ] Add new dist file `mlair-2.4.0-py3-none-any.whl` to git
* [x] Update file link `distribution file (current version)` in `README.md`
* [x] Update file link in `docs/_source/installation.rst`
* [x] Commit + push
* [ ] Merge `release_v2.4.0` into `master`
* [ ] Create new tag with
* [ ] distribution file (.whl)
* [ ] link to Documentation
* [ ] Example Jupyter Notebook
* [ ] changelog
## template for changelog
<!-- use this structure for the changelog. Link all issue to at least one item. -->
```
## v2.4.0 - 2023-06-30 - <release description>
### general:
* text
### new features:
* words (issue)
### technical:
*
```

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/457
set config paths as parameter (2023-06-30, Ghost User)

Add parameters to set the data paths of IFS or CAMS from outside. Use config files only if the parameter is not provided.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/456
add different example run scripts (2023-06-30, Ghost User)

Add a number of different example run scripts.
* [ ] run climate FIR
* [ ] run IFS forecast
* [ ] ?

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/455
Skip harmonize history and target on demand (2023-06-30, Ghost User)

To issue a real-time forecast, history and target data must not be harmonized, as target data is not available at that time.
* [ ] add a parameter that stores unharmonized history data in a separate variable `self.full_history`
* [ ] add a method that also forecasts on the `full_history` data and stores the forecasts as `forecast_full.nc`

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/454
Use Toar statistics api v2 (2023-06-30, Ghost User)

As TOAR Statistics API v1 is offline, adjust the code to load aggregated data (like dma8eu ozone) from the new API v2.
When the old code looks like
```python
from io import BytesIO
import pandas as pd
import requests
resp = requests.get("https://toar-data.fz-juelich.de/statistics/api/v1/?format=csv&timeseries_id=31099&names=dma8eu&sampling=daily")
df = pd.read_csv(BytesIO(resp.content), index_col="datetime", parse_dates=True)
```
the new code could look like
```python
from io import BytesIO
import time
import zipfile

import pandas as pd
import requests

resp = requests.get("https://toar-data.fz-juelich.de/api/v2/analysis/statistics/?sampling=daily&statistics=dma8eu&id=31099")
status_url = resp.json()["status"]  # URL to poll until the result is ready
while True:
    resp = requests.get(status_url, timeout=(3.05, 5))
    if resp.history:  # the poll redirects to the finished zip archive
        break
    time.sleep(5)  # give the service time to process the request
with zipfile.ZipFile(BytesIO(resp.content)) as file:
    df = pd.read_csv(BytesIO(file.read("31099_dma8eu.csv")), comment="#",
                     index_col="datetime", parse_dates=True)
```

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/453
advanced retry strategy (2023-06-30, Ghost User)

Implement an advanced retry strategy for downloading data. Currently, only the `get` command retries, without building up a renewed connection. As toardb sometimes kills jobs without an error response, the connection breaks unnoticed. Therefore, adjust the retry strategy to establish a new connection instead of only retrying the `get`.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/452
update proj version (2023-06-30, Ghost User)

<!-- Use this template for a bug in MLAir. -->
# Bug
## Error description
<!-- Provide a context when the bug / error arises -->
Pipeline tests from scratch are failing
## Error message
<!-- Provide the error log if available -->
```shell
$ zypper --no-gpg-checks --non-interactive install proj=9.1.0
Loading repository data...
Reading installed packages...
No provider of 'proj=9.1.0' found.
'proj=9.1.0' not found in package names. Trying capabilities.
```
## First guess on error origin
<!-- Add first ideas where the error could come from -->
* updated proj version -> 9.2.0
* try to remove version at all first
* if this is not working, fix to new version
## Error origin
<!-- Fill this up when the bug / error origin has been found -->
## Solution
<!-- Short description how to solve the error -->
Remove all the package versions.

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/451
robust apriori estimate for short timeseries (2023-06-30, Ghost User)

When time series are shorter than 1 year, there are issues with calculating climate stats, resulting in NaN values.
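A minimal pandas sketch of the `dropna`-based fix described in the checklist below (the function name and structure are illustrative; MLAir's `filter.py` operates on xarray objects):

```python
import numpy as np
import pandas as pd


def create_monthly_mean(series: pd.Series) -> pd.Series:
    """Monthly climatology (index 1..12) of a time-indexed series.

    Dropping NaNs before grouping prevents single missing values from
    turning a whole month's mean into NaN; months without any data are
    simply absent from the result and must be handled by the caller.
    """
    clean = series.dropna()
    return clean.groupby(clean.index.month).mean()
```

For a series covering less than a year, the result then contains valid means for the observed months instead of NaN-contaminated statistics.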
* [x] add `dropna` along time axis in `filter.py:create_monthly_mean`
* [x] also check `filter.py:create_seasonal_hourly_mean` for the same behaviour

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/450
load ifs data (2023-06-30, Ghost User)

Implement a data loader that can load locally stored IFS forecast data. It should look similar to the existing ERA5 data loader.
* [x] be able to load IFS data
* [x] trigger IFS loading by using `ifs` in `data_origin`
# Design choices
* IFS data contain two temporal axes: init time (every 12 hours), valid time (hourly)
* for now: create a single time series consisting of the closest combination of init and valid time (use only t0+0h to t0+11h of each init time).
* for future: think about information for ti>t0 of each sample. Maybe use rather init time of t0 and all forecast steps for future time steps. This will break with a single timeseries, but this is similar to the filter approach and prevents data leakage. Maybe this should be part of another issue.
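The "closest combination" rule above could be sketched with pandas as follows (the helper and column names are illustrative, not the actual loader):

```python
import pandas as pd


def closest_forecast_series(df: pd.DataFrame) -> pd.Series:
    """Collapse (init_time, lead) IFS forecasts into one hourly series.

    Keeping only leads t0+0h..t0+11h of each 12-hourly init covers every
    valid time exactly once, always by its most recent initialization.
    """
    sel = df[(df["lead"] >= 0) & (df["lead"] < 12)].copy()
    sel["valid_time"] = sel["init_time"] + pd.to_timedelta(sel["lead"], unit="h")
    return sel.set_index("valid_time")["value"].sort_index()
```

With two inits per day, the selected leads tile the day without gaps or overlaps, which is exactly what makes a single merged time series possible.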
## discussions
Usage of operational forecast data poses some problems for the current setup of MLAir. Until now, raw time series always contained a single time dimension, which is transformed into two during sample setup. With IFS data, this second dimension already exists from the NWP model's lead time, so none of the methods implemented so far (interpolation, filter, ...) can handle this data.
Changing the general behaviour, e.g. always adding window dimension 0, is a huge refactoring step.
Designing a new data handler just for IFS data harms compatibility with all other data handlers and produces a lot of almost duplicated code.
Changing the filter calculation methods might therefore be the simplest solution.
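The "always add window dimension 0" idea can be illustrated framework-free with NumPy (MLAir itself works on labeled xarray dimensions, so this only sketches the shape logic; the helper names are hypothetical):

```python
import numpy as np


def add_window_dim(data: np.ndarray) -> np.ndarray:
    """Give single-time-axis data a length-1 window axis so it can be
    merged with IFS data that already has (time, window) axes."""
    return np.expand_dims(data, axis=-1)


def maybe_drop_window_dim(data: np.ndarray) -> np.ndarray:
    """If the window axis only holds the single entry 0, remove it again
    so non-IFS workflows see the same shapes as before."""
    if data.shape[-1] == 1:
        return np.squeeze(data, axis=-1)
    return data
```

The round trip is a no-op for non-IFS data, which is why the refactoring can stay backwards compatible.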
## TODO
* [x] refac: expand dims in all data loaders (era5_local, join) by dimension `window` so that data can be merged with IFS data. The window dimension has the single entry `0`. If the final dataframe has only this 0 entry, remove the dimension again. If no IFS data are loaded, the returned data are unchanged!
* [x] implement/adjust: ClimateFIR filter should be able to use data with two time dimensions as input. As after the first filter iteration, data is already structured with two time dimensions, it should be possible to use such data from the beginning.
* [x] new data handler for IFS data? Skip interpolation (or apply it later after data is restructured?), create time series data for each init time (ti<t0: closest combination of init and valid time, ti>=t0: most recent forecast; be aware of run times 01 and 13 local time), maybe interpolate now, calculate filter.

Assignee: Michael Langguth

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/449
load era5 data from toar db (2023-06-30, Ghost User)

The TOAR DB now includes ERA5 data. Therefore, replace the current bypass using local data.
* [x] check that `era5` can be used as data origin to load from ToarDB
* [x] trigger the former era5 loader only with flag `era5_local` (do not completely remove this functionality)

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/448
load model from path (2023-06-30, Ghost User)

Currently, MLAir expects a trained model to be inside the `model` directory of the experiment path. A fresh experiment run with no training at all will fail, as there is no option to refer to an existing model (without copying it into the model directory after initiating the MLAir workflow). Therefore, add an option to use an external model.
* [x] introduce `model_path` parameter that can be set during `ExperimentSetup`
* [x] add an option to copy the given model into the experiment path (which should then overwrite the external `model_path`?)

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/447
store and load local clim apriori data (2023-06-30, Ghost User)

Currently, apriori data to calculate filter components must either be parsed from workflow args or be calculated on the data. Also, apriori data is stored during lazy processing, but in an encrypted form that is hard to read from outside. After this issue, it should be possible to use an experiment's apriori data in another experiment.
* [x] Store apriori data locally, similar to transformation properties. ~Maybe only store when `store_apriori=True`.~ Apriori is always stored.
* [x] define location and if a single file storage or multiple files are suitable: `<exp>/data/apriori`
* [x] define storage format (.nc, .np, .csv, ...?): `.pickle`
* [x] is it required to store filter information, like filter size and number of splits (to ensure integrity of apriori information)? not implemented
* [x] Load apriori data from local path
* [x] define parameter `apriori_file=<path/file>.pickle`
* [x] ~check for matching filter settings~ canceled

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/446
improve prediction speed (2023-06-19, Ghost User)
Improve prediction speed of DL model by indicating batch size in `.predict` call.
https://www.tensorflow.org/api_docs/python/tf/keras/Model#predict
```python
def make_prediction(self, subset):
    """
    Create predictions for NN, OLS, and persistence and add true observation as reference.

    Predictions are filled in an array with full index range. Therefore, predictions can have missing values. All
    predictions for a single station are stored locally under `<forecast/forecast_norm>_<station>_test.nc` and can
    be found inside `forecast_path`.
    """
    subset_type = subset.name
    logging.info(f"start make_prediction for {subset_type}")
    time_dimension = self.data_store.get("time_dim")
    window_dim = self.data_store.get("window_dim")
    for i, data in enumerate(subset):
        input_data = data.get_X()
        target_data = data.get_Y(as_numpy=False)
        observation_data = data.get_observation()
        # get scaling parameters
        transformation_func = data.apply_transformation
        # proposed change: pass an explicit batch size here, e.g.
        # self.model.predict(input_data, batch_size=512), so Keras runs
        # inference in large chunks (512 is only an illustrative value)
        nn_output = self.model.predict(input_data)
```

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/445
Data Insight Plot Monthly Distribution (2023-06-30, Ghost User)

* [ ] implement a variant of the monthly summary plot but only with observations; instead, use different colors/bars for each subset

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/444
choose interp method in CAMS competitor (2023-06-30, Ghost User)

* [x] add option to set the CAMS interpolation method (currently only nearest neighbor is used)
* [x] enable using both methods as separate competitors

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/443
Time Series Plot for all Competitors (2023-02-06, Ghost User)

* [ ] Add a separate plot for each competitor
* [ ] also plot all models in a single plot (with different shaded colors)

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/442
bias free evaluation (2023-06-30, Ghost User)

Implement a post-processing evaluation that is free of bias. This is particularly interesting when comparing DL with model data, as model data suffers from systematic deviations. A bias-free evaluation can show how much of the target's variance a model is able to predict.
Therefore, implement two strategies:
(i) Calculate a total mean of a given model for each station and subtract this value from the model's forecasts.
(ii) Calculate a running mean of a given model for each station and subtract this series from the model's forecasts.
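The two strategies could be sketched as follows (pandas-based, with hypothetical helper names and an illustrative default window):

```python
import pandas as pd


def remove_total_bias(forecast: pd.Series) -> pd.Series:
    """Strategy (i): subtract the model's total mean at this station."""
    return forecast - forecast.mean()


def remove_running_bias(forecast: pd.Series, window: int = 720) -> pd.Series:
    """Strategy (ii): subtract a running mean (window in time steps)."""
    return forecast - forecast.rolling(window, min_periods=1, center=True).mean()
```

Strategy (i) removes only the constant offset, while strategy (ii) also removes slowly varying systematic deviations, so what remains to compare is mostly the models' ability to capture the target's variability.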
Each strategy is then applied to all competing models, and the evaluation is performed (in addition to the standard evaluation).

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/441
tech: update proj version for tests from scratch (2022-12-02, Ghost User)

<!-- Use this template for a bug in MLAir. -->
# Bug
## Error description
<!-- Provide a context when the bug / error arises -->
Job `tests (from scratch)` in CI pipeline has issue to load proj package.
## Error message
<!-- Provide the error log if available -->
```
$ zypper --no-gpg-checks --non-interactive install proj=8.2.1
Loading repository data...
Reading installed packages...
No provider of 'proj=8.2.1' found.
'proj=8.2.1' not found in package names. Trying capabilities.
```
## First guess on error origin
<!-- Add first ideas where the error could come from -->
There must have been an update of the proj package. As no official proj package is released for the current linux/opensuse image, MLAir uses an experimental one. Maybe they updated the package and removed the old version.
## Error origin
<!-- Fill this up when the bug / error origin has been found -->
There was an update, as assumed, from v8.2.1 to v9.1.0! Try updating the version number in the pipeline and check that it is compatible.
## Solution
<!-- Short description how to solve the error -->

https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/440
release v2.3.0 (2023-06-30, Ghost User)

<!-- Use this template for a new release of MLAir. -->
# Release
<!-- add your release version here -->
v2.3.0
## checklist
* [x] Create Release Issue
* [x] Create merge request: branch `release_v2.3.0` into `master`
* [x] Merge `develop` into `release_v2.3.0`
* [x] Checkout `release_v2.3.0`
* [x] Adjust `changelog.md` (see template for changelog)
* [x] Update version number in `mlair/__init__.py`
* [x] Create new dist file: `python3 setup.py sdist bdist_wheel`
* [x] Add new dist file `mlair-2.3.0-py3-none-any.whl` to git
* [x] Update file link `distribution file (current version)` in `README.md`
* [x] Update file link in `docs/_source/installation.rst`
* [x] Commit + push
* [x] Merge `release_v2.3.0` into `master`
* [ ] Create new tag with
* [ ] distribution file (.whl)
* [ ] link to Documentation
* [ ] Example Jupyter Notebook
* [ ] changelog
## template for changelog
<!-- use this structure for the changelog. Link all issue to at least one item. -->
```
## v2.3.0 - 2022-11-25 - <release description>
### general:
* text
### new features:
* words (issue)
### technical:
*
```