MLAir issueshttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues2023-11-30T11:35:20+01:00https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/459Preprocessing German stations2023-11-30T11:35:20+01:00Michael LangguthPreprocessing German stationsPreprocess data (i.e. generation of transformation- and apriori-data) for all German stations (rural, suburban **and** urban stations) for DestinE-AQ use case.
For this purpose, a revised list of stations is parsed and the filtering of NOx data is deactivated.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/456add different example run scripts2023-06-30T11:42:47+02:00Ghost Useradd different example run scriptsAdd a number of different example run scripts.
* [ ] run ClimateFIR
* [ ] run IFS forecast
* [ ] ?https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/455Skip harmonize history and target on demand2023-06-30T11:43:02+02:00Ghost UserSkip harmonize history and target on demandTo issue a real-time forecast, it is required to not harmonize history and target data as target data is not available at this time.
* [ ] add a parameter that stores unharmonized history data in separate variable `self.full_history`.
* [ ] add method that forecasts also on full_history parameter and stores forecasts as `forecast_full.nc`https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/446improve prediction speed2023-06-19T12:27:59+02:00Ghost Userimprove prediction speed
Improve prediction speed of DL model by indicating batch size in `.predict` call.
https://www.tensorflow.org/api_docs/python/tf/keras/Model#predict
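A minimal sketch of the proposed change, assuming the model follows the Keras `Model.predict(x, batch_size=...)` interface. The helper name `predict_with_batch_size`, the stand-in class `_DummyModel`, and the batch size of 256 are illustrative assumptions and not part of MLAir; the dummy model only exists so the sketch runs without TensorFlow.

```python
import numpy as np


def predict_with_batch_size(model, input_data, batch_size=256):
    """Forward an explicit batch size to a Keras-style `predict` call.

    If no batch size is given, Keras falls back to its default of 32,
    which can be slow for large test sets.
    """
    return model.predict(input_data, batch_size=batch_size)


class _DummyModel:
    """Hypothetical stand-in for `tf.keras.Model` (assumption for this sketch)."""

    def __init__(self):
        self.last_batch_size = None

    def predict(self, x, batch_size=None):
        # Record the batch size so the effect of the wrapper is visible.
        self.last_batch_size = batch_size
        return np.asarray(x).sum(axis=-1, keepdims=True)


model = _DummyModel()
nn_output = predict_with_batch_size(model, np.ones((1000, 4)), batch_size=256)
```

A suitable batch size likely depends on the available memory and the model size, so it would probably be best exposed as a configurable parameter rather than hard-coded.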
```python
def make_prediction(self, subset):
    """
    Create predictions for NN, OLS, and persistence and add true observation as reference.

    Predictions are filled in an array with full index range. Therefore, predictions can have missing values. All
    predictions for a single station are stored locally under `<forecast/forecast_norm>_<station>_test.nc` and can
    be found inside `forecast_path`.
    """
    subset_type = subset.name
    logging.info(f"start make_prediction for {subset_type}")
    time_dimension = self.data_store.get("time_dim")
    window_dim = self.data_store.get("window_dim")

    for i, data in enumerate(subset):
        input_data = data.get_X()
        target_data = data.get_Y(as_numpy=False)
        observation_data = data.get_observation()

        # get scaling parameters
        transformation_func = data.apply_transformation

        nn_output = self.model.predict(input_data)
```https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/443Time Series Plot for all Competitors2023-02-06T10:15:48+01:00Ghost UserTime Series Plot for all Competitors* [ ] Add separate plot for each competitor
* [ ] also plot all models in single plot (with different shaded colors)https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/439other error metrics line plot2023-06-19T12:28:59+02:00Ghost Userother error metrics line plotCreate a plot similar to the line plot version of `PlotTimeEvolutionMetric`, but for metrics other than MSE. In particular, bias/mean error would be interesting.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/438error plot per month2023-06-19T12:28:55+02:00Ghost Usererror plot per monthCreate a plot similar to the line plot version of `PlotTimeEvolutionMetric` that highlights the error for each month, aggregated over all years in the test set.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/437combined plot of ME, MSE and R2023-06-19T12:28:50+02:00Ghost Usercombined plot of ME, MSE and RMaybe create another plot version that combines ME, MSE, and R in a single graph. It can only display a single value (e.g. overall average MSE, ...) and not the entire distribution
![grafik](/uploads/a85c337feedc44353ea1b5d186e545c4/grafik.png)https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/433Contingency Analysis2023-06-27T14:53:57+02:00Ghost UserContingency AnalysisImplement some kind of contingency analysis:
* [ ] add hit rate and the other metrics
* [ ] for dma8eu ozone: calculate metrics for legal limits
* [ ] calculate percent hits of the top 2,5,10%: Does a model/competitor exceed its own X% threshold if this is the case for the observation?https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/429use model display name for latex reports2023-06-19T12:28:14+02:00Ghost Useruse model display name for latex reportsuse model display name for latex reports instead of using model indicator (nn)https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/420DataHandler with multiple stats per variable2022-08-24T15:16:12+02:00Ghost UserDataHandler with multiple stats per variableImplement new strategy to interoperate one variable with multiple statistics. For example: `stats_per_var = {'o3': ['dma8eu', 'perc95', 'perc05'], 'relhum': 'average_values'}`
- [x] upload downloads
- [x] reset names with statistic name (e.g. o3_dma8eu, o3_perc95, o3_perc05)
- [x] update variable list in DataHandler with updated names
- [ ] implement statistics selector for target variablehttps://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/414Include CRPS analysis and other ens verif methods or plots2022-08-10T13:34:17+02:00Ghost UserInclude CRPS analysis and other ens verif methods or plotsInclude CRPS analysis and other ens verif methods or plots.
make use of https://pypi.org/project/ensverif/ or related
Contributes to #411https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/412Create ens. predictions for BNNs2022-08-09T14:45:29+02:00Ghost UserCreate ens. predictions for BNNspreparation for #411https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/411Include postprocessing for BNNs2022-08-10T13:31:21+02:00Ghost UserInclude postprocessing for BNNsExtend post-processing module to work with BNNs
- [x] create ens. predictions #412
- [x] extract learned distribution parameters #412
- [ ] Include CRPS analysis
- [ ] extend time-series plots with uncertainties etc.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/392REFAC: Rename lr to learning rate2022-08-31T10:38:04+02:00Ghost UserREFAC: Rename lr to learning rateRename `lr` to `learning_rate`, as the `lr` argument is deprecated.
```
/usr/lib64/python3.6/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:375: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
"The `lr` argument is deprecated, use `learning_rate` instead.")
```https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/386ClimateFIR customize agg for apriori2023-06-19T12:28:40+02:00Ghost UserClimateFIR customize agg for apriori# Customize Apriori
Current status: When applying ClimateFIR, the apriori is always calculated from means (e.g. the monthly mean).
IDEA: Since values can spread widely, especially for chemical variables, one could use the more robust median instead of the mean. As can be seen in the example, the estimate is not that good.
![example_mean_median](/uploads/ac85c390eb1bee2de5e808be5cd95759/example_mean_median.png)
In the next example, one can see that in some cases the median is much lower than the mean and very close to the lower percentile. This indicates that a few high values raise the mean, while the concentration is low in most cases.
JANUARY
![example_percentile](/uploads/087a3ce129ba323863133fb63cf5d0ee/example_percentile.png)
In summer, this issue is less pronounced:
AUGUST
![example_percentile_aug](/uploads/88b02cd4c5eb6f68a8890e1e8eef49cc/example_percentile_aug.png)
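The effect described above can be reproduced with a small synthetic example. This is only a sketch: the function name `monthly_apriori` and the artificial series (mostly low values with occasional spikes) are assumptions for illustration and do not reflect how ClimateFIR computes its apriori internally.

```python
import numpy as np
import pandas as pd


def monthly_apriori(series: pd.Series, statistic: str = "mean") -> pd.Series:
    """Aggregate a time series to one climatological value per calendar month."""
    if statistic not in ("mean", "median"):
        raise ValueError(f"unsupported statistic: {statistic}")
    return series.groupby(series.index.month).agg(statistic)


# Synthetic daily series: low background of 10 with a spike of 100 every 10th day.
index = pd.date_range("2015-01-01", "2018-12-31", freq="D")
values = np.full(len(index), 10.0)
values[::10] = 100.0
series = pd.Series(values, index=index)

apriori_mean = monthly_apriori(series, "mean")      # pulled up by the spikes
apriori_median = monthly_apriori(series, "median")  # stays at the background level
```

With such skewed data the median stays at the background level, while the mean is raised by the few large values, which is the behaviour seen in the January example.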
## TODOs
* [ ] be able to use given statistic (from `mean`, `median`, maybe think about quantiles)https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/384AQW data handler2022-07-11T10:53:07+02:00Ghost UserAQW data handler# Data Handler for AQWatch
*data handler designed for work of @li40*
## Structure of Data
**Inputs**: forecasts from CTMs
* root folder: `mod`
* structured per region of interest (each will be a separate experiment with a different NN), e.g. `/colorado`
* depending on region: different number and names of CTMs and always mean of ensemble *(not used as input for now)*, e.g. `/lotos_tno`
* data are already interpolated on station level
* data are stored per forecast date `/$YYYYMMDD`
* single file per species, e.g. `$nvar_TNO_inp.nc`
* data files are structured as follows: first dimension is time (model time UTC), second dimension is station (named by station long name).
**Targets**: observations from measurement stations
* root folder: `/obs/obs_download_scripts`
* structured per region of interest (each will be a separate experiment with a different NN), e.g. `/Colorado`
* data are inside data folder `/Data`
* single file per species and date (including all stations and 24h), e.g. `obs_$nvar_$YYYYMMDD.nc`
* data files are structured as follows: index is date_utc, columns are stations indicated by id (size is timesteps x stations).
**Competitor**: ensemble mean calculated over all CTM forecasts
* root folder: `mod`
* structured per region of interest (each will be a separate experiment with a different NN), e.g. `/colorado`
* stored in directory `/ens`
* structure same as for inputs
```
|-- mod
| `-- colorado
| |-- ens (MEAN of CTM#1-3)
| | `-- $YYYYMMDD (%forecast_date)
| | `-- interpolated
| | `-- $nvar_ensmean_inp.nc # time series of ensemble mean (competitor)
| |-- lotos_tno (CTM#1)
| | |-- $YYYYMMDD (%forecast_date)
| | | `-- interpolated
| | | `-- $nvar_TNO_inp.nc # time series of CTM#1 (input feature)
| | `-- stations_colo.csv # Station info
| |-- silam_fmi (CTM#2)
| | |-- $YYYYMMDD (%forecast_date)
| | | `-- interpolated
| | | `-- $nvar_FMI_inp.nc # time series of CTM#2 (input feature)
| | `-- stations_colo.csv
| `-- wrf_ucar (CTM#3)
| |-- $YYYYMMDD (%forecast_date)
| | `-- interpolated
| | `-- $nvar_UCAR_inp.nc # time series of CTM#3 (input feature)
| `-- stations_colo.csv
`-- obs
`-- obs_download_scripts
`-- Colorado
|-- stations_colo.csv
`-- Data
`--obs_$nvar_$YYYYMMDD.nc (%obs_date) # time series of observation (target)
$nvar = ['co','so2','no2','o3','pm10','pm25'] # pollutant species available
%forecast_date = current day
%obs_date = one day before current day
```
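The layout above could be turned into file paths roughly as follows. This is a sketch under assumptions: the function names, the `CTM_FILE_TAGS` mapping (read off the tree above), the capitalised region folder on the observation side, and the pairing of a forecast date with the observation file of the previous day are all inferred from the listing and are not taken from the actual `DataHandlerAQWatch` implementation.

```python
from datetime import date, timedelta
from pathlib import Path

# Assumed mapping from CTM folder name to the tag used in the file name.
CTM_FILE_TAGS = {
    "lotos_tno": "TNO",
    "silam_fmi": "FMI",
    "wrf_ucar": "UCAR",
    "ens": "ensmean",  # ensemble mean, used as competitor
}


def model_file(root, region, ctm, forecast_date, nvar):
    """Path of the interpolated CTM (or ensemble mean) file for one species."""
    tag = CTM_FILE_TAGS[ctm]
    return (Path(root) / "mod" / region.lower() / ctm
            / forecast_date.strftime("%Y%m%d") / "interpolated"
            / f"{nvar}_{tag}_inp.nc")


def observation_file(root, region, forecast_date, nvar):
    """Path of the observation file, assumed to be dated one day before the forecast."""
    obs_date = forecast_date - timedelta(days=1)
    return (Path(root) / "obs" / "obs_download_scripts" / region.capitalize()
            / "Data" / f"obs_{nvar}_{obs_date:%Y%m%d}.nc")


forecast_day = date(2022, 5, 18)
input_path = model_file("data", "colorado", "lotos_tno", forecast_day, "no2")
competitor_path = model_file("data", "colorado", "ens", forecast_day, "no2")
target_path = observation_file("data", "colorado", forecast_day, "no2")
```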
## Start Script
This is a basic script that could be used for the AQWatch data handler. The script does not set up the NN explicitly but can be used to check whether the workflow runs through.
```python
__author__ = "Lukas Leufen"
__date__ = '2022-05-18'

import argparse
import os
import sys

sys.path.append("<abs_path_to_mlair>")

from mlair.workflows import DefaultWorkflow
from mlair.data_handler.data_handler_aqwatch import DataHandlerAQWatch, DataHandlerAQWatchSingleStation


def main(parser_args):
    args = dict(data_handler=DataHandlerAQWatch,
                interpolation_limit=3, overwrite_local_data=False,
                overwrite_lazy_data=True,
                lazy_preprocessing=True,
                train_min_length=0,  # just replace defaults which are 90
                val_min_length=0,  # just replace defaults which are 90
                test_min_length=0,  # just replace defaults which are 90
                window_history_size=0,  # has to be 0 to indicate t0
                window_lead_time=0,  # has to be 0 to indicate t0
                start="2022-05-01",  # start and train_start should be the same
                train_start="2022-05-01",
                train_end="2022-05-02",
                val_start="2022-05-02",
                val_end="2022-05-10",
                test_start="2022-05-10",
                test_end="2022-05-20",
                end="2022-05-20",  # end and test_end should be the same
                region="colorado",  # specify the region
                variables=["no2"],  # this sets your variable, currently it is not possible to use more than one
                target_var=["no2"],  # this sets your variable, currently it is not possible to use more than one
                ctm_list=["test_ctm", "test2_ctm"],  # name models to use
                competitors=["aqw_ens_mean"],
                sampling="hourly",
                # stations=['80050006', '80131001', '80130014', '80350004', '80410017', '80410015', '80410013',
                #           '80830006', '80310002', '80310013', '80691004', '80690011', '80690009', '80310028',
                #           '80590011', '80519991', '80770017', '81230009', '81230006', '80050002', '80310027',
                #           '80310026', '80130003', '80410016', '80830101', '80770020', '81030006', '80450012',
                #           '80450007', '80590006', '80690007', '80699991', '80677001', '80677003', '80013001',
                #           '80970008'],
                stations=["Aurora East", "Boulder Reservoir", "Chatfield Park - 11500 N. Roxborough Park Rd.",
                          "Colorado Springs - USAF Academy", "Cortez Ozone", "Denver - CAMP - 2105 Broadway",
                          "Fort Collins - CSU - 708 S. Mason St.",
                          "Fort Collins - West - Laporte Ave. & Overland Tr.",
                          "Golden - NREL - 2054 Quaker St.", "Gothic",
                          "Greeley - Weld Co. Tower - 3101 35th Ave.",
                          "Highland Reservoir - 8100 S. University Blvd.", "La Casa NCORE - 4545 Navajo St.",
                          "Manitou Springs", "Mesa Verde NP", "Palisade Ozone", "Rangely, CO", "Rifle Ozone",
                          "Rocky Flats - N - 16600 W. Colo. Hwy. 128", "Rocky Mountain NP", "Ute 1", "Ute 3",
                          "Welby - 78th Ave. & Steele St."],
                transformation={
                    "o3": {"method": "log"},
                    "no": {"method": "log"},
                    "no2": {"method": "log"}, },
                data_path=os.path.join(".", "data", "aqw_data"),  # <- root folder of data containing obs and mod
                **parser_args.__dict__,
                )
    workflow = DefaultWorkflow(**args, start_script=__file__)
    workflow.run()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--experiment_date', metavar='--exp_date', type=str, default=None,
                        help="set experiment date as string")
    args = parser.parse_args(["--experiment_date", "testrun"])
    main(args)
```
## TODOs
* [ ] include competitor: maybe write a class that can load the AQWatch data (and stores them), similar to the IntelliO3 competitor
* [ ] check if workflow works from beginning to end
* [ ] be able to load meta data such as lon/lat from meta files such as `stations_colo.csv`https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/383feature importance along time axis2023-06-19T12:28:45+02:00Ghost Userfeature importance along time axisMaybe it would be great to have another feature importance approach by bootstrapping along the temporal axis. This could give some insight into which time step has which influence. Only applicable if either the time steps are the same for all inputs or if labels are available to indicate the same time steps (e.g. if using L-shape input data). This idea and its implementation need further thought.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/377display shuffle count in separate box2022-07-19T12:16:51+02:00Ghost Userdisplay shuffle count in separate boxHighlight the shuffle count of bootstraps in the feature importance analysis in a separate box, as done for the uncertainty estimate. Currently, this information is added in brackets in the plot title.https://gitlab.jsc.fz-juelich.de/esde/machine-learning/mlair/-/issues/373Extraction of model class should account for inheritance.2022-03-24T10:03:58+01:00Ghost UserExtraction of model class should account for inheritance.
During reporting of an experiment, the `model class` extraction should take inheritance into account to ensure that the extracted code fully describes the implemented neural network.