toar / MLAir / Commits / 0d92f77e

Commit 0d92f77e authored Nov 04, 2020 by lukas leufen

updated docs

parent c3a22383
Pipeline #50945 passed with stages in 7 minutes and 51 seconds
Changes 2 · Pipelines 1
Showing 2 changed files with 28 additions and 60 deletions (+28 −60)
mlair/data_handler/data_handler_single_station.py  +4 −60
mlair/data_handler/default_data_handler.py  +24 −0
mlair/data_handler/data_handler_single_station.py
...
@@ -476,66 +476,10 @@ class DataHandlerSingleStation(AbstractDataHandler):
"""
Set up transformation by extracting all relevant information.
Extract all information from transformation dictionary. Possible keys are method, mean, std, min, max.
* If a transformation should be applied on the basis of existing values, these need to be provided in the respective
keys "mean" and "std" (again, only if required by the given method).
:param transformation: the transformation dictionary as described above.
:return: updated transformation dictionary
## Transformation
There are two different approaches (called scopes) to transform the data:
1) `station`: transform data for each station independently (similar to batch normalisation)
2) `data`: transform all data of each station with shared metrics
Transformation must be set by the `transformation` attribute. If `transformation = None` is given to `ExperimentSetup`,
data is not transformed at all. For all other setups, use the following dictionary structure to specify the
transformation.
```
transformation = {"scope": <...>,
"method": <...>,
"mean": <...>,
"std": <...>}
ExperimentSetup(..., transformation=transformation, ...)
```
### scopes
**station**: mean and std are not used
**data**: either provide already calculated values for mean and std (if required by transformation method), or choose
from different calculation schemes, explained in the mean and std section.
### supported transformation methods
Currently supported methods are:
* standardise (default, if method is not given)
* centre
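As a quick sketch of what the two methods do, here is a stdlib-only toy example (the data values are made up; the actual transformation is applied internally by the data handler):

```python
from statistics import fmean, pstdev

data = [2.0, 4.0, 6.0, 8.0]  # toy station series
m, s = fmean(data), pstdev(data)

# "centre": subtract the mean only
centred = [x - m for x in data]

# "standardise": subtract the mean, then divide by the std
standardised = [(x - m) / s for x in data]
```

After standardising, the series has mean 0 and std 1; after centring, only the mean is 0.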
### mean and std
`"mean"="accurate"`: calculate the accurate values of mean and std (depending on method) by using all data. Although
this method is accurate, the calculation may take some time. Furthermore, it could potentially lead to memory
issues (not explored yet, but possible for a very large amount of data).
`"mean"="estimate"`: estimate mean and std (depending on method). For each station, mean and std are calculated and
afterwards aggregated using the mean value over all station-wise metrics. This method is less accurate, especially
regarding the std calculation, but much faster.
We recommend using the latter method, *estimate*, for the following reasons:
* much faster calculation
* the real accuracy of mean and std is less important, because it is "just" a transformation / scaling
* the accuracy of the mean is almost as high as in the *accurate* case, because of
$\bar{x_{ij}} = \bar{\left(\bar{x_i}\right)_j}$. The only difference is that in the *estimate* case, each mean is
equally weighted for each station, independently of the actual data count of the station.
* the accuracy of the std is lower for *estimate* because of
$\mathrm{var}(x_{ij}) \ne \bar{\left(\mathrm{var}(x_i)\right)_j}$, but still the mean of all
station-wise stds is a decent estimate of the true std.
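The equal-weight case can be checked quickly with stdlib tools (the station values below are made up): with identical record lengths, the mean of station-wise means equals the pooled mean, while the mean of station-wise stds differs from the pooled std.

```python
from statistics import mean, pstdev

# Three hypothetical stations with equal record lengths
stations = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [10.0, 10.0, 10.0, 10.0],
]
pooled = [x for s in stations for x in s]

# mean of station-wise means equals the pooled mean (equal record lengths)
mean_estimate = mean(mean(s) for s in stations)

# mean of station-wise stds differs from the pooled std
std_estimate = mean(pstdev(s) for s in stations)
```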
`"mean"=<value, e.g. xr.DataArray>`: if mean and std are already calculated or shall be set manually, just add the
scaling values instead of the calculation method. For method *centre*, std can still be None, but it is required for the
*standardise* method. **Important**: the format of the given values **must** match the internal data format of the
DataPreparation class: `xr.DataArray` with `dims=["variables"]` and one value for each variable.
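Manually supplied values might look like the following (the variable names and numbers are made up; only the `dims=["variables"]` layout is prescribed above):

```python
import xarray as xr

variables = ["o3", "temp", "relhum"]  # hypothetical variable set
mean = xr.DataArray([47.0, 283.0, 70.0], dims=["variables"],
                    coords={"variables": variables})
std = xr.DataArray([20.0, 10.0, 15.0], dims=["variables"],
                   coords={"variables": variables})

transformation = {"scope": "data", "method": "standardise",
                  "mean": mean, "std": std}
# ExperimentSetup(..., transformation=transformation, ...)
```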
* Either return new empty DataClass instances if the given transformation arg is None,
* or return the given object twice if transformation is a DataClass instance,
* or return the inputs and targets attributes if transformation is a TransformationClass instance (default
design behaviour)
"""
if transformation is None:
    return statistics.DataClass(), statistics.DataClass()
...
mlair/data_handler/default_data_handler.py
...
@@ -217,6 +217,30 @@ class DefaultDataHandler(AbstractDataHandler):
@classmethod
def transformation(cls, set_stations, **kwargs):
"""
### supported transformation methods
Currently supported methods are:
* standardise (default, if method is not given)
* centre
### mean and std estimation
Mean and std (depending on method) are estimated. For each station, mean and std are calculated and afterwards
aggregated using the mean value over all station-wise metrics. This method is not exactly accurate, especially
regarding the std calculation, but much faster. Furthermore, the aggregation is a weighted mean, weighted by the
time series length / number of data points itself: a longer time series has more influence on the transformation
settings than a short time series. The estimation of the std is less accurate, because the unweighted mean of
all stds is not equal to the true std, but still the mean of all station-wise stds is a decent estimate. Finally,
the real accuracy of mean and std is less important, because it is "just" a transformation / scaling.
### mean and std given
If mean and std are not None, the default data handler expects these parameters to match the data and applies
these values to the data. Make sure that all dimensions and/or coordinates are in agreement.
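The length-weighted aggregation described above can be sketched with stdlib tools (station names and values here are hypothetical):

```python
from statistics import fmean

# Toy station series with different record lengths
stations = {"stationA": [1.0, 2.0, 3.0], "stationB": [10.0] * 9}

means = {k: fmean(v) for k, v in stations.items()}
lengths = {k: len(v) for k, v in stations.items()}

# Weighting each station mean by its record length recovers the pooled mean,
# so a longer series influences the transformation settings more
weighted = sum(means[k] * lengths[k] for k in stations) / sum(lengths.values())
```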
"""
sp_keys = {k: copy.deepcopy(kwargs[k]) for k in cls._requirements if k in kwargs}
transformation_class = sp_keys.get("transformation", None)
if transformation_class is None:
...