implement lazy data preprocessing
lazy data loading
lazy data loading on first use, if possible
- store the data locally in the data path under a separate folder, e.g.
- create a checksum for the name and always reuse this data if the checksum matches (this replaces all previous steps and saves a lot of computation time)
- if this works, it can be used for all subsets, because the data were already loaded during "preprocessing". For all subsets it would then be sufficient to use lazy loading
- check out how to create a checksum
- store attributes _data, meta, input_data, target_data (data already loaded, interpolated, kz filter applied; only the creation of history, labels, ... still has to be performed); additional attributes are stored for the DataHandlerKzFilterSingleStation (self.cutoff_period, self.cutoff_period_days)
- add parameter lazy_preprocessing, which defaults to False, to trigger lazy preprocessing
- compare checksum and try to load data
- continue with missing steps
- there must be a check regarding variables and start/end point. Data must be reloaded if the start date is earlier than what is available in the data (maybe there could be a check for the case that there is no data for the starting point, which would otherwise trigger an unintended re-preprocessing of data) -> NO check for start and end. We assume that data are first used with the total time range.
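The steps in the list above could look roughly like the following sketch. It is not the actual data handler: the class name LazyHandler, the folder name lazy_data, the file naming scheme, and the dummy preprocessing are assumptions made for illustration only. The checksum is passed in as a ready-made string here.

```python
import os
import pickle
import tempfile


class LazyHandler:
    """Hypothetical stand-in for a data handler, not the project's class."""

    # attributes named in the list above
    _store_attributes = ["_data", "meta", "input_data", "target_data"]

    def __init__(self, checksum, data_path, lazy_preprocessing=False):
        self.checksum = checksum
        # assumed folder name inside the data path
        self.lazy_path = os.path.join(data_path, "lazy_data")
        self.preprocessing_runs = 0
        # only try the stored file if lazy preprocessing was requested
        if not (lazy_preprocessing and self._try_load()):
            self._preprocess()
            if lazy_preprocessing:
                self._store()

    def _file_name(self):
        return os.path.join(self.lazy_path, self.checksum + ".pickle")

    def _preprocess(self):
        # placeholder for the expensive steps (load, interpolate, filter)
        self.preprocessing_runs += 1
        self._data = [1.0, 2.0, 3.0]
        self.meta = {"station": "dummy"}
        self.input_data = self._data[:-1]
        self.target_data = self._data[1:]

    def _store(self):
        os.makedirs(self.lazy_path, exist_ok=True)
        content = {name: getattr(self, name) for name in self._store_attributes}
        with open(self._file_name(), "wb") as f:
            pickle.dump(content, f, protocol=pickle.HIGHEST_PROTOCOL)

    def _try_load(self):
        # returns True if a file matching the checksum was found and loaded
        try:
            with open(self._file_name(), "rb") as f:
                content = pickle.load(f)
        except FileNotFoundError:
            return False
        for name, value in content.items():
            setattr(self, name, value)
        return True


with tempfile.TemporaryDirectory() as path:
    first = LazyHandler("abc123", path, lazy_preprocessing=True)   # computes and stores
    second = LazyHandler("abc123", path, lazy_preprocessing=True)  # loads from disk
    print(first.preprocessing_runs, second.preprocessing_runs)
```

With lazy_preprocessing left at False, nothing is read or written and the preprocessing always runs, which matches the proposed default.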
Check these links out:
These links are related to how a class can be stored:
https://stackoverflow.com/questions/23582489/python-pickle-protocol-choice
https://stackoverflow.com/questions/4529815/saving-an-object-data-persistence
It seems that it is not possible to create a checksum for a class directly. Maybe there is a way to create a string that summarises all essential properties of a class and create a checksum from this?
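One possible way to do this, as a minimal sketch: collect the essential attributes in a dictionary, serialise it to a JSON string with sorted keys, and hash that string with hashlib. The helper attribute_checksum, the Settings class, and the attribute names and values are hypothetical; which attributes count as essential is left open here.

```python
import hashlib
import json


def attribute_checksum(obj, attribute_names):
    """Hash a JSON summary of selected attributes (hypothetical helper)."""
    summary = {name: getattr(obj, name) for name in attribute_names}
    # sort_keys gives a stable string; default=str covers dates and similar values
    text = json.dumps(summary, sort_keys=True, default=str)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class Settings:
    """Hypothetical stand-in for the essential properties of a handler."""

    def __init__(self, station, variables, cutoff_period):
        self.station = station
        self.variables = variables
        self.cutoff_period = cutoff_period


names = ["station", "variables", "cutoff_period"]
a = attribute_checksum(Settings("station_a", ["o3", "temp"], 12), names)
b = attribute_checksum(Settings("station_a", ["o3", "temp"], 12), names)
c = attribute_checksum(Settings("station_a", ["o3", "temp"], 24), names)
print(a == b, a == c)
```

Unlike the built-in hash(), whose result for strings changes between interpreter runs by default, a hashlib digest of the same string is identical across runs, so it can be used in a file name.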