openstef.validation package
Submodules
openstef.validation.validation module
- openstef.validation.validation.calc_completeness_dataframe(df, time_delayed=False, homogenise=True)
Calculate the completeness of each column in a DataFrame.
NOTE: NA values count as incomplete.
- Parameters:
df (DataFrame) – DataFrame with a DatetimeIndex.
time_delayed (bool) – Whether to correct for T-x columns.
homogenise (bool) – Whether to resample the index to the median time delta; only available for a DatetimeIndex.
- Return type:
DataFrame
- Returns:
Dataframe with fraction of completeness per column
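The idea of per-column completeness can be sketched in plain pandas. This is a minimal illustration, not the library's implementation: `completeness_per_column` is a hypothetical helper, the T-x correction is left out, and the way `homogenise` is handled here (rebuilding a regular index at the median time step) is an assumption based on the parameter description.

```python
import pandas as pd

def completeness_per_column(df, homogenise=True):
    """Hypothetical helper: fraction of non-NA values per column."""
    if homogenise and len(df) > 1:
        # Assumption: rebuild a regular index at the median time step,
        # so timestamps missing from the index also count as incomplete.
        step = df.index.to_series().diff().median()
        regular_index = pd.date_range(df.index.min(), df.index.max(), freq=step)
        df = df.reindex(regular_index)
    return df.notna().mean()

# Four rows at a 15-minute step, with the 00:45 timestamp missing.
index = pd.to_datetime(
    ["2024-01-01 00:00", "2024-01-01 00:15", "2024-01-01 00:30", "2024-01-01 01:00"]
)
df = pd.DataFrame(
    {"load": [1.0, None, 3.0, 4.0], "temp": [10.0, 11.0, 12.0, 13.0]},
    index=index,
)

raw = completeness_per_column(df, homogenise=False)      # load 0.75, temp 1.0
regular = completeness_per_column(df, homogenise=True)   # load 0.6, temp 0.8
```

With the regular index, the missing 00:45 row lowers the completeness of every column, which is why the two results differ.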
- openstef.validation.validation.calc_completeness_features(df, weights, time_delayed=False, homogenise=True)
Calculate the (weighted) completeness of a DataFrame.
NOTE: NA values count as incomplete.
- Parameters:
df (DataFrame) – DataFrame with a DatetimeIndex.
weights (DataFrame) – Array-compatible, with size equal to the number of columns of df (excluding load and horizon); used to weight the completeness of each column.
time_delayed (bool) – Whether to correct for T-x columns.
homogenise (bool) – Whether to resample the index to the median time delta; only available for a DatetimeIndex.
- Return type:
float
- Returns:
Fraction of completeness
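A weighted completeness can be read as a weighted average of the per-column fractions. The sketch below assumes exactly that; `weighted_completeness` is a hypothetical helper, the weights are passed as a plain list, and the exclusion of the load and horizon columns is left to the caller.

```python
import pandas as pd

def weighted_completeness(df, weights):
    """Hypothetical helper: weighted average of per-column completeness."""
    fractions = df.notna().mean()
    # Assumption: one weight per column, in column order.
    weights = pd.Series(weights, index=fractions.index, dtype=float)
    return float((fractions * weights).sum() / weights.sum())

index = pd.date_range("2024-01-01", periods=4, freq="15min")
features = pd.DataFrame(
    {"temp": [10.0, None, 12.0, 13.0], "wind": [3.0, 4.0, 5.0, 6.0]},
    index=index,
)

# temp is 75% complete (weight 1), wind is 100% complete (weight 3).
score = weighted_completeness(features, weights=[1.0, 3.0])  # 0.9375
```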
- openstef.validation.validation.check_data_for_each_trafo(df, col)
Detect whether a column contains zero values at all, only zero values, or NaN values.
- Parameters:
df (DataFrame) – DataFrame such as pd.DataFrame(index=DatetimeIndex, columns=[load1, …, loadN]). Load corrections should be indicated by ‘LC_’.
col (Series) – Column of the pd.DataFrame.
- Return type:
bool
- Returns:
False if column contains above specified or True if not
- openstef.validation.validation.drop_target_na(data)
- Return type:
DataFrame
- openstef.validation.validation.find_nonzero_flatliner(df, threshold=None)
Detect a station flatliner and return a list of datetimes.
- Parameters:
df (DataFrame) – Example: pd.DataFrame(index=DatetimeIndex, columns=[load1, …, loadN]). Load corrections should be indicated by ‘LC_’.
threshold (Optional[int]) – Number of timesteps after which a flatliner is detected. If None, the check is not executed.
- Return type:
Optional[DataFrame]
- Returns:
Flatline moments or None
TODO: a lot of the logic of this function can be improved using mnts.label:
`import scipy.ndimage.measurements as mnts`
`mnts.label`
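Flatliner detection boils down to finding runs of identical consecutive values. The following is a minimal sketch on a single series, not the library's implementation: `flatline_runs` is a hypothetical helper, and treating a run of at least `threshold` timesteps as a flatliner is an assumption (the actual function works on all load columns of a station).

```python
import pandas as pd

def flatline_runs(series, threshold):
    """Hypothetical helper: (start, end) timestamps of runs in which the
    same value repeats for at least `threshold` consecutive timesteps."""
    # A new run starts wherever the value differs from the previous one.
    run_id = (series != series.shift()).cumsum()
    runs = []
    for _, run in series.groupby(run_id):
        if len(run) >= threshold:
            runs.append((run.index[0], run.index[-1]))
    return runs

index = pd.date_range("2024-01-01", periods=6, freq="15min")
load = pd.Series([5.0, 7.0, 7.0, 7.0, 7.0, 3.0], index=index)

runs = flatline_runs(load, threshold=3)
# One run: the four consecutive 7.0 values from 00:15 to 01:00.
```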
- openstef.validation.validation.find_zero_flatliner(df, threshold, flatliner_window, flatliner_load_threshold)
Detect a zero value where the load is not compensated by the other trafos of the station.
If the zero value is at the start or end, that block is ignored.
- Parameters:
df (DataFrame) – DataFrame such as pd.DataFrame(index=DatetimeIndex, columns=[load1, …, loadN]). Load corrections should be indicated by ‘LC_’.
threshold (float) – Number of hours after which a flatliner is detected.
flatliner_window (timedelta) – Number of hours before the zero value over which the mean load is calculated.
flatliner_load_threshold (float) – Maximum allowed difference between the total station load before and during the zero value(s).
- Return type:
DataFrame
- Returns:
DataFrame of timestamps, or None if none
TODO: a lot of the logic of this function can be improved using mnts.label:
`import scipy.ndimage.measurements as mnts`
`mnts.label`
- openstef.validation.validation.is_data_sufficient(data, completeness_threshold, minimal_table_length, model=None)
Check if enough data is left after validation and cleaning to continue with model training.
- Parameters:
data (DataFrame) – pd.DataFrame() with cleaned input data.
model (Optional[OpenstfRegressor]) – Model which contains all information regarding the trained model.
completeness_threshold (float) – Threshold for completeness: 1 for fully complete, 0 for anything could be missing.
minimal_table_length (int) – Minimal table length (in rows).
- Return type:
bool
- Returns:
True if amount of data is sufficient, False otherwise.
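The two thresholds can be pictured as a pair of independent checks. The sketch below is an assumption-laden illustration, not the library's logic: `looks_sufficient` is a hypothetical helper, it uses an unweighted mean completeness instead of model-derived weights, and whether the real comparisons are strict or inclusive is not stated in this reference.

```python
import pandas as pd

def looks_sufficient(data, completeness_threshold, minimal_table_length):
    """Hypothetical helper: enough rows and a high enough completeness."""
    if len(data) < minimal_table_length:
        return False
    # Assumption: unweighted mean of the per-column completeness.
    completeness = data.notna().mean().mean()
    return bool(completeness >= completeness_threshold)

index = pd.date_range("2024-01-01", periods=4, freq="15min")
data = pd.DataFrame(
    {"load": [1.0, None, 3.0, 4.0], "temp": [10.0, 11.0, 12.0, 13.0]},
    index=index,
)

enough = looks_sufficient(data, completeness_threshold=0.8, minimal_table_length=3)
too_short = looks_sufficient(data, completeness_threshold=0.8, minimal_table_length=10)
too_sparse = looks_sufficient(data, completeness_threshold=0.9, minimal_table_length=3)
```

Here the mean completeness is 0.875, so only the first call passes both checks.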
- openstef.validation.validation.validate(pj_id, data, flatliner_threshold)
Validate prediction job and timeseries data.
Steps:
1. Replace values repeated for longer than flatliner_threshold with NaN.
TODO: The function description suggests it ‘validates’ the PJ and data, but it appears to ‘just’ replace repeated observations with NaN.
- Parameters:
pj_id (Union[int, str]) – int/str, used to identify log statements.
data (DataFrame) – pd.DataFrame where the first column should be the target; index=DatetimeIndex.
flatliner_threshold (Optional[int]) – Maximum number of repetitions considered a flatline. If None, the validation is effectively skipped.
- Return type:
DataFrame
- Returns:
Dataframe where repeated values are set to None
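The replacement step can be sketched as masking long runs in the target column. This is an illustration under stated assumptions, not the library's code: `mask_flatlines` is a hypothetical helper, only the first (target) column is masked, and "longer than" is read as a strict comparison.

```python
import numpy as np
import pandas as pd

def mask_flatlines(data, flatliner_threshold):
    """Hypothetical helper: set long runs of repeated target values to NaN."""
    if flatliner_threshold is None:
        return data  # nothing to check, mirroring the "skipped" case
    target = data.iloc[:, 0]
    # Label each run of identical consecutive values, then look up its length.
    run_id = (target != target.shift()).cumsum()
    run_length = run_id.map(run_id.value_counts())
    result = data.copy()
    # Assumption: only runs strictly longer than the threshold are masked.
    result.loc[run_length > flatliner_threshold, result.columns[0]] = np.nan
    return result

index = pd.date_range("2024-01-01", periods=6, freq="15min")
data = pd.DataFrame(
    {"load": [1.0, 2.0, 2.0, 2.0, 2.0, 3.0], "temp": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]},
    index=index,
)

cleaned = mask_flatlines(data, flatliner_threshold=3)
# The four repeated 2.0 values in "load" become NaN; "temp" is untouched.
```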