openstef.validation package
Submodules
openstef.validation.validation module
- openstef.validation.validation.calc_completeness_dataframe(df, time_delayed=False, homogenise=True)
Calculate the completeness of each column in a dataframe.
NOTE: NA values count as incomplete.
- Parameters:
df (DataFrame) – Dataframe with a DatetimeIndex.
time_delayed (bool) – Whether to correct for T-x columns.
homogenise (bool) – Whether to resample the index to the median time delta. Only available for a DatetimeIndex.
- Return type:
DataFrame
- Returns:
Dataframe with fraction of completeness per column
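The idea of per-column completeness can be illustrated with a minimal sketch. This is not the library's implementation: `homogenise_index` and `column_completeness` are hypothetical helpers, the `time_delayed` correction is left out, and the reindexing onto a median-spaced grid is only an assumption about what "homogenise" means here.

```python
import pandas as pd


def homogenise_index(df: pd.DataFrame) -> pd.DataFrame:
    """Reindex onto a regular grid spaced by the median time delta (hypothetical helper)."""
    step = df.index.to_series().diff().median()
    regular_index = pd.date_range(df.index.min(), df.index.max(), freq=step)
    # Timestamps missing from the original index become all-NA rows.
    return df.reindex(regular_index)


def column_completeness(df: pd.DataFrame, homogenise: bool = True) -> pd.Series:
    """Fraction of non-NA values per column (illustrative, not the openstef code)."""
    if homogenise and isinstance(df.index, pd.DatetimeIndex):
        df = homogenise_index(df)
    return df.notna().mean()


# Example data: a 15-minute series with the 00:30 timestamp missing entirely.
index = pd.to_datetime(
    ["2024-01-01 00:00", "2024-01-01 00:15", "2024-01-01 00:45", "2024-01-01 01:00"]
)
example = pd.DataFrame(
    {"load": [1.0, 2.0, 3.0, 4.0], "temp": [10.0, None, 12.0, 13.0]},
    index=index,
)
completeness = column_completeness(example)
```

In this sketch the missing 00:30 row counts against every column once the index is made regular, so a column without any NA values can still be less than fully complete.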
- openstef.validation.validation.calc_completeness_features(df, weights, time_delayed=False, homogenise=True)
Calculate the (weighted) completeness of a dataframe.
NOTE: NA values count as incomplete.
- Parameters:
df (DataFrame) – Dataframe with a DatetimeIndex.
weights (DataFrame) – Array-compatible, with size equal to the number of columns of df (excluding load and horizon). Used to weight the completeness of each column.
time_delayed (bool) – Whether to correct for T-x columns.
homogenise (bool) – Whether to resample the index to the median time delta. Only available for a DatetimeIndex.
- Return type:
float
- Returns:
Fraction of completeness
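A weighted completeness can be read as a weighted average of the per-column fractions. The sketch below shows that reading only; `weighted_completeness` is a hypothetical name, and the handling of the load and horizon columns, `time_delayed`, and `homogenise` is deliberately omitted.

```python
import numpy as np
import pandas as pd


def weighted_completeness(df: pd.DataFrame, weights) -> float:
    """Weighted average of per-column completeness (illustrative sketch)."""
    per_column = df.notna().mean().to_numpy()
    # Assumption: one weight per column, in column order.
    return float(np.average(per_column, weights=weights))


index = pd.date_range("2024-01-01", periods=4, freq="15min")
features = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0], "b": [1.0, None, None, 4.0]},
    index=index,
)
# Column "a" is fully complete, column "b" is half complete.
score = weighted_completeness(features, weights=[3, 1])
```

With these weights the complete column dominates, so the overall score stays well above the plain average of the two columns.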
- openstef.validation.validation.detect_ongoing_zero_flatliner(load, duration_threshold_minutes)
Detects if the latest measurements follow a zero flatliner pattern.
- Parameters:
load (pd.Series) – A timeseries of measured load with a datetime index.
duration_threshold_minutes (int) – A zero flatliner is only detected if it exceeds the threshold duration.
- Returns:
Indicating whether or not there is a zero flatliner ongoing for the given load.
- Return type:
bool
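One way to think about an ongoing zero flatliner is "every measurement in the most recent window is zero". The sketch below encodes that interpretation under stated assumptions: `ongoing_zero_flatliner` is a hypothetical stand-in, and the requirement that the series covers the whole window is a choice made for this example, not documented behaviour.

```python
import pandas as pd


def ongoing_zero_flatliner(load: pd.Series, duration_threshold_minutes: int) -> bool:
    """Return True if all recent measurements are zero (illustrative sketch)."""
    latest = load.index.max()
    window_start = latest - pd.Timedelta(minutes=duration_threshold_minutes)
    # Assumption: the series must span the full window to judge it.
    if load.index.min() > window_start:
        return False
    recent = load[load.index >= window_start].dropna()
    return len(recent) > 0 and bool((recent == 0).all())


index = pd.date_range("2024-01-01", periods=8, freq="15min")
load = pd.Series([5.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], index=index)

flat_for_an_hour = ongoing_zero_flatliner(load, duration_threshold_minutes=60)
flat_for_longer = ongoing_zero_flatliner(load, duration_threshold_minutes=105)
```

Here the last hour consists only of zeros, while a 105-minute window reaches back to a non-zero measurement.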
- openstef.validation.validation.drop_target_na(data)
- Return type:
DataFrame
- openstef.validation.validation.is_data_sufficient(data, completeness_threshold, minimal_table_length, model=None)
Check if enough data is left after validation and cleaning to continue with model training.
- Parameters:
data (DataFrame) – pd.DataFrame with cleaned input data.
model (OpenstfRegressor) – Model that contains all information regarding the trained model.
completeness_threshold (float) – Threshold for completeness: 1 means the data must be fully complete, 0 means anything may be missing.
minimal_table_length (int) – Minimal table length (in rows).
- Return type:
bool
- Returns:
True if amount of data is sufficient, False otherwise.
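The two thresholds can be pictured as two independent checks that must both pass. The following is a rough sketch, not the library's logic: `data_is_sufficient` is a hypothetical name, completeness is taken as an unweighted mean over columns, the `model` argument is ignored, and the inclusive comparisons are assumptions.

```python
import pandas as pd


def data_is_sufficient(
    data: pd.DataFrame,
    completeness_threshold: float,
    minimal_table_length: int,
) -> bool:
    """Check completeness and row count against thresholds (illustrative sketch)."""
    # Assumption: unweighted mean of the per-column completeness.
    completeness = float(data.notna().mean().mean())
    complete_enough = completeness >= completeness_threshold
    long_enough = len(data) >= minimal_table_length
    return complete_enough and long_enough


index = pd.date_range("2024-01-01", periods=4, freq="15min")
cleaned = pd.DataFrame(
    {"load": [1.0, 2.0, None, 4.0], "temp": [10.0, 11.0, 12.0, 13.0]},
    index=index,
)

passes = data_is_sufficient(cleaned, completeness_threshold=0.8, minimal_table_length=3)
too_incomplete = data_is_sufficient(cleaned, completeness_threshold=0.9, minimal_table_length=3)
too_short = data_is_sufficient(cleaned, completeness_threshold=0.8, minimal_table_length=10)
```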
- openstef.validation.validation.validate(pj_id, data, flatliner_threshold_minutes, resolution_minutes)
Validate prediction job and timeseries data.
Steps:
1. Check if the input dataframe has a datetime index.
2. Check if a zero flatliner pattern is ongoing (i.e. all recent measurements are zero).
3. Replace values that are repeated for longer than flatliner_threshold_minutes with NaN.
- Parameters:
pj_id (Union[int, str]) – int/str, used to identify log statements.
data (DataFrame) – pd.DataFrame where the first column should be the target, with a DatetimeIndex.
flatliner_threshold_minutes (Optional[int]) – Number of minutes after which constant load is considered a flatline. If None, the validation is effectively skipped.
resolution_minutes (int) – The forecasting resolution in minutes.
- Return type:
DataFrame
- Returns:
Dataframe where repeated values are set to None
- Raises:
InputDataOngoingZeroFlatlinerError – If all recent load measurements are zero.
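The index check and the replacement of repeated values can be sketched as follows. This is an illustration under several assumptions rather than the openstef code: `mask_flatliners` is a hypothetical name, only the first (target) column is masked, whole runs longer than the threshold are replaced, a `ValueError` stands in for whatever the library raises on a bad index, and the ongoing zero flatliner check is left out.

```python
from typing import Optional

import numpy as np
import pandas as pd


def mask_flatliners(
    data: pd.DataFrame,
    flatliner_threshold_minutes: Optional[int],
    resolution_minutes: int,
) -> pd.DataFrame:
    """Set long runs of repeated target values to NaN (illustrative sketch)."""
    if not isinstance(data.index, pd.DatetimeIndex):
        raise ValueError("Input dataframe does not have a datetime index.")
    if flatliner_threshold_minutes is None:
        return data  # validation is effectively skipped

    target = data.iloc[:, 0]
    # A new run starts whenever the value differs from the previous one.
    run_id = (target != target.shift()).cumsum()
    run_length = target.groupby(run_id).transform("size")
    max_points = flatliner_threshold_minutes // resolution_minutes

    result = data.copy()
    result.loc[run_length > max_points, result.columns[0]] = np.nan
    return result


index = pd.date_range("2024-01-01", periods=6, freq="15min")
raw = pd.DataFrame({"load": [1.0, 2.0, 2.0, 2.0, 2.0, 3.0]}, index=index)

validated = mask_flatliners(raw, flatliner_threshold_minutes=30, resolution_minutes=15)
skipped = mask_flatliners(raw, flatliner_threshold_minutes=None, resolution_minutes=15)
```

In this example the value 2.0 repeats for four 15-minute steps, which exceeds the 30-minute threshold, so that run is masked while the surrounding values are kept.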