openstef.validation package

Submodules

openstef.validation.validation module

openstef.validation.validation.calc_completeness_dataframe(df, time_delayed=False, homogenise=True)

Calculate the completeness of each column in dataframe.

NOTE: NA values count as incomplete

Parameters:

df (DataFrame) – DataFrame with a DatetimeIndex.
time_delayed (bool) – Whether to correct for time-delayed (T-x) columns.
homogenise (bool) – Whether to resample the index to its median time delta; only available for a DatetimeIndex.

Return type:

DataFrame

Returns:

DataFrame with the fraction of completeness per column.
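The per-column completeness described above can be illustrated with a minimal sketch. It assumes numeric columns and that "completeness" means the fraction of non-NA values; the helper name completeness_per_column is hypothetical, not the openstef API, and the time_delayed correction is not implemented.

```python
import pandas as pd


def completeness_per_column(df: pd.DataFrame, homogenise: bool = True) -> pd.DataFrame:
    """Hypothetical helper (not the openstef API): fraction of non-NA values per column.

    Assumes numeric columns; the time_delayed correction is not implemented.
    """
    if homogenise and isinstance(df.index, pd.DatetimeIndex) and len(df) > 1:
        # Resample to the median time delta so that missing timestamps
        # become NA rows instead of being silently absent.
        median_delta = df.index.to_series().diff().median()
        df = df.resample(median_delta).mean()
    # count() skips NA values, so NA counts as incomplete.
    completeness = df.count() / len(df)
    return completeness.to_frame(name="completeness")
```

Homogenising matters when timestamps are missing altogether: without resampling, a gap in the index is invisible to a simple NA count.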

openstef.validation.validation.calc_completeness_features(df, weights, time_delayed=False, homogenise=True)

Calculate the (weighted) completeness of a dataframe.

NOTE: NA values count as incomplete

Parameters:

df (DataFrame) – DataFrame with a DatetimeIndex.
weights (DataFrame) – Array-like with one weight per column of df (excluding the load and horizon columns), used to weight the completeness of each column.
time_delayed (bool) – Whether to correct for time-delayed (T-x) columns.
homogenise (bool) – Whether to resample the index to its median time delta; only available for a DatetimeIndex.

Return type:

float

Returns:

Weighted fraction of completeness.
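A weighted completeness can be sketched as the weighted mean of the per-column completeness fractions. This is an assumption about how the weights are combined, not a statement of the library's formula; the helper weighted_completeness is hypothetical and assumes df already excludes the load and horizon columns.

```python
import numpy as np
import pandas as pd


def weighted_completeness(df: pd.DataFrame, weights) -> float:
    """Hypothetical helper (not the openstef API): weighted mean of column completeness.

    Assumes df already excludes the load and horizon columns.
    """
    w = np.asarray(weights, dtype=float).ravel()
    if w.size != df.shape[1]:
        raise ValueError("weights must contain exactly one value per column of df")
    # Fraction of non-NA values per column; NA counts as incomplete.
    per_column = df.notna().mean().to_numpy()
    return float(np.sum(per_column * w) / np.sum(w))
```

With weights [1, 3], a column that is 75% complete and a fully complete column give (0.75 * 1 + 1.0 * 3) / 4 = 0.9375.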

openstef.validation.validation.detect_ongoing_flatliner(load, duration_threshold_minutes, *, detect_non_zero_flatliner=False)

Detect whether the latest measurements follow a flatliner pattern.

Parameters:

load (pd.Series) – A timeseries of measured load with a datetime index.
duration_threshold_minutes (int) – A flatliner is only detected if it exceeds the threshold duration.
detect_non_zero_flatliner (bool) – If True, a flatliner is detected for non-zero values. If False, a flatliner is detected for zero values only.

Returns:

Whether a flatliner is ongoing for the given load.

Return type:

bool
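The check for an ongoing flatliner can be sketched by measuring how long the most recent run of identical values lasts. This is a minimal sketch under stated assumptions: the run duration is taken from its first to its last timestamp, and "exceeds" is read as strictly greater than the threshold. The helper ongoing_flatliner is hypothetical, not the openstef implementation.

```python
import pandas as pd


def ongoing_flatliner(load: pd.Series, duration_threshold_minutes: int, *,
                      detect_non_zero_flatliner: bool = False) -> bool:
    """Hypothetical helper (not the openstef API): does the latest constant run last too long?"""
    load = load.dropna().sort_index()
    if load.empty:
        return False
    last_value = load.iloc[-1]
    if not detect_non_zero_flatliner and last_value != 0:
        # In the default mode only zero-valued flatliners are reported.
        return False
    # The run starts right after the last measurement that differs from the latest value.
    differing = load[load != last_value]
    if differing.empty:
        run_start = load.index[0]
    else:
        run_start = load.index[load.index > differing.index[-1]][0]
    run_minutes = (load.index[-1] - run_start) / pd.Timedelta(minutes=1)
    return run_minutes > duration_threshold_minutes
```

For a 15-minute series whose last six values are 0, the run spans 75 minutes, so a 60-minute threshold flags it and a 90-minute threshold does not.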

openstef.validation.validation.drop_target_na(data)

Return type:

DataFrame
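Judging by its name, this function removes rows with a missing target. A minimal sketch, assuming (as in validate) that the target is the first column and that NA values in feature columns are kept; the helper drop_rows_with_missing_target is hypothetical, not the openstef implementation.

```python
import pandas as pd


def drop_rows_with_missing_target(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not the openstef API): drop rows whose target is NA.

    Assumes the target is the first column; NA values in other columns are kept.
    """
    target = data.columns[0]
    return data.dropna(subset=[target])
```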

openstef.validation.validation.is_data_sufficient(data, completeness_threshold, minimal_table_length, model=None)

Check if enough data is left after validation and cleaning to continue with model training.

Parameters:

data (DataFrame) – DataFrame with cleaned input data.
completeness_threshold (float) – Completeness threshold: 1 requires fully complete data, 0 allows any amount of missing data.
minimal_table_length (int) – Minimal table length (in rows).
model (OpenstfRegressor) – Model containing all information about the trained model.

Return type:

bool

Returns:

True if amount of data is sufficient, False otherwise.
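The two criteria above (a minimum number of rows and a minimum completeness) can be combined as in the following sketch. It is an assumption-laden illustration: it ignores the optional model argument and uses the unweighted fraction of non-NA cells as completeness. The helper data_is_sufficient is hypothetical.

```python
import pandas as pd


def data_is_sufficient(data: pd.DataFrame, completeness_threshold: float,
                       minimal_table_length: int) -> bool:
    """Hypothetical helper (not the openstef API): length and completeness checks.

    Ignores the model argument; completeness is the unweighted fraction of non-NA cells.
    """
    if len(data) < minimal_table_length:
        return False
    if data.size == 0:
        return False
    completeness = float(data.notna().to_numpy().mean())
    return completeness >= completeness_threshold
```

A 4-row table with one missing value in two columns is 7/8 = 0.875 complete, so it passes a threshold of 0.8 but not 0.9.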

openstef.validation.validation.validate(pj_id, data, flatliner_threshold_minutes, resolution_minutes, *, detect_non_zero_flatliner=False)

Validate prediction job and timeseries data.

Steps:

1. Check whether the input dataframe has a datetime index.
2. Check whether a flatliner pattern is ongoing, i.e. all recent measurements are constant (and equal to 0 unless detect_non_zero_flatliner = True).
3. Replace values that are repeated for longer than flatliner_threshold_minutes with NaN.

Parameters:

pj_id (Union[int, str]) – Prediction job id (int or str), used to identify log statements.
data (DataFrame) – DataFrame whose first column is the target, with a DatetimeIndex.
flatliner_threshold_minutes (Optional[int]) – Number of minutes after which constant load is considered a flatliner. If None, the validation is effectively skipped.
resolution_minutes (int) – The forecasting resolution in minutes.
detect_non_zero_flatliner (bool) – If True, a flatliner is detected for non-zero values. If False, a flatliner is detected for zero values only.

Return type:

DataFrame

Returns:

DataFrame where values repeated for longer than flatliner_threshold_minutes are set to NaN.

Raises:

InputDataOngoingFlatlinerError – If all recent load measurements are constant.
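The last step, replacing long runs of repeated target values with NaN, can be sketched by labelling each run of identical consecutive values and masking runs that last too long. This is a minimal sketch, assuming the target is the first column, run duration is measured from first to last timestamp, and "longer than" means strictly greater; resolution_minutes is not used here. The helper mask_long_flatliners is hypothetical.

```python
import numpy as np
import pandas as pd


def mask_long_flatliners(data: pd.DataFrame, flatliner_threshold_minutes: int) -> pd.DataFrame:
    """Hypothetical helper (not the openstef API): set long constant target runs to NaN."""
    data = data.copy()
    target_col = data.columns[0]
    target = data[target_col]
    # Give every run of identical consecutive values its own id.
    run_id = (target != target.shift()).cumsum()
    for _, run in target.groupby(run_id):
        duration = (run.index[-1] - run.index[0]) / pd.Timedelta(minutes=1)
        if len(run) > 1 and duration > flatliner_threshold_minutes:
            data.loc[run.index, target_col] = np.nan
    return data
```

Only the target column is masked; feature columns are left untouched, and the input DataFrame is copied rather than modified in place.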
