openstef.validation package

Submodules

openstef.validation.validation module

openstef.validation.validation.calc_completeness_dataframe(df, time_delayed=False, homogenise=True)

Calculate the completeness of each column in dataframe.

NOTE: NA values count as incomplete

Parameters:
  • df (DataFrame) – DataFrame with a DatetimeIndex

  • time_delayed (bool) – Whether to apply a correction for T-x columns

  • homogenise (bool) – Whether to resample the index to the median time delta. Only available for a DatetimeIndex

Return type:

DataFrame

Returns:

Dataframe with fraction of completeness per column
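
To make the effect of homogenise concrete, the following is a minimal sketch of a per-column completeness calculation in plain pandas. It is not the openstef implementation: the helper name column_completeness is hypothetical, the resampling step is an assumption based on the parameter description above, and the time_delayed correction is left out.

```python
import numpy as np
import pandas as pd


def column_completeness(df: pd.DataFrame, homogenise: bool = True) -> pd.Series:
    """Fraction of non-NA values per column (illustrative sketch only)."""
    if homogenise and len(df) > 1:
        # Assumption: resample to the median spacing of the index, so that
        # timestamps missing from the index show up as all-NA rows.
        median_delta = df.index.to_series().diff().median()
        df = df.resample(median_delta).mean()
    # count() ignores NA values, so count / rows is the filled fraction.
    return df.count() / len(df)


# The 00:30 timestamp is missing from the index entirely.
index = pd.to_datetime(
    ["2024-01-01 00:00", "2024-01-01 00:15", "2024-01-01 00:45", "2024-01-01 01:00"]
)
example = pd.DataFrame(
    {"load": [1.0, np.nan, 3.0, 4.0], "temp": [10.0, 11.0, 12.0, 13.0]},
    index=index,
)

raw = column_completeness(example, homogenise=False)
homogenised = column_completeness(example, homogenise=True)
```

Without resampling, the gap in the index goes unnoticed and temp looks fully complete. With resampling, the missing timestamp becomes an extra NA row and lowers the completeness of every column.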

openstef.validation.validation.calc_completeness_features(df, weights, time_delayed=False, homogenise=True)

Calculate the (weighted) completeness of a dataframe.

NOTE: NA values count as incomplete

Parameters:
  • df (DataFrame) – DataFrame with a DatetimeIndex

  • weights (DataFrame) – Array-like with one entry per column of df (excluding load and horizon), used to weight the completeness of each column

  • time_delayed (bool) – Whether to apply a correction for T-x columns

  • homogenise (bool) – Whether to resample the index to the median time delta. Only available for a DatetimeIndex

Return type:

float

Returns:

Fraction of completeness
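
As an illustration of what a weighted completeness can look like, here is a small sketch in plain pandas and NumPy. The helper weighted_completeness is hypothetical and not part of openstef. It assumes the weights are normalised to sum to one (np.average does this) and that the load and horizon columns are dropped before weighting, as the weights description suggests.

```python
import numpy as np
import pandas as pd


def weighted_completeness(df: pd.DataFrame, weights) -> float:
    """Weighted mean of per-column completeness (illustrative sketch only)."""
    # Assumption: load and horizon carry no weight, so leave them out.
    features = df.drop(columns=["load", "horizon"], errors="ignore")
    weights = np.asarray(weights, dtype=float)
    if len(weights) != features.shape[1]:
        raise ValueError("Need exactly one weight per feature column.")
    per_column = (features.count() / len(features)).to_numpy()
    # np.average divides by the sum of the weights.
    return float(np.average(per_column, weights=weights))


index = pd.date_range("2024-01-01", periods=4, freq="15min")
example = pd.DataFrame(
    {
        "load": [1.0, 2.0, 3.0, 4.0],
        "temp": [10.0, 11.0, 12.0, 13.0],  # fully complete
        "wind": [np.nan, 2.0, np.nan, 4.0],  # half complete
    },
    index=index,
)

# temp counts three times as much as wind: (3 * 1.0 + 1 * 0.5) / 4
score = weighted_completeness(example, weights=[3.0, 1.0])
```

Weighting this way lets an incomplete but unimportant feature reduce the score less than an incomplete important one.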

openstef.validation.validation.detect_ongoing_zero_flatliner(load, duration_threshold_minutes)

Detects if the latest measurements follow a zero flatliner pattern.

Parameters:
  • load (pd.Series) – A timeseries of measured load with a datetime index.

  • duration_threshold_minutes (int) – A zero flatliner is only detected if it exceeds the threshold duration.

Returns:

Whether a zero flatliner is ongoing for the given load.

Return type:

bool
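
The idea behind this check can be sketched in a few lines of pandas. This is not the library's code: the helper ongoing_zero_flatliner is hypothetical, and it assumes "latest measurements" means all non-NA values within duration_threshold_minutes of the last timestamp.

```python
import pandas as pd


def ongoing_zero_flatliner(load: pd.Series, duration_threshold_minutes: int) -> bool:
    """True if every recent measurement is zero (illustrative sketch only)."""
    if load.empty:
        return False
    # Assumption: the window ends at the latest timestamp in the series.
    window_start = load.index.max() - pd.Timedelta(minutes=duration_threshold_minutes)
    recent = load[load.index >= window_start].dropna()
    return len(recent) > 0 and bool((recent == 0).all())


index = pd.date_range("2024-01-01 00:00", periods=8, freq="15min")
load = pd.Series([5.0, 4.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0], index=index)

last_hour_is_flat = ongoing_zero_flatliner(load, duration_threshold_minutes=60)
last_two_hours_are_flat = ongoing_zero_flatliner(load, duration_threshold_minutes=120)
```

In this example the last hour contains only zeros, while a two-hour window still includes non-zero measurements, so only the shorter threshold reports a flatliner.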

openstef.validation.validation.drop_target_na(data)

Return type:

DataFrame

openstef.validation.validation.is_data_sufficient(data, completeness_threshold, minimal_table_length, model=None)

Check if enough data is left after validation and cleaning to continue with model training.

Parameters:
  • data (DataFrame) – Cleaned input data.

  • model (Optional[OpenstfRegressor]) – Model containing all information about the trained model

  • completeness_threshold (float) – Threshold for completeness: 1 means fully complete, 0 means anything may be missing.

  • minimal_table_length (int) – Minimal table length (in rows)

Return type:

bool

Returns:

True if amount of data is sufficient, False otherwise.
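
The two thresholds can be read as two independent checks, as in the sketch below. It is a simplification rather than the openstef implementation: the helper data_is_sufficient is hypothetical, it ignores the optional model argument, it uses the unweighted fraction of non-NA cells as the completeness, and treating a value equal to the threshold as sufficient is an assumption.

```python
import numpy as np
import pandas as pd


def data_is_sufficient(
    data: pd.DataFrame, completeness_threshold: float, minimal_table_length: int
) -> bool:
    """Check table length and overall completeness (illustrative sketch only)."""
    if len(data) < minimal_table_length:
        return False
    # Assumption: completeness is the plain fraction of non-NA cells.
    completeness = float(data.notna().to_numpy().mean())
    return completeness >= completeness_threshold


index = pd.date_range("2024-01-01", periods=4, freq="15min")
data = pd.DataFrame(
    {"load": [1.0, np.nan, 3.0, 4.0], "temp": [10.0, 11.0, 12.0, 13.0]},
    index=index,
)

# 7 of 8 cells are filled, so the completeness is 0.875.
enough = data_is_sufficient(data, completeness_threshold=0.8, minimal_table_length=3)
too_sparse = data_is_sufficient(data, completeness_threshold=0.9, minimal_table_length=3)
too_short = data_is_sufficient(data, completeness_threshold=0.8, minimal_table_length=10)
```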

openstef.validation.validation.validate(pj_id, data, flatliner_threshold_minutes, resolution_minutes)

Validate prediction job and timeseries data.

Steps:
  1. Check if input dataframe has a datetime index.
  2. Check if a zero flatliner pattern is ongoing (i.e. all recent measurements are zero).
  3. Replace repeated values for longer than flatliner_threshold_minutes with NaN.

Parameters:
  • pj_id (Union[int, str]) – int/str, used to identify log statements

  • data (DataFrame) – pd.DataFrame where the first column should be the target, indexed by a DatetimeIndex

  • flatliner_threshold_minutes (Optional[int]) – int indicating the number of minutes after which constant load is considered a flatline. If None, the validation is effectively skipped

  • resolution_minutes (int) – The forecasting resolution in minutes.

Return type:

DataFrame

Returns:

DataFrame where repeated values are set to NaN

Raises:

InputDataOngoingZeroFlatlinerError – If all recent load measurements are zero.
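
The replacement of repeated values with NaN can be sketched with pandas run-length logic. This is a sketch under stated assumptions, not the openstef implementation: the helper mask_flatliners is hypothetical, and it assumes a run counts as a flatline once it contains more than flatliner_threshold_minutes / resolution_minutes consecutive identical samples.

```python
import pandas as pd


def mask_flatliners(
    load: pd.Series, flatliner_threshold_minutes: int, resolution_minutes: int
) -> pd.Series:
    """Set runs of repeated values that last too long to NaN (sketch only)."""
    # Assumption: the longest run that is still accepted, in samples.
    max_run = flatliner_threshold_minutes // resolution_minutes
    # A new run starts wherever the value differs from the previous one.
    run_id = (load != load.shift()).cumsum()
    # Length of the run each sample belongs to.
    run_length = run_id.map(run_id.value_counts())
    return load.mask(run_length > max_run)


index = pd.date_range("2024-01-01 00:00", periods=8, freq="15min")
load = pd.Series([1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 4.0], index=index)

# Five identical samples exceed the three samples allowed by 45 / 15.
cleaned = mask_flatliners(load, flatliner_threshold_minutes=45, resolution_minutes=15)
```

Masking rather than dropping keeps the index regular, so a later completeness check still counts the flatlined period as missing data.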

Module contents