openstef.validation package#

Submodules#

openstef.validation.validation module#

openstef.validation.validation.calc_completeness_dataframe(df, time_delayed=False, homogenise=True)#

Calculate the completeness of each column in dataframe.

NOTE: NA values count as incomplete

Parameters:
  • df (DataFrame) – DataFrame with a DatetimeIndex

  • time_delayed (bool) – Whether to apply a correction for T-x columns

  • homogenise (bool) – Whether to resample the index to the median time delta; only available for a DatetimeIndex

Return type:

DataFrame

Returns:

Dataframe with fraction of completeness per column
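
To illustrate what the completeness fraction means, the sketch below computes it with plain pandas. It is not the library's implementation: completeness_per_column is a hypothetical helper, the time_delayed correction is left out, and the homogenise step is assumed to resample to the median spacing of the index so that missing timestamps show up as NaN rows.

```python
import numpy as np
import pandas as pd


def completeness_per_column(df, homogenise=True):
    """Hypothetical helper: fraction of non-NA values per column."""
    if homogenise and isinstance(df.index, pd.DatetimeIndex) and len(df) > 1:
        # Assumption: resample to the median time step so that missing
        # timestamps become all-NaN rows and count as incomplete.
        median_delta = df.index.to_series().diff().median()
        df = df.resample(median_delta).mean()
    # NA values count as incomplete, so the mean of notna() is the fraction.
    return df.notna().mean()


# Quarter-hourly data with one NaN in "load" and the 00:45 timestamp missing.
index = pd.to_datetime(
    ["2023-01-01 00:00", "2023-01-01 00:15", "2023-01-01 00:30", "2023-01-01 01:00"]
)
df = pd.DataFrame(
    {"load": [1.0, np.nan, 3.0, 4.0], "temp": [1.0, 2.0, 3.0, 4.0]},
    index=index,
)

raw = completeness_per_column(df, homogenise=False)
resampled = completeness_per_column(df, homogenise=True)
```

In this example the resampled frame has five quarter-hour rows, so "load" is 3/5 complete and "temp" is 4/5 complete. Without resampling the missing timestamp goes unnoticed and the fractions are 3/4 and 4/4.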

openstef.validation.validation.calc_completeness_features(df, weights, time_delayed=False, homogenise=True)#

Calculate the (weighted) completeness of a dataframe.

NOTE: NA values count as incomplete

Parameters:
  • df (DataFrame) – DataFrame with a DatetimeIndex

  • weights (DataFrame) – Array-like with one entry per column of df (excluding load and horizon), used to weight the completeness of each column

  • time_delayed (bool) – Whether to apply a correction for T-x columns

  • homogenise (bool) – Whether to resample the index to the median time delta; only available for a DatetimeIndex

Return type:

float

Returns:

Fraction of completeness
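
A minimal sketch of the weighting idea, assuming the result is a weighted average of the per-column completeness values. weighted_completeness is a hypothetical name, and the normalisation by the sum of the weights is an assumption, not confirmed by the documentation above.

```python
import numpy as np


def weighted_completeness(completeness_per_column, weights):
    """Hypothetical helper: weighted average of per-column completeness."""
    completeness = np.asarray(completeness_per_column, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Assumption: weights are normalised by their sum, so the result
    # stays between 0 and 1.
    return float((completeness * weights).sum() / weights.sum())


# Two feature columns: the first is half complete but weighs three times
# as much as the second, fully complete one.
score = weighted_completeness([0.5, 1.0], [3.0, 1.0])
```

Here the score is (0.5 * 3 + 1.0 * 1) / 4 = 0.625, lower than the unweighted mean of 0.75 because the incomplete column carries more weight.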

openstef.validation.validation.check_data_for_each_trafo(df, col)#

Detect whether a column contains zero values at all, only zero values, or NaN values.

Parameters:
  • df (DataFrame) – DataFrame such as pd.DataFrame(index=DatetimeIndex, columns = [load1, …, loadN]). Load_corrections should be indicated by ‘LC_

  • col (Series) – Column of the pd.DataFrame

Return type:

bool

Returns:

False if the column matches any of the conditions described above, True otherwise

openstef.validation.validation.drop_target_na(data)#

Return type:

DataFrame

openstef.validation.validation.find_nonzero_flatliner(df, threshold=None)#

Detect a station flatliner and return the datetimes at which it occurs.

Parameters:
  • df (DataFrame) – Example: pd.DataFrame(index=DatetimeIndex, columns = [load1, …, loadN]). Load_corrections should be indicated by ‘LC_

  • threshold (Optional[int]) – Number of timesteps after which the function should detect a flatliner. If None, the check is not executed

Return type:

Optional[DataFrame]

Returns:

Flatline moments or None

TODO: a lot of the logic of this function can be improved using mnts.label, where mnts is imported as `import scipy.ndimage.measurements as mnts`.
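
The idea of a non-zero flatliner can be illustrated with a short pandas sketch on a single load series. This is not the library's implementation: find_repeated_runs and the output column names are hypothetical, and treating a run of at least threshold identical non-zero values as a flatliner is an assumption.

```python
import pandas as pd


def find_repeated_runs(series, threshold):
    """Hypothetical helper: runs of identical non-zero values."""
    if threshold is None:
        return None  # mirrors "If None, the check is not executed"
    # A new run starts whenever the value differs from the previous one.
    run_id = (series != series.shift()).cumsum()
    rows = []
    for _, run in series.groupby(run_id):
        is_constant_nonzero = run.notna().all() and run.iloc[0] != 0
        # Assumption: a run of at least `threshold` steps is a flatliner.
        if is_constant_nonzero and len(run) >= threshold:
            rows.append(
                {"from_time": run.index[0], "to_time": run.index[-1], "n_steps": len(run)}
            )
    return pd.DataFrame(rows) if rows else None


index = pd.date_range("2023-01-01", periods=6, freq="15min")
load = pd.Series([5.0, 7.0, 7.0, 7.0, 7.0, 3.0], index=index)

flatliners = find_repeated_runs(load, threshold=3)
no_flatliners = find_repeated_runs(load, threshold=5)
```

With threshold=3 the four consecutive 7.0 values are reported as one run from 00:15 to 01:00. With threshold=5 no run is long enough and the sketch returns None.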

openstef.validation.validation.find_zero_flatliner(df, threshold, flatliner_window, flatliner_load_threshold)#

Detect a zero value where the load is not compensated by the other trafos of the station.

If zero value is at start or end, ignore that block.

Parameters:
  • df (DataFrame) – DataFrame such as pd.DataFrame(index=DatetimeIndex, columns = [load1, …, loadN]). Load_corrections should be indicated by ‘LC_

  • threshold (float) – Number of hours after which the function should detect a flatliner.

  • flatliner_window (timedelta) – Number of hours before the zero value over which the mean load is calculated.

  • flatliner_load_threshold (float) – Maximum allowed difference between the total station load before and during the zero value(s).

Return type:

DataFrame

Returns:

DataFrame of timestamps, or None if no flatliner is found

TODO: a lot of the logic of this function can be improved using mnts.label, where mnts is imported as `import scipy.ndimage.measurements as mnts`.
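
The TODO above points at SciPy's label function. The sketch below shows how it groups consecutive zero values into numbered blocks. It uses scipy.ndimage.label, since the scipy.ndimage.measurements namespace is deprecated in recent SciPy releases. The load values are made up for illustration, and this is only the labelling step, not the full zero-flatliner check.

```python
import numpy as np
from scipy import ndimage

# Made-up load of a single trafo; zeros are candidate flatliner moments.
load = np.array([3.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0])

# label() gives every run of consecutive True values its own integer id
# (1, 2, ...) and marks everything else with 0.
is_zero = load == 0
labels, n_blocks = ndimage.label(is_zero)

# Length of each zero block, ordered by block id.
block_lengths = np.bincount(labels)[1:]
```

For this input there are two zero blocks, of length 2 and 3. A block touching the first or last sample could then be dropped, in line with the rule that zero values at the start or end are ignored.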

openstef.validation.validation.is_data_sufficient(data, completeness_threshold, minimal_table_length, model=None)#

Check if enough data is left after validation and cleaning to continue with model training.

Parameters:
  • data (DataFrame) – pd.DataFrame() with cleaned input data.

  • model (Optional[OpenstfRegressor]) – Model which contains all information regarding the trained model

  • completeness_threshold (float) – Threshold for completeness: 1 means the data must be fully complete, 0 means anything may be missing.

  • minimal_table_length (int) – int with minimal table length (in rows)

Return type:

bool

Returns:

True if amount of data is sufficient, False otherwise.
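
The two criteria can be sketched as follows. This is a simplified illustration, not the library's code: looks_sufficient is a hypothetical helper, it uses an unweighted mean of the per-column completeness, and it ignores the optional model argument.

```python
import numpy as np
import pandas as pd


def looks_sufficient(data, completeness_threshold, minimal_table_length):
    """Hypothetical helper combining a completeness and a length check."""
    # Assumption: unweighted mean of the non-NA fraction per column.
    completeness = data.notna().mean().mean()
    if completeness < completeness_threshold:
        return False
    if len(data) < minimal_table_length:
        return False
    return True


data = pd.DataFrame(
    {"load": [1.0, np.nan, 3.0, 4.0], "temp": [5.0, 6.0, 7.0, 8.0]}
)

enough = looks_sufficient(data, completeness_threshold=0.8, minimal_table_length=3)
too_short = looks_sufficient(data, completeness_threshold=0.8, minimal_table_length=10)
too_sparse = looks_sufficient(data, completeness_threshold=0.9, minimal_table_length=3)
```

The example table is 87.5% complete and has four rows, so it passes with a threshold of 0.8 and a minimal length of 3, and fails when either requirement is raised above what the data offers.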

openstef.validation.validation.validate(pj_id, data, flatliner_threshold)#

Validate prediction job and timeseries data.

Steps: 1. Replace values that are repeated for longer than flatliner_threshold with NaN.

TODO: The function description suggests it ‘validates’ the PJ and data, but it appears to ‘just’ replace repeated observations with NaN.

Parameters:
  • pj_id (Union[int, str]) – int/str, used to identify log statements

  • data (DataFrame) – pd.DataFrame where the first column should be the target. index=DatetimeIndex

  • flatliner_threshold (Optional[int]) – int of max repetitions considered a flatline. If None, the validation is effectively skipped

Return type:

DataFrame

Returns:

DataFrame where repeated values are set to NaN
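
The replacement step can be sketched with pandas as shown below. mask_flatliners is a hypothetical helper, not the library's implementation. It assumes that only the target (first) column is checked and masked, and that a run of at least flatliner_threshold identical values counts as a flatline.

```python
import numpy as np
import pandas as pd


def mask_flatliners(data, flatliner_threshold):
    """Hypothetical helper: set long runs of repeated target values to NaN."""
    if flatliner_threshold is None:
        return data  # validation is effectively skipped
    out = data.copy()
    target_name = out.columns[0]
    target = out[target_name]
    # Give every run of identical consecutive values its own id ...
    run_id = (target != target.shift()).cumsum()
    # ... and look up the length of the run each row belongs to.
    run_length = run_id.map(run_id.value_counts())
    # Assumption: runs of at least `flatliner_threshold` rows are masked.
    out.loc[run_length >= flatliner_threshold, target_name] = np.nan
    return out


index = pd.date_range("2023-01-01", periods=5, freq="15min")
data = pd.DataFrame(
    {"load": [1.0, 2.0, 2.0, 2.0, 5.0], "temp": [10.0, 11.0, 12.0, 13.0, 14.0]},
    index=index,
)

validated = mask_flatliners(data, flatliner_threshold=3)
```

In the result the three repeated 2.0 values in "load" are NaN, while the "temp" column and the original data frame are left unchanged.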

Module contents#