openstef.feature_engineering package#

Submodules#

openstef.feature_engineering.apply_features module#

This module provides functionality for applying features to the input data to improve forecast accuracy.

Examples of features that are added:
  • The load 1 day and 7 days ago at the same time.

  • If a day is a weekday or a holiday.

  • The extrapolated windspeed at 100m.

  • The normalised wind power according to the turbine-specific power curve.

openstef.feature_engineering.apply_features.apply_features(data, pj=None, feature_names=None, horizon=24.0)#

Applies the feature functions defined in feature_functions.py and returns the complete dataframe.

Features requiring more recent label-data are omitted.

Note

For the time-derived features, only the ones in the feature list are added. For the weather features, all are currently added. These unrequested additional features have to be filtered out later.

Parameters:
  • data (pandas.DataFrame) – a pandas dataframe with input data in the form: pd.DataFrame(index=datetime, columns=[label, predictor_1,…, predictor_n])

  • pj (PredictionJobDataClass) – Prediction job.

  • feature_names (list[str]) – list of requested features

  • horizon (float) – Forecast horizon limit in hours.

Return type:

DataFrame

Returns:

pd.DataFrame(index = datetime, columns = [label, predictor_1,…, predictor_n, feature_1, …, feature_m])

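Example input:

import pandas as pd
import numpy as np

# 200 quarter-hourly timestamps starting at 09:00 on 2017-01-01
index = pd.date_range(start="2017-01-01 09:00:00", freq="15min", periods=200)
# Synthetic load: a daily sine profile scaled by uniform random noise
data = pd.DataFrame(
    index=index,
    data=dict(load=np.sin(index.hour / 24 * np.pi) * np.random.uniform(0.7, 1.7, 200)),
)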

openstef.feature_engineering.data_preparation module#

class openstef.feature_engineering.data_preparation.ARDataPreparation(pj, model_specs, model=None, horizons=None, historical_depth=None)#

Bases: AbstractDataPreparation

prepare_forecast_data(data)#
Return type:

tuple[DataFrame, DataFrame]

prepare_train_data(data)#
Return type:

DataFrame

class openstef.feature_engineering.data_preparation.AbstractDataPreparation(pj, model_specs, model=None, horizons=None)#

Bases: ABC

check_model()#
abstract prepare_forecast_data(data)#
Return type:

tuple[DataFrame, DataFrame]

abstract prepare_train_data(data)#
Return type:

DataFrame

class openstef.feature_engineering.data_preparation.LegacyDataPreparation(pj, model_specs, model=None, horizons=None)#

Bases: AbstractDataPreparation

prepare_forecast_data(data)#
Return type:

DataFrame

prepare_train_data(data)#
Return type:

DataFrame

openstef.feature_engineering.feature_adder module#

This module provides functionality for defining custom feature adders.

class openstef.feature_engineering.feature_adder.FeatureAdder#

Bases: ABC

Abstract class that implements the FeatureAdder interface.

It is the basic building block that handles the logic for computing a specific feature, plus the syntactic sugar to properly load the feature adder according to the feature name.

abstract apply_features(df, parsed_feature_names)#

Apply or add the features to the input dataframe.

Return type:

DataFrame

abstract property name: str#

Name of the FeatureAdder.

Return type:

str

parse_feature_name(feature_name)#

Parse a feature name.

If the feature name is handled by this feature adder, the method returns a dictionary with the parameters parsed from the feature name. If the feature name does not contain parameters, an empty dictionary is returned. Otherwise the method returns None.

Parameters:

feature_name (str) – The feature name, which may contain parameter information.

Returns:

The parsed parameters. If the feature name is recognized but has no parameters, an empty dictionary is returned. If the feature name is not recognized, None is returned.

Return type:

Optional[dict[str, Any]]
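The three possible return values can be illustrated with a small standalone sketch. It does not use openstef: the `T-<n>min` pattern follows the lag-feature naming that appears elsewhere in this package, while the parameter-free name `T-latest` and the regular expression itself are hypothetical and chosen only for illustration.

```python
import re
from typing import Any, Optional

# Hypothetical pattern: a minute-lag feature such as "T-720min".
_LAG_PATTERN = re.compile(r"^T-(?P<minutes>\d+)min$")
# Hypothetical parameter-free feature handled by the same adder.
_PLAIN_NAME = "T-latest"


def parse_feature_name(feature_name: str) -> Optional[dict[str, Any]]:
    """Return parsed parameters, {} if there are none, or None if unknown."""
    if feature_name == _PLAIN_NAME:
        return {}  # recognized, but carries no parameters
    match = _LAG_PATTERN.match(feature_name)
    if match is None:
        return None  # not handled by this adder
    return {"minutes": int(match.group("minutes"))}
```

With this sketch, `parse_feature_name("T-720min")` yields `{"minutes": 720}`, `parse_feature_name("T-latest")` yields an empty dictionary, and an unrelated name such as `"windspeed"` yields `None`.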

abstract required_features(feature_names)#

List of features that are required to calculate this feature.

Return type:

list[str]

class openstef.feature_engineering.feature_adder.FeatureDispatcher(feature_adders)#

Bases: object

Orchestrator of the feature adders.

It scans the feature_names to assign the proper feature adder to each feature and launches the actual computation of the features.

apply_features(df, feature_names)#

Applies features to the input DataFrame.

Parameters:
  • df (DataFrame) – DataFrame to which the features have to be added.

  • feature_names (list[str]) – Names of the features.

Return type:

DataFrame

Returns:

DataFrame with the added features.

dispatch_features(feature_names)#

Dispatch features.

Parameters:

feature_names (list[str]) – The names of the features to be dispatched.

Return type:

dict[FeatureAdder, list[ParsedFeature]]

Returns:

Dictionary with parsed features.
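A minimal sketch of the dispatching idea, independent of openstef, might look as follows. `PrefixAdder` is a hypothetical stand-in for a FeatureAdder subclass, and the rule that the first matching adder wins is an assumption of this sketch, not something stated above. `ParsedFeature` mirrors the documented two-field tuple.

```python
from collections import namedtuple

# Mirrors the documented two-field tuple (name, params).
ParsedFeature = namedtuple("ParsedFeature", ["name", "params"])


class PrefixAdder:
    """Hypothetical adder that claims every feature name starting with a prefix."""

    def __init__(self, prefix):
        self.prefix = prefix

    def parse_feature_name(self, feature_name):
        # Return the parsed parameters, or None when the name is not handled here.
        if feature_name.startswith(self.prefix):
            return {"suffix": feature_name[len(self.prefix):]}
        return None


def dispatch_features(feature_adders, feature_names):
    """Map each adder to the list of parsed features it recognizes."""
    dispatched = {}
    for name in feature_names:
        for adder in feature_adders:
            params = adder.parse_feature_name(name)
            if params is not None:
                dispatched.setdefault(adder, []).append(ParsedFeature(name, params))
                break  # first matching adder wins (assumption of this sketch)
    return dispatched


lag_adder = PrefixAdder("T-")
wind_adder = PrefixAdder("wind")
result = dispatch_features([lag_adder, wind_adder], ["T-15min", "windspeed", "radiation"])
```

Here `"radiation"` is claimed by no adder and therefore does not appear in `result`.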

class openstef.feature_engineering.feature_adder.ParsedFeature(name, params)#

Bases: tuple

name#

Alias for field number 0

params#

Alias for field number 1

openstef.feature_engineering.feature_adder.adders_from_module(module_name)#

Load all FeatureAdders classes on the fly from the module.

Parameters:

module_name (str) – The name of the module from which to import.

Return type:

list[FeatureAdder]

Returns:

A list with all loaded FeatureAdders.

openstef.feature_engineering.feature_adder.adders_from_modules(module_names)#

Load all FeatureAdders classes on the fly from multiple modules.

Parameters:

module_names (list[str]) – A list with names of the modules from which to import.

Return type:

list[FeatureAdder]

Returns:

A list with all loaded FeatureAdders.

openstef.feature_engineering.feature_applicator module#

This module defines several FeatureApplicators.

These applicators are used to add features to the input data in the corresponding pipelines.

class openstef.feature_engineering.feature_applicator.AbstractFeatureApplicator(horizons, feature_names=None, feature_modules=[])#

Bases: ABC

Defines the Applicator interface.

abstract add_features(df, pj=None)#

Adds features to an input DataFrame.

Parameters:
  • df (DataFrame) – DataFrame with input data to which the features have to be added

  • pj (Optional[PredictionJobDataClass]) – (Optional) A prediction job that is needed for location-dependent features. If not specified, a default location is used.

Return type:

DataFrame

Returns:

Dataframe with added features.

class openstef.feature_engineering.feature_applicator.OperationalPredictFeatureApplicator(horizons, feature_names=None, feature_modules=[])#

Bases: AbstractFeatureApplicator

Feature applicator for use in operational forecasts.

add_features(df, pj=None)#

Adds features to an input DataFrame.

This method is implemented specifically for an operational prediction pipeline and will add every available feature.

Parameters:
  • df (DataFrame) – DataFrame with input data to which the features have to be added

  • pj (Optional[PredictionJobDataClass]) – (Optional) A prediction job that is needed for location-dependent features. If not specified, a default location is used.

Return type:

DataFrame

Returns:

Input DataFrame with an extra column for every added feature.

class openstef.feature_engineering.feature_applicator.TrainFeatureApplicator(horizons, feature_names=None, feature_modules=[])#

Bases: AbstractFeatureApplicator

Feature applicator for use during training.

add_features(df, pj=None, latency_config=None)#

Adds features to an input DataFrame.

This method is implemented specifically for a model train pipeline. For larger horizons, features are invalidated when they would not be available at prediction time.

For example:

For a horizon of 24 hours, the feature T-720min is not added, as the load 720 minutes ago is not available 24 hours in advance. For a horizon of 0.25 hours, this feature is added, as it is available in that case.
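The availability rule in this example can be written as a one-line check. This is a sketch rather than openstef's implementation; the helper name is hypothetical, and treating a lag exactly equal to the horizon as available is an assumption.

```python
def lag_is_available(lag_minutes: float, horizon_hours: float) -> bool:
    """A lagged load value is usable only if the lag is at least the horizon."""
    return lag_minutes >= horizon_hours * 60


print(lag_is_available(720, 24.0))  # False: 720 min is less than 1440 min
print(lag_is_available(720, 0.25))  # True: 720 min is more than 15 min
```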

Parameters:
  • df (DataFrame) – Input data to which the features will be added.

  • pj (Optional[PredictionJobDataClass]) – (Optional) A prediction job that is needed for location-dependent features. If not specified, a default location is used.

  • latency_config (Optional[dict]) – (Optional) Invalidate certain features that are not available for a specific horizon due to data latency. Defaults to {"APX": 24}.

Return type:

DataFrame

Returns:

Input DataFrame with an extra column for every added feature and sorted on the datetime index.

openstef.feature_engineering.general module#

This module contains various helper functions.

openstef.feature_engineering.general.add_missing_feature_columns(input_data, features)#

Adds feature columns for features in the feature list.

Add feature columns for features in the feature list if these columns don’t exist in the input data. If a column is added, its value is set to NaN. This is especially useful to make sure the required columns are in place when making a prediction.

Note

This function is intended as a final check to prevent errors during prediction. In an ideal world this function is not necessary.

Parameters:
  • input_data (DataFrame) – DataFrame with input data and features.

  • features (list[str]) – List of required features.

Return type:

DataFrame

Returns:

Input dataframe with missing columns filled with np.nan.

openstef.feature_engineering.general.enforce_feature_order(input_data)#

Enforces correct order of features.

Alphabetically orders the feature columns. The load column remains the first column and the horizon column remains the last column. Everything in between is sorted alphabetically, so the order eventually looks like this: [“load”] – [alphabetically sorted features] – [“horizon”]

This function assumes the first column contains the variable to be predicted. Furthermore, the “horizon” column is moved to the last position if it is present.

Parameters:

input_data (DataFrame) – Input data with features.

Return type:

DataFrame

Returns:

Properly sorted input data.
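The ordering rule can be sketched on a plain list of column names. The helper `ordered_columns` is hypothetical, assumes a non-empty list, and uses Python's default case-sensitive string sort, which may differ from what openstef does.

```python
def ordered_columns(columns: list[str]) -> list[str]:
    """Keep the first column, sort the rest, and move 'horizon' to the end."""
    first, rest = columns[0], columns[1:]
    has_horizon = "horizon" in rest
    middle = sorted(name for name in rest if name != "horizon")
    return [first] + middle + (["horizon"] if has_horizon else [])


print(ordered_columns(["load", "windspeed", "horizon", "T-1d", "radiation"]))
# ['load', 'T-1d', 'radiation', 'windspeed', 'horizon']
```

Applied to a DataFrame `df`, the same list could be used for column selection, for example `df[ordered_columns(list(df.columns))]`.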

openstef.feature_engineering.general.remove_non_requested_feature_columns(input_data, requested_features)#

Removes features that are provided in the input data but not in the feature list.

This should not be necessary but serves as an extra failsafe when making predictions.

Parameters:
  • input_data (DataFrame) – DataFrame with features

  • requested_features (list[str]) – List of requested features

Return type:

DataFrame

Returns:

Model input data with features.

openstef.feature_engineering.holiday_features module#

This module contains all holiday related features.

openstef.feature_engineering.holiday_features.check_for_bridge_day(date, holiday_name, country, years, holiday_functions, bridge_days)#

Checks for bridge days associated with a specific holiday on the given date.

Any bridge days found are appended to the bridge_days list. A specific feature function for each bridge day is also added to the general holiday_functions dictionary.

Parameters:
  • date (datetime) – Date of holiday to check for associated bridgedays.

  • holiday_name (str) – Name of the holiday.

  • country (str) – Country for which to detect the bridgedays.

  • years (list) – List of years for which to detect bridgedays.

  • holiday_functions (dict) – Dictionary to which the feature function is added in case of a bridge day.

  • bridge_days (list) – List of bridgedays to which any found bridgedays have to be appended.

Return type:

tuple[dict, list]

Returns:

  • Dict with holiday feature functions

  • List of bridgedays
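The text above does not spell out which days count as bridge days. A common convention, assumed here purely for illustration, is that a holiday on a Tuesday makes the preceding Monday a bridge day and a holiday on a Thursday makes the following Friday a bridge day. The helper `bridge_day_for` is hypothetical and is not openstef's implementation.

```python
from datetime import date, timedelta


def bridge_day_for(holiday: date):
    """Return the bridge day for a holiday, or None if there is none.

    Assumed rule: a Tuesday holiday makes the Monday before it a bridge day,
    a Thursday holiday makes the Friday after it a bridge day.
    """
    weekday = holiday.weekday()  # Monday == 0 ... Sunday == 6
    if weekday == 1:
        return holiday - timedelta(days=1)
    if weekday == 3:
        return holiday + timedelta(days=1)
    return None


# Hemelvaart 2020 fell on Thursday 2020-05-21.
print(bridge_day_for(date(2020, 5, 21)))  # 2020-05-22
```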

openstef.feature_engineering.holiday_features.generate_holiday_feature_functions(country='NL', years=None, path_to_school_holidays_csv=PosixPath('/home/runner/work/openstef/openstef/openstef/data/dutch_holidays_2020-2022.csv'))#

Generates functions for creating holiday features.

This improves forecast accuracy. Examples of features that are added:

  • 2020-01-01 is ‘Nieuwjaarsdag’.

  • 2022-12-24 - 2023-01-08 is the ‘Kerstvakantie’.

  • 2022-10-15 - 2022-10-23 is the ‘HerfstvakantieNoord’.

The holidays are based on a manually generated csv file. The information is collected using https://www.schoolvakanties-nederland.nl/ and the Python holiday function. The following official Dutch holidays are included until 2023:

  • Kerstvakantie

  • Meivakantie

  • Herfstvakantie

  • Bouwvak

  • Zomervakantie

  • Voorjaarsvakantie

  • Nieuwjaarsdag

  • Pasen

  • Koningsdag

  • Hemelvaart

  • Pinksteren

  • Kerst

The ‘Brugdagen’ are updated until December 2020. (Generated using agenda)

Parameters:
  • country (str) – Country for which to create holiday features.

  • years (Optional[list]) – years for which to create holiday features.

  • path_to_school_holidays_csv (str) – Filepath to csv with school holidays.

Return type:

dict

Returns:

Dictionary with functions that check if a given date is a holiday, keys consist of “Is” + the_name_of_the_holiday_to_be_checked
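The shape of the returned dictionary can be sketched without openstef. The two entries below reuse dates mentioned in this section; the table `HOLIDAYS`, the factory `make_holiday_check`, and the choice to test a single `datetime.date` (rather than whatever input type openstef's functions accept) are assumptions made for illustration.

```python
from datetime import date

# Tiny hand-written table; the dates are the examples given in this section.
HOLIDAYS = {
    "Nieuwjaarsdag": (date(2020, 1, 1), date(2020, 1, 1)),
    "Kerstvakantie": (date(2022, 12, 24), date(2023, 1, 8)),
}


def make_holiday_check(start: date, end: date):
    """Build a function that tests whether a day falls within [start, end]."""

    def is_holiday(day: date) -> bool:
        return start <= day <= end

    return is_holiday


# Keys follow the documented scheme: "Is" + name of the holiday.
holiday_functions = {
    "Is" + name: make_holiday_check(start, end)
    for name, (start, end) in HOLIDAYS.items()
}

print(holiday_functions["IsKerstvakantie"](date(2022, 12, 31)))  # True
```

Using a factory function gives each check its own start and end dates, which avoids the late-binding pitfall of defining lambdas directly in the comprehension.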

openstef.feature_engineering.lag_features module#

This module contains all lag features.

openstef.feature_engineering.lag_features.extract_lag_features(feature_names, horizon=24.0)#

Creates a list of lag minutes and a list of lag days that were used during the training of the input model.

Parameters:
  • feature_names (list[str]) – All requested lag features

  • horizon (float) – Forecast horizon limit in hours.

Return type:

tuple[list, list]

Returns:

  • List of minute lags that were used as features during training.

  • List of days lags that were used as features during training.
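As a rough illustration of splitting feature names into the two lists, the sketch below parses names with regular expressions. The `T-<n>min` form appears in this package's documentation; the `T-<n>d` form for day lags is an assumption, the helper name is hypothetical, and the horizon-based filtering that the real function performs is left out.

```python
import re


def split_lag_names(feature_names):
    """Collect minute lags and day lags from names like 'T-720min' and 'T-7d'."""
    minute_lags, day_lags = [], []
    for name in feature_names:
        minutes = re.fullmatch(r"T-(\d+)min", name)
        days = re.fullmatch(r"T-(\d+)d", name)
        if minutes:
            minute_lags.append(int(minutes.group(1)))
        elif days:
            day_lags.append(int(days.group(1)))
        # Any other name is not a lag feature and is ignored.
    return minute_lags, day_lags


print(split_lag_names(["T-15min", "T-720min", "T-7d", "windspeed"]))
# ([15, 720], [7])
```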

openstef.feature_engineering.lag_features.generate_lag_feature_functions(feature_names=None, horizon=24.0)#

Creates functions to generate lag features in a dataset.

Parameters:
  • feature_names (Optional[list[str]]) – minute lag times that were used during training of the model. If empty, a new set will be generated automatically.

  • horizon (float) – Forecast horizon limit in hours.

Return type:

dict

Returns:

Lag functions.

Example:

lag_functions = generate_lag_feature_functions(feature_names, horizon)

openstef.feature_engineering.lag_features.generate_non_trivial_lag_times(data, height_threshold=0.1)#

Calculate an autocorrelation curve of the load trace.

This curve is subsequently used to add additional lag times as features.

Parameters:
  • data (DataFrame) – Dataframe with input data in the form pd.DataFrame(index = datetime, columns = [label, predictor_1,…, predictor_n])

  • height_threshold (float) – Minimal autocorrelation value to be recognized as a peak.

Return type:

list[int]

Returns:

Additional non-trivial minute lags
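The idea of picking lag times from autocorrelation peaks can be sketched with the standard library alone. This is not openstef's implementation: the biased autocorrelation estimator, the simple neighbour-comparison peak test, and the `minutes_per_sample` parameter are all assumptions of this sketch.

```python
import math


def autocorrelation(values, max_lag):
    """Biased sample autocorrelation for lags 0..max_lag."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    variance = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]


def peak_lags(values, max_lag, height_threshold=0.1, minutes_per_sample=15):
    """Lags (in minutes) at local autocorrelation maxima above the threshold."""
    acf = autocorrelation(values, max_lag)
    return [
        lag * minutes_per_sample
        for lag in range(1, max_lag)
        if acf[lag] > height_threshold
        and acf[lag] > acf[lag - 1]
        and acf[lag] > acf[lag + 1]
    ]


# Synthetic load with a period of 8 samples (2 hours at 15-minute resolution).
load = [math.sin(2 * math.pi * t / 8) for t in range(64)]
print(peak_lags(load, max_lag=32))  # [120, 240, 360]
```

The peaks land on multiples of the signal's period, which is the kind of recurring structure that additional lag features are meant to capture.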

openstef.feature_engineering.lag_features.generate_trivial_lag_features(horizon)#

Generates relevant lag times for lag feature function creation.

This function is mostly used during training of models and not during predicting.

Parameters:

horizon (float) – Forecast horizon limit in hours.

Return type:

tuple[list, list]

Returns:

  • List of minute lags that were used as features during training.

  • List of days lags that were used as features during training.

openstef.feature_engineering.missing_values_transformer module#

class openstef.feature_engineering.missing_values_transformer.MissingValuesTransformer(missing_values=nan, imputation_strategy=None, fill_value=None)#

Bases: object

MissingValuesTransformer handles missing values in data by imputing them with a given strategy.

It also removes columns that are always null from the data.

fit(x, y=None)#

Fit the imputer on the input data.

fit_transform(x, y=None)#

Fit the imputer on the input data and transform it.

Returns:

The data with missing values imputed.

in_feature_names: List[str] | None = None#
non_null_feature_names: List[str] = None#
transform(x)#

Transform the input data by imputing missing values.

Return type:

DataFrame

openstef.feature_engineering.weather_features module#

This module contains all weather related functions used for feature engineering.

openstef.feature_engineering.weather_features.add_additional_solar_features(data, pj=None, feature_names=None)#

Adds additional solar features to the input data.

Parameters:
  • data (DataFrame) – Dataframe to which the solar features have to be added

  • pj (Optional[PredictionJobDataClass]) – prediction job which should at least contain the latitude and longitude location.

  • feature_names (Optional[list[str]]) – List of requested features

Return type:

DataFrame

Returns:

DataFrame same as input dataframe with extra columns for the added solar features

openstef.feature_engineering.weather_features.add_additional_wind_features(data, feature_names=None)#

Adds additional wind features to the input data.

Parameters:
  • data (DataFrame) – Dataframe to which the wind features have to be added

  • feature_names (Optional[list[str]]) – List of requested features

Return type:

DataFrame

Returns:

DataFrame same as input dataframe with extra columns for the added wind features

openstef.feature_engineering.weather_features.add_humidity_features(data, feature_names=None)#

Adds humidity features to the input dataframe.

These features are calculated using functions defined in this module. A list of requested features is used to determine whether to add the humidity features or not.

Parameters:
  • data (DataFrame) – Input dataframe to which features have to be added

  • feature_names (Optional[list[str]]) – list of requested features.

Return type:

DataFrame

Returns:

Same as input dataframe with extra columns for the humidity features.

openstef.feature_engineering.weather_features.calc_air_density(temperature, pressure, rh)#

Calculates the air density.

Parameters:
  • temperature (Union[float, ndarray]) – The temperature in C

  • pressure (Union[float, ndarray]) – the atmospheric pressure in Pa

  • rh (Union[float, ndarray]) – Relative humidity

Return type:

Union[float, ndarray]

Returns:

The air density (kg/m^3)

openstef.feature_engineering.weather_features.calc_dewpoint(vapour_pressure)#

Calculates the dewpoint, see https://en.wikipedia.org/wiki/Dew_point for more info.

Parameters:

vapour_pressure (Union[float, ndarray]) – The vapour pressure for which the dewpoint should be calculated

Return type:

Union[float, ndarray]

Returns:

Dewpoint

openstef.feature_engineering.weather_features.calc_saturation_pressure(temperature)#

Calculate the saturation water vapour pressure from the temperature.

See https://www.vaisala.com/sites/default/files/documents/Humidity_Conversion_Formulas_B210973EN-F.pdf.

Parameters:

temperature (Union[float, ndarray]) – Temperature in C

Return type:

Union[float, ndarray]

Returns:

The saturation pressure of water at the respective temperature

openstef.feature_engineering.weather_features.calc_vapour_pressure(rh, psat)#

Calculates the vapour pressure.

Parameters:
  • rh (Union[float, ndarray]) – Relative humidity

  • psat (Union[float, ndarray]) – Saturation pressure: see calc_saturation_pressure

Return type:

Union[float, ndarray]

Returns:

The water vapour pressure
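How saturation pressure, vapour pressure and dewpoint relate can be sketched with a Magnus-type formula of the kind given in the Vaisala document referenced above. The constants below are the ones that document lists for water between roughly -20 and +50 °C; whether openstef uses the same constants, units, or a fractional relative humidity is an assumption, and the function names are hypothetical.

```python
import math

# Constants for water, assumed valid for roughly -20 to +50 degrees Celsius.
A, M, TN = 6.116441, 7.591386, 240.7263


def saturation_pressure(temperature_c: float) -> float:
    """Saturation vapour pressure in hPa for a temperature in degrees Celsius."""
    return A * 10 ** (M * temperature_c / (temperature_c + TN))


def vapour_pressure(rh_fraction: float, psat: float) -> float:
    """Actual vapour pressure, with relative humidity as a fraction (0..1)."""
    return rh_fraction * psat


def dewpoint(vapour_pressure_hpa: float) -> float:
    """Dewpoint in degrees Celsius, obtained by inverting the saturation formula."""
    return TN / (M / math.log10(vapour_pressure_hpa / A) - 1)


psat_20 = saturation_pressure(20.0)
dewpoint_saturated = dewpoint(vapour_pressure(1.0, psat_20))
dewpoint_half = dewpoint(vapour_pressure(0.5, psat_20))
```

At 100 % relative humidity the dewpoint equals the air temperature, so `dewpoint_saturated` comes out at 20 °C up to rounding, while `dewpoint_half` is lower. The inversion is undefined when the vapour pressure equals `A` exactly (a dewpoint of 0 °C), which a production implementation would need to handle.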

openstef.feature_engineering.weather_features.calculate_dni(radiation, pj)#

Calculate the direct normal irradiance (DNI).

This function uses the predicted radiation and information derived from the location (obtained from pj)

Parameters:
  • radiation (Series) – predicted radiation including DatetimeIndex with right time-zone

  • pj (PredictionJobDataClass) – PredictJob including information about the location (lat, lon)

Return type:

Series

Returns:

Direct normal irradiance (DNI).

openstef.feature_engineering.weather_features.calculate_gti(radiation, pj, surface_tilt=34.0, surface_azimuth=180)#

Calculate the GTI/POA using the radiation.

This function assumes Global Tilted Irradiance (GTI) = Plane of Array (POA)

Parameters:
  • radiation (Series) – pandas series with DatetimeIndex with right timezone information

  • pj (PredictionJobDataClass) – prediction job which should at least contain the latitude and longitude location.

  • surface_tilt (float) – The tilt of the surface of, for example, your PhotoVoltaic-system.

  • surface_azimuth (float) – The way the surface is facing. South facing 180 degrees, North facing 0 degrees, East facing 90 degrees and West facing 270 degrees

Return type:

Series

Returns:

Global Tilted Irradiance (GTI)

openstef.feature_engineering.weather_features.calculate_windspeed_at_hubheight(windspeed, fromheight=10.0, hub_height=100.0)#

Calculate windspeed at hubheight.

Calculates the windspeed at hub height by extrapolation from a given height to a given hub height using the wind profile power law https://en.wikipedia.org/wiki/Wind_profile_power_law

Parameters:
  • windspeed (Union[float, Series]) – float OR pandas series of windspeed at height = fromheight

  • fromheight (float) – height (m) of the windspeed data. Default is 10m

  • hub_height (float) – height (m) of the turbine. Default is 100m

Return type:

Series

Returns:

Windspeed at hubheight.
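The wind profile power law scales a wind speed measured at one height to another height with an exponent alpha. A minimal sketch follows; the value alpha = 0.143 (about 1/7) is a commonly quoted choice for neutral conditions over open land and is an assumption here, not necessarily the exponent openstef uses.

```python
def windspeed_at_hub_height(windspeed, from_height=10.0, hub_height=100.0, alpha=0.143):
    """Extrapolate wind speed with v_hub = v * (hub_height / from_height) ** alpha."""
    return windspeed * (hub_height / from_height) ** alpha


# 5 m/s measured at 10 m corresponds to roughly 6.95 m/s at 100 m.
speed_at_hub = windspeed_at_hub_height(5.0)
```

Because the function only multiplies by a scalar, it works unchanged for a float or for a pandas Series of wind speeds.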

openstef.feature_engineering.weather_features.calculate_windturbine_power_output(windspeed, n_turbines=1, turbine_data=None)#

Calculate wind turbine power output.

Wind speed and power output are related through the power curve, which is described by turbine_data. If no turbine_data is given, default values are used and results are normalized to 1 MWp. If n_turbines=0, the result is normalized to a rated power of 1.

Parameters:
  • windspeed (Series) – pd.DataFrame(index = datetime, columns = [“windspeedHub”])

  • n_turbines – The number of turbines

  • turbine_data – slope_center, rated_power, steepness

Return type:

Series

Returns:

pd.DataFrame(index = datetime, columns = [“forecast”])
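One way to parameterise a power curve with slope_center, rated_power and steepness is a logistic (S-shaped) function. The sketch below assumes that form; the default values are illustrative only and are not openstef's defaults, and the special handling of n_turbines=0 described above is not modelled.

```python
import math


def turbine_power(windspeed, n_turbines=1, rated_power=1.0, slope_center=8.0, steepness=0.7):
    """Logistic power curve; all default parameter values are illustrative only."""
    per_turbine = rated_power / (1 + math.exp(-steepness * (windspeed - slope_center)))
    return n_turbines * per_turbine


print(turbine_power(8.0))  # 0.5, half the rated power at the slope center
```

In this form, slope_center is the wind speed at which a turbine produces half its rated power, and steepness controls how quickly output rises around that point.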

openstef.feature_engineering.weather_features.humidity_calculations(temperature, rh, pressure)#

Function that calculates weather features based on humidity.

These features are:
  • Saturation pressure

  • Vapour pressure

  • Dewpoint

  • Air density

Parameters:
  • temperature (Union[float, ndarray]) – Temperature in C

  • rh (Union[float, ndarray]) – Relative humidity in %

  • pressure (Union[float, ndarray]) – The air pressure in hPa

Return type:

Union[dict, ndarray]

Returns:

If the input is an np.ndarray, a pandas dataframe with the calculated moisture indices; if the input is numeric, a dict with the calculated moisture indices.

Module contents#