openstef.feature_engineering package¶
Submodules¶
openstef.feature_engineering.apply_features module¶
This module provides functionality for applying features to the input data to improve forecast accuracy.
- Examples of features that are added:
The load 1 day and 7 days ago at the same time.
If a day is a weekday or a holiday.
The extrapolated windspeed at 100m.
The normalised wind power according to the turbine-specific power curve.
- openstef.feature_engineering.apply_features.apply_features(data, pj=None, feature_names=None, horizon=24.0)¶
Applies the feature functions defined in feature_functions.py and returns the complete dataframe. Features requiring more recent label-data are omitted.
Note
For the time-derived features, only the ones in the feature list are added. For the weather features, however, all are currently added. These unrequested additional features have to be filtered out later.
- Parameters:
data (pandas.DataFrame) –
a pandas dataframe with input data in the form: pd.DataFrame(
index=datetime, columns=[label, predictor_1,…, predictor_n]
)
pj (PredictionJobDataClass) – Prediction job.
feature_names (list[str]) – List of requested features.
horizon (float) – Forecast horizon limit in hours.
- Return type:
DataFrame
- Returns:
pd.DataFrame(index = datetime, columns = [label, predictor_1,…, predictor_n, feature_1, …, feature_m])
Example output:
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim

index = pd.date_range(start="2017-01-01 09:00:00", freq="15min", periods=200)
data = pd.DataFrame(
    index=index,
    data=dict(load=np.sin(index.hour / 24 * np.pi) * np.random.uniform(0.7, 1.7, 200)),
)
openstef.feature_engineering.bidding_zone_to_country_mapping module¶
openstef.feature_engineering.cyclic_features module¶
- openstef.feature_engineering.cyclic_features.add_daylight_terrestrial_feature(data, path_to_terrestrial_radiation_csv=PosixPath('/home/runner/work/openstef/openstef/openstef/data/NL_terrestrial_radiation.csv'))¶
Add daylight terrestrial radiation feature to the input dataset. This function processes terrestrial radiation data and aligns it with the time indices of the input dataset. The terrestrial radiation data is normalized, interpolated, and merged with the main dataset to provide a feature representing terrestrial radiation.
- Parameters:
data (pd.DataFrame) – The input dataset containing a time-indexed DataFrame.
path_to_terrestrial_radiation_csv (str) – File path to the CSV file containing terrestrial radiation data. The CSV file should have a time-based index.
- Returns:
The input dataset with an added column for the terrestrial radiation feature.
- Return type:
pd.DataFrame
Notes
The function assumes the input data and the terrestrial radiation data share the same time zone and frequency alignment.
The terrestrial radiation values are normalized using z-score normalization.
- openstef.feature_engineering.cyclic_features.add_seasonal_cyclic_features(data, compute_features=None)¶
Adds cyclical features to capture seasonal and periodic patterns in time-based data.
- Parameters:
data (pd.DataFrame) – DataFrame with a DatetimeIndex.
compute_features (list) – Optional. List of features to compute. Options are ['season', 'dayofweek', 'month']. Default is all features.
- Returns:
DataFrame with added cyclical features.
Example:
>>> data = pd.DataFrame(index=pd.date_range(start='2023-01-01', periods=365, freq='D'))
>>> data_with_features = add_seasonal_cyclic_features(data)
>>> print(data_with_features.head())
- Return type:
DataFrame
- openstef.feature_engineering.cyclic_features.add_time_cyclic_features(data)¶
Adds time of the day features cyclically encoded using sine and cosine to the input data.
- Parameters:
data (DataFrame) – Dataframe indexed by datetime.
- Return type:
DataFrame
- Returns:
DataFrame that is the same as input dataframe with extra columns for the added time of the day features.
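Cyclic encoding of the time of day can be illustrated with a minimal sketch. This is not the library's implementation: the helper name and the column names `time_of_day_sine` and `time_of_day_cosine` are assumptions made for the example. It maps each timestamp to its fraction of the day and projects that onto a circle, so that 23:45 and 00:00 end up close together.

```python
import numpy as np
import pandas as pd


def add_time_of_day_sin_cos(data: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `data` with sine/cosine encoded time-of-day columns.

    Hypothetical helper for illustration; column names are assumptions.
    """
    out = data.copy()
    # Seconds elapsed since midnight for every timestamp in the index.
    seconds = out.index.hour * 3600 + out.index.minute * 60 + out.index.second
    # Fraction of the day elapsed, in [0, 1).
    day_fraction = np.asarray(seconds) / 86400.0
    out["time_of_day_sine"] = np.sin(2 * np.pi * day_fraction)
    out["time_of_day_cosine"] = np.cos(2 * np.pi * day_fraction)
    return out


index = pd.date_range("2023-01-01", periods=96, freq="15min")
example = add_time_of_day_sin_cos(pd.DataFrame({"load": 1.0}, index=index))
```

Using both sine and cosine is what makes the encoding unambiguous: either column alone maps two different times of day to the same value.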
openstef.feature_engineering.data_preparation module¶
- class openstef.feature_engineering.data_preparation.ARDataPreparation(pj, model_specs, model=None, horizons=None, historical_depth=None)¶
Bases:
AbstractDataPreparation
- prepare_forecast_data(data)¶
- Return type:
tuple[DataFrame, DataFrame]
- prepare_train_data(data)¶
- Return type:
DataFrame
- class openstef.feature_engineering.data_preparation.AbstractDataPreparation(pj, model_specs, model=None, horizons=None)¶
Bases:
ABC
- check_model()¶
- abstract prepare_forecast_data(data)¶
- Return type:
tuple[DataFrame, DataFrame]
- abstract prepare_train_data(data)¶
- Return type:
DataFrame
- class openstef.feature_engineering.data_preparation.LegacyDataPreparation(pj, model_specs, model=None, horizons=None)¶
Bases:
AbstractDataPreparation
- prepare_forecast_data(data)¶
- Return type:
DataFrame
- prepare_train_data(data)¶
- Return type:
DataFrame
openstef.feature_engineering.feature_adder module¶
This module provides functionality for defining custom feature adders.
- class openstef.feature_engineering.feature_adder.FeatureAdder¶
Bases:
ABC
Abstract class that implements the FeatureAdder interface.
It is the basic building block that handles the logic for computing a specific feature, plus the syntactic sugar to properly load the feature adder according to the feature name.
- abstract apply_features(df, parsed_feature_names)¶
Apply or add the features to the input dataframe.
- Return type:
DataFrame
- abstract property name: str¶
Name of the FeatureAdder.
- Return type:
str
- parse_feature_name(feature_name)¶
Parse a feature name.
If the feature name is handled by the feature adder, the method returns a dictionary with any parameters parsed from the feature name. If the feature name does not contain parameters, an empty dictionary is returned. Otherwise the method returns None.
- Parameters:
feature_name (str) – The feature name, which may contain parameter information.
- Returns:
The parsed parameters. If the feature name is recognized but has no parameters, an empty dictionary is returned. If the feature name is not recognized, None is returned.
- Return type:
Optional[dict[str, Any]]
- abstract required_features(feature_names)¶
List of features that are required to calculate this feature.
- Return type:
list[str]
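The three-way return contract of parse_feature_name (parameters, empty dictionary, or None) can be sketched without the library. The feature names `rolling_mean_<N>h` and `is_weekend` below are hypothetical, as are both function names; the sketch only shows the contract, not how any real FeatureAdder parses names.

```python
import re
from typing import Any, Optional

# Hypothetical parameterized feature name, e.g. "rolling_mean_24h".
_ROLLING_MEAN = re.compile(r"rolling_mean_(?P<window>\d+)h")


def parse_rolling_mean_name(feature_name: str) -> Optional[dict[str, Any]]:
    """Return the parsed parameters, or None if the name is not recognized."""
    match = _ROLLING_MEAN.fullmatch(feature_name)
    if match is None:
        return None  # not handled by this adder
    return {"window": int(match.group("window"))}


def parse_is_weekend_name(feature_name: str) -> Optional[dict[str, Any]]:
    """Recognized names without parameters yield an empty dictionary."""
    return {} if feature_name == "is_weekend" else None


parsed = parse_rolling_mean_name("rolling_mean_24h")
```

Distinguishing an empty dictionary from None matters for a dispatcher: the former means "mine, no parameters", the latter means "ask another adder".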
- class openstef.feature_engineering.feature_adder.FeatureDispatcher(feature_adders)¶
Bases:
object
Orchestrator of the feature adders.
It scans the feature_names to assign the proper feature adder to each feature and launches the actual computation of the features.
- apply_features(df, feature_names)¶
Applies features to the input DataFrame.
- Parameters:
df (DataFrame) – DataFrame to which the features have to be added.
feature_names (list[str]) – Names of the features.
- Return type:
DataFrame
- Returns:
DataFrame with the added features.
- dispatch_features(feature_names)¶
Dispatch features.
- Parameters:
feature_names (list[str]) – The names of the features to be dispatched.
- Return type:
dict[FeatureAdder, list[ParsedFeature]]
- Returns:
Dictionary with parsed features.
- class openstef.feature_engineering.feature_adder.ParsedFeature(name, params)¶
Bases:
tuple
- name¶
Alias for field number 0
- params¶
Alias for field number 1
- openstef.feature_engineering.feature_adder.adders_from_module(module_name)¶
Load all FeatureAdders classes on the fly from the module.
- Parameters:
module_name (str) – The name of the module from which to import.
- Return type:
list[FeatureAdder]
- Returns:
A list with all loaded FeatureAdders.
- openstef.feature_engineering.feature_adder.adders_from_modules(module_names)¶
Load all FeatureAdders classes on the fly from multiple modules.
- Parameters:
module_names (list[str]) – A list with names of the modules from which to import.
- Return type:
list[FeatureAdder]
- Returns:
A list with all loaded FeatureAdders.
openstef.feature_engineering.feature_applicator module¶
This module defines several FeatureApplicators.
These applicators are used to add features to the input data in the corresponding pipelines.
- class openstef.feature_engineering.feature_applicator.AbstractFeatureApplicator(horizons, feature_names=None, feature_modules=[])¶
Bases:
ABC
Defines the Applicator interface.
- abstract add_features(df, pj=None)¶
Adds features to an input DataFrame.
- Parameters:
df (DataFrame) – DataFrame with input data to which the features have to be added.
pj (PredictionJobDataClass) – (Optional) A prediction job that is needed for location-dependent features. If not specified, a default location is used.
- Return type:
DataFrame
- Returns:
Dataframe with added features.
- class openstef.feature_engineering.feature_applicator.OperationalPredictFeatureApplicator(horizons, feature_names=None, feature_modules=[])¶
Bases:
AbstractFeatureApplicator
Feature applicator for use in operational forecasts.
- add_features(df, pj=None)¶
Adds features to an input DataFrame.
This method is implemented specifically for an operational prediction pipeline and will add every available feature.
- Parameters:
df (DataFrame) – DataFrame with input data to which the features have to be added.
pj (PredictionJobDataClass) – (Optional) A prediction job that is needed for location-dependent features. If not specified, a default location is used.
- Return type:
DataFrame
- Returns:
Input DataFrame with an extra column for every added feature.
- class openstef.feature_engineering.feature_applicator.TrainFeatureApplicator(horizons, feature_names=None, feature_modules=[])¶
Bases:
AbstractFeatureApplicator
Feature applicator for use during training.
- add_features(df, pj=None, latency_config=None)¶
Adds features to an input DataFrame.
This method is implemented specifically for a model train pipeline. For larger horizons, features are invalidated when the underlying data would not yet be available.
- For example:
For a horizon of 24 hours, the feature T-720min is not added, as the load 720 minutes ago is not available 24 hours in advance. For a horizon of 0.25 hours, this feature is added, as it is available in that case.
- Parameters:
df (DataFrame) – Input data to which the features will be added.
pj (PredictionJobDataClass) – (Optional) A prediction job that is needed for location-dependent features. If not specified, a default location is used.
latency_config (dict) – (Optional) Invalidate certain features that are not available for a specific horizon due to data latency. Defaults to {"APX": 24}.
- Return type:
DataFrame
- Returns:
Input DataFrame with an extra column for every added feature and sorted on the datetime index.
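The horizon rule from the T-720min example can be sketched as a small availability check. This is an illustration only: the helper names are hypothetical, only the `T-<N>min` naming seen in the example is handled, and treating a lag exactly equal to the horizon as available is an assumption.

```python
import re
from typing import Optional

# Matches minute-lag feature names such as "T-720min".
_MINUTE_LAG = re.compile(r"T-(\d+)min")


def lag_in_minutes(feature_name: str) -> Optional[int]:
    """Return the lag in minutes, or None if this is not a minute-lag feature."""
    match = _MINUTE_LAG.fullmatch(feature_name)
    return int(match.group(1)) if match else None


def is_available(feature_name: str, horizon_hours: float) -> bool:
    """A lag feature is usable only if the lag is at least as long as the horizon."""
    lag = lag_in_minutes(feature_name)
    if lag is None:
        return True  # not a minute-lag feature; this sketch leaves it alone
    return lag >= horizon_hours * 60  # boundary handling is an assumption


available_day_ahead = is_available("T-720min", horizon_hours=24.0)
available_short_term = is_available("T-720min", horizon_hours=0.25)
```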
openstef.feature_engineering.general module¶
This module contains various helper functions.
- openstef.feature_engineering.general.add_missing_feature_columns(input_data, features)¶
Adds feature columns for features in the feature list.
Adds feature columns for features in the feature list if these columns don’t exist in the input data. If a column is added, its value is set to NaN. This is especially useful to make sure the required columns are in place when making a prediction.
Note
This function is intended as a final check to prevent errors during prediction. In an ideal world this function is not necessary.
- Parameters:
input_data (DataFrame) – DataFrame with input data and features.
features (list[str]) – List of required features.
- Return type:
DataFrame
- Returns:
Input dataframe with missing columns filled with np.nan.
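The behaviour described for add_missing_feature_columns can be approximated in a few lines of pandas. The function below is a stand-in written for illustration, not the library code, and the column names in the usage lines are made up.

```python
import numpy as np
import pandas as pd


def add_missing_columns(input_data: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    """Return a copy of `input_data` with every requested feature present.

    Columns that did not exist are created and filled with NaN.
    """
    out = input_data.copy()
    for feature in features:
        if feature not in out.columns:
            out[feature] = np.nan
    return out


frame = pd.DataFrame({"load": [1.0, 2.0], "windspeed": [3.0, 4.0]})
completed = add_missing_columns(frame, ["windspeed", "radiation"])
```

Existing columns are left untouched; only the absent `radiation` column is added.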
- openstef.feature_engineering.general.enforce_feature_order(input_data)¶
Enforces correct order of features.
Alphabetically orders the feature columns. The load column remains the first column and the horizon column remains the last column. Everything in between is sorted alphabetically. The order eventually looks like this: [“load”] – [alphabetically sorted features] – [‘horizon’]
This function assumes the first column contains the variable to be predicted. Furthermore, the “horizon” column is moved to the last position if it is present.
- Parameters:
input_data (DataFrame) – Input data with features.
- Return type:
DataFrame
- Returns:
Properly sorted input data.
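The ordering rule (target first, features sorted, horizon last) can be sketched as follows. The helper name is hypothetical and the sort here is Python's default string sort, which is case-sensitive; whether the library sorts the same way is an assumption.

```python
import pandas as pd


def order_feature_columns(input_data: pd.DataFrame) -> pd.DataFrame:
    """Keep the first column in place, sort the rest, and put 'horizon' last."""
    columns = list(input_data.columns)
    target, rest = columns[0], columns[1:]
    # Sort everything except the optional horizon column.
    features = sorted(c for c in rest if c != "horizon")
    tail = ["horizon"] if "horizon" in rest else []
    return input_data[[target] + features + tail]


frame = pd.DataFrame(columns=["load", "windspeed", "horizon", "T-720min", "radiation"])
ordered = list(order_feature_columns(frame).columns)
```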
- openstef.feature_engineering.general.remove_non_requested_feature_columns(input_data, requested_features)¶
Removes features that are provided in the input data but not in the feature list.
This should not be necessary but serves as an extra failsafe for making predictions.
- Parameters:
input_data (DataFrame) – DataFrame with features.
requested_features (list[str]) – List of requested features.
- Return type:
DataFrame
- Returns:
Model input data with features.
openstef.feature_engineering.holiday_features module¶
This module contains all holiday related features.
- openstef.feature_engineering.holiday_features.check_for_bridge_day(date, holiday_name, country, years, holiday_functions, bridge_days)¶
Checks for bridgedays associated with a specific holiday with date (date).
Any bridgedays found are appended to the bridgedays list. A specific feature function for the bridgeday is also added to the general holiday functions dictionary.
- Parameters:
date (datetime) – Date of holiday to check for associated bridgedays.
holiday_name (str) – Name of the holiday.
country (str) – Country for which to detect the bridgedays.
years (list) – List of years for which to detect bridgedays.
holiday_functions (dict) – Dictionary to which the feature function has to be appended in case of a bridgeday.
bridge_days (list) – List of bridgedays to which any found bridgedays have to be appended.
- Return type:
tuple[dict, list]
- Returns:
Dict with holiday feature functions
List of bridgedays
- openstef.feature_engineering.holiday_features.generate_holiday_feature_functions(country_code='NL', years=None, path_to_school_holidays_csv=PosixPath('/home/runner/work/openstef/openstef/openstef/data/dutch_holidays.csv'))¶
Generates functions for creating holiday feature.
This improves forecast accuracy. Examples of features that are added are:
2020-01-01 is ‘Nieuwjaarsdag’.
2022-12-24 - 2023-01-08 is the ‘Kerstvakantie’.
2022-10-15 - 2022-10-23 is the ‘HerfstvakantieNoord’.
The holidays are based on a manually generated CSV file. The information is collected using https://www.schoolvakanties-nederland.nl/ and the Python holiday function. The following official Dutch holidays are included until 2023:
Kerstvakantie
Meivakantie
Herfstvakantie
Bouwvak
Zomervakantie
Voorjaarsvakantie
Nieuwjaarsdag
Pasen
Koningsdag
Hemelvaart
Pinksteren
Kerst
The ‘Brugdagen’ are updated until December 2020. (Generated using agenda)
- Parameters:
country_code – Country for which to create holiday features.
years (list) – Years for which to create holiday features.
path_to_school_holidays_csv (str) – Filepath to CSV with school holidays.
Note: the Dutch holidays CSV file only runs until January 2026.
- Return type:
dict
- Returns:
Dictionary with functions that check if a given date is a holiday, keys consist of “Is” + the_name_of_the_holiday_to_be_checked
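The shape of the returned dictionary ("Is" + holiday name mapped to a checking function) can be sketched with plain Python. The single hard-coded holiday comes from the example above; the helper name and the choice to accept `datetime.date` values are assumptions of this sketch, not a description of the library's function signatures.

```python
from datetime import date
from typing import Callable

# One holiday taken from the documentation's example; real data comes from a CSV.
holiday_dates = {"Nieuwjaarsdag": [date(2020, 1, 1)]}


def make_holiday_check(days: list[date]) -> Callable[[date], bool]:
    """Build a function that reports whether a given date is one of `days`."""
    day_set = set(days)
    return lambda day: day in day_set


# Keys follow the "Is" + holiday name convention.
holiday_functions = {
    "Is" + name: make_holiday_check(days) for name, days in holiday_dates.items()
}

is_new_years_day = holiday_functions["IsNieuwjaarsdag"](date(2020, 1, 1))
```

Building each check through a factory function avoids the late-binding pitfall of defining lambdas directly inside a loop.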
openstef.feature_engineering.lag_features module¶
This module contains all lag features.
- openstef.feature_engineering.lag_features.extract_lag_features(feature_names, horizon=24.0)¶
Creates a list of lag minutes and a list of lag days that were used during the training of the input model.
- Parameters:
feature_names (list[str]) – All requested lag features.
horizon (float) – Forecast horizon limit in hours.
- Return type:
tuple[list, list]
- Returns:
List of minute lags that were used as features during training.
List of days lags that were used as features during training.
- openstef.feature_engineering.lag_features.generate_lag_feature_functions(feature_names=None, horizon=24.0)¶
Creates functions to generate lag features in a dataset.
- Parameters:
feature_names (list[str]) – Minute lag times that were used during training of the model. If empty, a new set will be automatically generated.
horizon (float) – Forecast horizon limit in hours.
- Return type:
dict
- Returns:
Lag functions.
Example:
lag_functions = generate_lag_feature_functions(feature_names, horizon)
- openstef.feature_engineering.lag_features.generate_non_trivial_lag_times(data, height_threshold=0.1)¶
Calculate an autocorrelation curve of the load trace.
This curve is subsequently used to add additional lag times as features.
- Parameters:
data (DataFrame) – Dataframe with input data in the form pd.DataFrame(index = datetime, columns = [label, predictor_1,…, predictor_n])
height_threshold (float) – Minimal autocorrelation value to be recognized as a peak.
- Return type:
list[int]
- Returns:
Additional non-trivial minute lags
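The idea of picking lag times from peaks in the autocorrelation curve can be sketched with NumPy alone. This is a simplified illustration under stated assumptions: both helper names are hypothetical, the peak test is a plain local-maximum check, and lags are returned in samples rather than minutes (multiply by the sampling interval to convert). The library may use a different estimator or peak finder.

```python
import numpy as np


def autocorrelation(values, max_lag: int) -> np.ndarray:
    """Biased sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    denominator = np.dot(x, x)
    return np.array(
        [np.dot(x[: len(x) - lag], x[lag:]) / denominator for lag in range(max_lag + 1)]
    )


def find_peak_lags(acf: np.ndarray, height_threshold: float = 0.1) -> list[int]:
    """Lags that are local maxima of the curve and exceed the threshold."""
    lags = []
    for lag in range(1, len(acf) - 1):
        is_local_max = acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
        if is_local_max and acf[lag] > height_threshold:
            lags.append(lag)
    return lags


# Synthetic load with a period of 24 samples.
t = np.arange(240)
load = np.sin(2 * np.pi * t / 24)
acf = autocorrelation(load, max_lag=60)
lags = find_peak_lags(acf, height_threshold=0.1)
```

For a signal with a clear daily cycle, the peaks fall at multiples of the period, which is what makes them useful candidates for extra lag features.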
- openstef.feature_engineering.lag_features.generate_trivial_lag_features(horizon)¶
Generates relevant lag times for lag feature function creation.
This function is mostly used during training of models and not during predicting.
- Parameters:
horizon (float) – Forecast horizon limit in hours.
- Return type:
tuple[list, list]
- Returns:
List of minute lags that were used as features during training.
List of days lags that were used as features during training.
openstef.feature_engineering.missing_values_transformer module¶
- class openstef.feature_engineering.missing_values_transformer.MissingValuesTransformer(missing_values=nan, imputation_strategy=None, fill_value=None, no_fill_future_values_features=None)¶
Bases:
object
MissingValuesTransformer handles missing values in data by imputing them with a given strategy.
It also removes columns that are always null from the data.
- fit(x, y=None)¶
Fit the imputer on the input data.
- fit_transform(x, y=None)¶
Fit the imputer on the input data and transform it.
- Return type:
tuple[DataFrame, Optional[Series]]
- Returns:
The data with missing values imputed.
- in_feature_names: List[str] | None = None¶
- non_null_feature_names: List[str] = None¶
- transform(x)¶
Transform the input data by imputing missing values.
- Return type:
DataFrame
openstef.feature_engineering.weather_features module¶
This module contains all weather related functions used for feature engineering.
- openstef.feature_engineering.weather_features.add_additional_solar_features(data, pj=None, feature_names=None)¶
Adds additional solar features to the input data.
- Parameters:
data (DataFrame) – Dataframe to which the solar features have to be added.
pj (PredictionJobDataClass) – Prediction job which should at least contain the latitude and longitude location.
feature_names (list[str]) – List of requested features.
- Return type:
DataFrame
- Returns:
DataFrame same as input dataframe with extra columns for the added solar features
- openstef.feature_engineering.weather_features.add_additional_wind_features(data, feature_names=None)¶
Adds additional wind features to the input data.
- Parameters:
data (DataFrame) – Dataframe to which the wind features have to be added.
feature_names (list[str]) – List of requested features.
- Return type:
DataFrame
- Returns:
DataFrame same as input dataframe with extra columns for the added wind features
- openstef.feature_engineering.weather_features.add_humidity_features(data, feature_names=None)¶
Adds humidity features to the input dataframe.
These features are calculated using functions defined in this module. A list of requested features is used to determine whether to add the humidity features or not.
- Parameters:
data (DataFrame) – Input dataframe to which features have to be added.
feature_names (list[str]) – List of requested features.
- Return type:
DataFrame
- Returns:
Same as input dataframe with extra columns for the humidity features.
- openstef.feature_engineering.weather_features.calc_air_density(temperature, pressure, rh)¶
Calculates the air density.
- Parameters:
temperature (Union[float, ndarray]) – The temperature in C
pressure (Union[float, ndarray]) – The atmospheric pressure in Pa
rh (Union[float, ndarray]) – Relative humidity
- Return type:
Union[float, ndarray]
- Returns:
The air density (kg/m^3)
- openstef.feature_engineering.weather_features.calc_dewpoint(vapour_pressure)¶
Calculates the dewpoint, see https://en.wikipedia.org/wiki/Dew_point for more info.
- Parameters:
vapour_pressure (Union[float, ndarray]) – The vapour pressure for which the dewpoint should be calculated
- Return type:
Union[float, ndarray]
- Returns:
Dewpoint
- openstef.feature_engineering.weather_features.calc_saturation_pressure(temperature)¶
Calculate the water vapour pressure from the temperature.
See https://www.vaisala.com/sites/default/files/documents/Humidity_Conversion_Formulas_B210973EN-F.pdf.
- Parameters:
temperature (Union[float, ndarray]) – Temperature in C
- Return type:
Union[float, ndarray]
- Returns:
The saturation pressure of water at the respective temperature
- openstef.feature_engineering.weather_features.calc_vapour_pressure(rh, psat)¶
Calculates the vapour pressure.
- Parameters:
rh (Union[float, ndarray]) – Relative humidity
psat (Union[float, ndarray]) – Saturation pressure: see calc_saturation_pressure
- Return type:
Union[float, ndarray]
- Returns:
The water vapour pressure
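How saturation pressure, vapour pressure and dewpoint relate to each other can be shown with a short sketch. It uses a Magnus-type approximation with commonly quoted coefficients, which is an assumption of this example: the library cites a Vaisala document and may use different constants and units. Relative humidity is taken as a fraction between 0 and 1 here, and pressures are in hPa; the function names are hypothetical.

```python
import math

# Magnus-type coefficients over water (assumed for this sketch only).
A_HPA = 6.112  # hPa
B = 17.62  # dimensionless
C_DEG = 243.12  # degrees Celsius


def saturation_pressure_hpa(temperature_c: float) -> float:
    """Saturation vapour pressure of water in hPa at the given temperature."""
    return A_HPA * math.exp(B * temperature_c / (C_DEG + temperature_c))


def vapour_pressure_hpa(rh_fraction: float, psat_hpa: float) -> float:
    """Actual vapour pressure: relative humidity times saturation pressure."""
    return rh_fraction * psat_hpa


def dewpoint_c(vapour_pressure: float) -> float:
    """Invert the Magnus formula: temperature at which this pressure saturates."""
    ln_ratio = math.log(vapour_pressure / A_HPA)
    return C_DEG * ln_ratio / (B - ln_ratio)


psat = saturation_pressure_hpa(20.0)
dewpoint_saturated = dewpoint_c(vapour_pressure_hpa(1.0, psat))
dewpoint_half = dewpoint_c(vapour_pressure_hpa(0.5, psat))
```

At 100% relative humidity the dewpoint equals the air temperature; drier air gives a dewpoint below it.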
- openstef.feature_engineering.weather_features.calculate_dni(radiation, pj)¶
Calculate the direct normal irradiance (DNI).
This function uses the predicted radiation and information derived from the location (obtained from pj)
- Parameters:
radiation (Series) – Predicted radiation including DatetimeIndex with right time-zone
pj (PredictionJobDataClass) – PredictJob including information about the location (lat, lon)
- Return type:
Series
- Returns:
Direct normal irradiance (DNI).
- openstef.feature_engineering.weather_features.calculate_gti(radiation, pj, surface_tilt=34.0, surface_azimuth=180)¶
Calculate the GTI/POA using the radiation.
This function assumes Global Tilted Irradiance (GTI) = Plane of Array (POA)
- Parameters:
radiation (Series) – pandas series with DatetimeIndex with right timezone information
pj (PredictionJobDataClass) – Prediction job which should at least contain the latitude and longitude location.
surface_tilt (float) – The tilt of the surface of, for example, your photovoltaic system.
surface_azimuth (float) – The direction the surface is facing. South facing 180 degrees, North facing 0 degrees, East facing 90 degrees and West facing 270 degrees
- Return type:
Series
- Returns:
Global Tilted Irradiance (GTI)
- openstef.feature_engineering.weather_features.calculate_windspeed_at_hubheight(windspeed, fromheight=10.0, hub_height=100.0)¶
Calculate windspeed at hubheight.
Calculates the windspeed at hub height by extrapolation from a given height to a given hub height using the wind profile power law: https://en.wikipedia.org/wiki/Wind_profile_power_law
- Parameters:
windspeed (Union[float, Series]) – float OR pandas series of windspeed at height = fromheight
fromheight (float) – height (m) of the windspeed data. Default is 10m
hub_height – height (m) of the turbine
- Return type:
Series
- Returns:
Windspeed at hubheight.
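The wind profile power law scales a measured wind speed by the ratio of the two heights raised to an exponent. A minimal sketch, assuming an exponent of 0.143 (a value often quoted for neutral conditions over open land; the exponent the library uses is not stated here) and a hypothetical function name:

```python
def windspeed_at_height(
    windspeed: float,
    from_height: float = 10.0,
    hub_height: float = 100.0,
    alpha: float = 0.143,  # assumed shear exponent, not taken from the library
) -> float:
    """Extrapolate wind speed from `from_height` to `hub_height` (both in m)."""
    return windspeed * (hub_height / from_height) ** alpha


speed_at_hub = windspeed_at_height(5.0)
speed_same_height = windspeed_at_height(5.0, from_height=10.0, hub_height=10.0)
```

Because the function only uses multiplication and exponentiation, the same expression also works element-wise on a pandas Series or NumPy array of wind speeds.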
- openstef.feature_engineering.weather_features.calculate_windturbine_power_output(windspeed, n_turbines=1, turbine_data=None)¶
Calculate wind turbine power output.
These values are related through the power curve, which is described by turbine_data. If no turbine_data is given, default values are used and results are normalized to 1MWp. If n_turbines=0, the result is normalized to a rated power of 1.
- Parameters:
windspeed (Series) – pd.DataFrame(index = datetime, columns = [“windspeedHub”])
n_turbines – The number of turbines
turbine_data – slope_center, rated_power, steepness
- Return type:
Series
- Returns:
pd.DataFrame(index = datetime, columns = [“forecast”])
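A power curve described by slope_center, rated_power and steepness can be sketched as a logistic (S-shaped) function of wind speed. Both the logistic form and the default numbers below are assumptions made for illustration; the library's actual curve, defaults and its handling of n_turbines=0 are not reproduced here.

```python
import math


def turbine_power(
    windspeed_hub: float,
    n_turbines: int = 1,
    rated_power: float = 1.0,  # hypothetical default
    slope_center: float = 8.0,  # hypothetical default, m/s
    steepness: float = 0.7,  # hypothetical default
) -> float:
    """Logistic power curve: half of rated power is reached at `slope_center`."""
    per_turbine = rated_power / (
        1.0 + math.exp(-steepness * (windspeed_hub - slope_center))
    )
    return per_turbine * n_turbines


power_at_center = turbine_power(8.0)
power_low_wind = turbine_power(2.0)
power_high_wind = turbine_power(20.0)
```

In this form, steepness controls how quickly output rises around slope_center, and the curve saturates at rated_power for high wind speeds.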
- openstef.feature_engineering.weather_features.humidity_calculations(temperature, rh, pressure)¶
Function that calculates weather features based on humidity.
- These features are:
Saturation pressure
Vapour pressure
Dewpoint
Air density
- Parameters:
temperature (Union[float, ndarray]) – Temperature in C
rh (Union[float, ndarray]) – Relative humidity in %
pressure (Union[float, ndarray]) – The air pressure in hPa
- Return type:
Union[dict, ndarray]
- Returns:
If the input is an np.ndarray, a pandas DataFrame with the calculated moisture indices; if the input is numeric, a dict with the calculated moisture indices.