openstef.pipeline package

Submodules

openstef.pipeline.create_basecase_forecast module

openstef.pipeline.create_basecase_forecast.create_basecase_forecast_pipeline(pj, input_data)

Compute the base case forecast and confidence intervals for a given prediction job and input data.

Parameters:
  • pj (PredictionJobDataClass) – Prediction job

  • input_data (DataFrame) – data frame containing the input data necessary for the prediction.

Return type:

DataFrame

Returns:

Base case forecast

Raises:

NoRealisedLoadError – When no realised load is available for the given datetime range.
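
A minimal usage sketch, assuming pj is an existing PredictionJobDataClass and input_data is a datetime-indexed DataFrame containing the realised load needed to derive the basecase:

    from openstef.pipeline.create_basecase_forecast import create_basecase_forecast_pipeline

    # pj and input_data are assumed to be prepared elsewhere.
    basecase_forecast = create_basecase_forecast_pipeline(pj, input_data)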

openstef.pipeline.create_basecase_forecast.generate_basecase_confidence_interval(data_with_features)

Calculate confidence interval for a basecase forecast.

Parameters:

data_with_features (DataFrame) – Input dataframe that is used to make the basecase forecast.

Return type:

DataFrame

Returns:

Dataframe with the confidence interval.

openstef.pipeline.create_component_forecast module

openstef.pipeline.create_component_forecast.create_components_forecast_pipeline(pj, input_data, weather_data)

Pipeline for creating a component forecast using Dazls prediction model.

Parameters:
  • pj (PredictionJobDataClass) – Prediction job

  • input_data (DataFrame) – Input forecast for the components forecast.

  • weather_data (DataFrame) – Weather data with ‘radiation’ and ‘windspeed_100m’ columns

Return type:

DataFrame

Returns:

DataFrame with component forecasts. The dataframe contains these columns: “forecast_wind_on_shore”, “forecast_solar”, “forecast_other”, “pid”, “customer”, “description”, “type”, “algtype”
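
A hedged usage sketch, assuming pj, the input forecast frame and a weather frame covering the same period are available:

    from openstef.pipeline.create_component_forecast import create_components_forecast_pipeline

    # weather_data must contain at least the 'radiation' and 'windspeed_100m' columns.
    components = create_components_forecast_pipeline(pj, input_data, weather_data)
    print(components[["forecast_wind_on_shore", "forecast_solar", "forecast_other"]].head())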

openstef.pipeline.create_component_forecast.create_input(pj, input_data, weather_data)

This function prepares the input data.

The data will be used for the Dazls model prediction, so it is formatted according to the Dazls model requirements.

Parameters:
  • pj (PredictionJobDataClass) – Prediction job

  • input_data (DataFrame) – Input forecast for the components forecast.

  • weather_data (DataFrame) – Weather data with ‘radiation’ and ‘windspeed_100m’ columns

Return type:

DataFrame

Returns:

A dataframe that can be used by the Dazls prediction function.

openstef.pipeline.create_forecast module

openstef.pipeline.create_forecast.create_forecast_pipeline(pj, input_data, mlflow_tracking_uri)

Create forecast pipeline.

This is the top-level pipeline, which includes loading the most recent model for the given prediction job.

Expected prediction job keys: “id”

Parameters:
  • pj (PredictionJobDataClass) – Prediction job

  • input_data (DataFrame) – Training input data (without features)

  • mlflow_tracking_uri (str) – MlFlow tracking URI

Return type:

DataFrame

Returns:

DataFrame with the forecast

Raises:

openstef.pipeline.create_forecast.create_forecast_pipeline_core(pj, input_data, model, model_specs)

Create forecast pipeline (core).

Computes the forecasts and confidence intervals given a prediction job and input data. This pipeline has no database or persistent storage dependencies.

Expected prediction job keys: “resolution_minutes”, “id”, “type”, “name”, “quantiles”

Parameters:
  • pj (PredictionJobDataClass) – Prediction job

  • input_data (DataFrame) – Input data

  • model (OpenstfRegressor) – Model to use for the forecast

  • model_specs (ModelSpecificationDataClass) – Dataclass containing model specifications

Return type:

DataFrame

Returns:

Forecast

Raises:

InputDataOngoingZeroFlatlinerError – When all recent load measurements are zero.
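
A hedged usage sketch of the top-level entry point of this module; pj and input_data are assumed to exist, input_data is expected to contain the historic load and end with null ‘load’ values marking the period to forecast, and the tracking URI is an illustrative placeholder for an MLflow deployment that holds a trained model for pj:

    from openstef.pipeline.create_forecast import create_forecast_pipeline

    forecast = create_forecast_pipeline(
        pj, input_data, mlflow_tracking_uri="http://localhost:5000"
    )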

openstef.pipeline.optimize_hyperparameters module

openstef.pipeline.optimize_hyperparameters.optimize_hyperparameters_pipeline(pj, input_data, mlflow_tracking_uri, artifact_folder, n_trials=100)

Optimize hyperparameters pipeline.

Expected prediction job keys: “name”, “model”

Parameters:
  • pj (PredictionJobDataClass) – Prediction job

  • input_data (DataFrame) – Raw training input data

  • mlflow_tracking_uri (str) – Path/Uri to mlflow service

  • artifact_folder (str) – Path where artifacts, such as trained models, are stored

  • horizons – horizons for feature engineering.

  • n_trials (int) – The number of trials. Defaults to N_TRIALS.

Raises:

Return type:

dict

Returns:

Optimized hyperparameters.
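
A hedged usage sketch; pj and the raw input_data are assumed to exist, and the tracking URI and artifact folder are illustrative placeholders:

    from openstef.pipeline.optimize_hyperparameters import optimize_hyperparameters_pipeline

    best_params = optimize_hyperparameters_pipeline(
        pj,
        input_data,
        mlflow_tracking_uri="http://localhost:5000",
        artifact_folder="./artifacts",
        n_trials=20,
    )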

openstef.pipeline.optimize_hyperparameters.optimize_hyperparameters_pipeline_core(pj, input_data, horizons=[0.25, 47.0], n_trials=100)

Optimize hyperparameters pipeline core.

Expected prediction job keys: “name”, “model”

Parameters:
  • pj (PredictionJobDataClass) – Prediction job

  • input_data (DataFrame) – Raw training input data

  • horizons (list[float]) – horizons for feature engineering in hours.

  • n_trials (int) – The number of trials. Defaults to N_TRIALS.

Raises:

Return type:

tuple[OpenstfRegressor, ModelSpecificationDataClass, Report, dict, int, dict[str, Any]]

Returns:

  • Best model,

  • Model specifications of the best model,

  • Report of the best training round,

  • Trials,

  • Best trial number,

  • Optimized hyperparameters.
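
A hedged sketch of the core variant; pj and input_data are assumed to exist, and the six return values are unpacked in the order listed above:

    from openstef.pipeline.optimize_hyperparameters import optimize_hyperparameters_pipeline_core

    (
        best_model,
        best_model_specs,
        report,
        trials,
        best_trial_number,
        best_params,
    ) = optimize_hyperparameters_pipeline_core(pj, input_data, n_trials=20)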

openstef.pipeline.optimize_hyperparameters.optuna_optimization(pj, objective, validated_data_with_features, n_trials)

Perform hyperparameter optimization with optuna.

Parameters:
  • pj (PredictionJobDataClass) – Prediction job

  • objective (RegressorObjective) – Objective function for optuna

  • validated_data_with_features (DataFrame) – cleaned input dataframe

  • n_trials (int) – number of optuna trials

Return type:

tuple[Study, RegressorObjective]

Returns:

  • Optimization study from optuna

  • The objective object used by optuna

openstef.pipeline.train_create_forecast_backtest module

openstef.pipeline.train_create_forecast_backtest.train_model_and_forecast_back_test(pj, modelspecs, input_data, training_horizons=None, n_folds=1)

Pipeline for a back test.

When the number of folds is larger than 1, the pipeline is applied as a back test that forecasts the entire input range.

  • Makes use of k-fold cross validation in order to split the data multiple times.

  • The results of all test sets are combined to obtain the forecast for the whole input range.

  • The days for each fold can be selected either randomly or not.

DO NOT USE THIS PIPELINE FOR OPERATIONAL FORECASTS

Parameters:
  • pj (PredictionJobDataClass) – Prediction job.

  • modelspecs (ModelSpecificationDataClass) – Dataclass containing model specifications

  • input_data (DataFrame) – Input data

  • training_horizons (Optional[list[float]]) – horizons to train on in hours. These horizons are also used to make predictions (one for every horizon)

  • n_folds (int) – number of folds to apply (if 1, no cross validation will be applied)

Return type:

tuple[DataFrame, list[OpenstfRegressor], list[DataFrame], list[DataFrame], list[DataFrame]]

Returns:

  • Forecast (pandas.DataFrame)

  • Fitted models (list[OpenstfRegressor])

  • Train data sets (list[pd.DataFrame])

  • Validation data sets (list[pd.DataFrame])

  • Test data sets (list[pd.DataFrame])

Raises:

openstef.pipeline.train_create_forecast_backtest.train_model_and_forecast_test_core(pj, modelspecs, train_data, validation_data, test_data)

Trains the model and forecast on the test set.

Parameters:
  • pj (PredictionJobDataClass) – Prediction job.

  • modelspecs (ModelSpecificationDataClass) – Dataclass containing model specifications

  • train_data (DataFrame) – Train data with computed features

  • validation_data (DataFrame) – Validation data with computed features

  • test_data (DataFrame) – Test data with computed features

Return type:

tuple[OpenstfRegressor, DataFrame]

Returns:

  • The trained model

  • The forecast on the test set.

Raises:
  • NotImplementedError – When using invalid model type in the prediction job.

  • InputDataWrongColumnOrderError – When ‘load’ column is not first and ‘horizon’ column is not last.
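
A hedged sketch of running a back test with the pipeline documented at the top of this module; pj, modelspecs and input_data are assumed to exist, and the horizons are illustrative values:

    from openstef.pipeline.train_create_forecast_backtest import train_model_and_forecast_back_test

    # With n_folds > 1, k-fold cross validation is used and the per-fold test
    # forecasts are combined into one forecast for the whole input range.
    forecast, models, train_sets, validation_sets, test_sets = train_model_and_forecast_back_test(
        pj, modelspecs, input_data, training_horizons=[0.25, 47.0], n_folds=3
    )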

openstef.pipeline.train_model module

openstef.pipeline.train_model.train_model_pipeline(pj, input_data, check_old_model_age, mlflow_tracking_uri, artifact_folder)

Middle level pipeline that takes care of all persistent storage dependencies.

Expected prediction job keys: “id”, “model”, “hyper_params”, “feature_names”.

Parameters:
  • pj (PredictionJobDataClass) – Prediction job

  • input_data (DataFrame) – Raw training input data

  • check_old_model_age (bool) – Check if training should be skipped because the model is too young

  • mlflow_tracking_uri (str) – Tracking URI for MLFlow

  • artifact_folder (str) – Path where artifacts, such as trained models, are stored

Return type:

Optional[tuple[DataFrame, DataFrame, DataFrame]]

Returns:

If pj.save_train_forecasts is False, None is returned. Otherwise:

  • The train dataset with forecasts

  • The validation dataset with forecasts

  • The test dataset with forecasts

Raises:

openstef.pipeline.train_model.train_model_pipeline_core(pj, model_specs, input_data, old_model=None, horizons=[0.25, 47.0])

Train model core pipeline.

Trains a new model given a prediction job, input data and compares it to an old model. This pipeline has no database or persistent storage dependencies.

Parameters:
  • pj (PredictionJobDataClass) – Prediction job

  • model_specs (ModelSpecificationDataClass) – Dataclass containing model specifications

  • input_data (DataFrame) – Input data

  • old_model (Optional[OpenstfRegressor]) – Old model to compare to. Defaults to None.

  • horizons (list[float]) – Horizons to train on in hours, relevant for feature engineering.

Raises:

Return type:

tuple[OpenstfRegressor, Report, ModelSpecificationDataClass, tuple[DataFrame, DataFrame, DataFrame]]

Returns:

  • Fitted_model (OpenstfRegressor)

  • Report (Report)

  • Modelspecs (ModelSpecificationDataClass)

  • Datasets (tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]): The train, validation and test sets
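
A hedged sketch of the core training pipeline; pj, model_specs and input_data are assumed to exist, and the four return values are unpacked in the order listed above:

    from openstef.pipeline.train_model import train_model_pipeline_core

    fitted_model, report, model_specs_out, datasets = train_model_pipeline_core(
        pj, model_specs, input_data
    )
    train_set, validation_set, test_set = datasets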

openstef.pipeline.train_model.train_pipeline_common(pj, model_specs, input_data, horizons, test_fraction=0.0, backtest=False, test_data_predefined=pd.DataFrame())

Common pipeline shared by operational training and backtest training.

Parameters:
  • pj (PredictionJobDataClass) – Prediction job

  • model_specs (ModelSpecificationDataClass) – Dataclass containing model specifications

  • input_data (DataFrame) – Input data

  • horizons (list[float]) – horizons to train on in hours.

  • test_fraction (float) – fraction of data to use for testing

  • backtest (bool) – Whether to run a backtest

  • test_data_predefined (DataFrame) – Predefined test data frame to be used in the pipeline (empty data frame by default)

Return type:

tuple[OpenstfRegressor, Report, DataFrame, DataFrame, DataFrame]

Returns:

  • The trained model

  • Report

  • The train data

  • The validation data

  • The test data

Raises:

openstef.pipeline.train_model.train_pipeline_step_compute_features(pj, model_specs, input_data, horizons=list[float])

Compute features and perform consistency checks.

Parameters:
  • pj (PredictionJobDataClass) – Prediction job

  • model_specs (ModelSpecificationDataClass) – Dataclass containing model specifications

  • input_data (DataFrame) – Input data

  • horizons (list[float]) – Horizons to train on in hours, relevant for feature engineering.

Return type:

DataFrame

Returns:

The dataframe with the features needed to train the model

Raises:

openstef.pipeline.train_model.train_pipeline_step_load_model(pj, serializer)

Return type:

tuple[OpenstfRegressor, ModelSpecificationDataClass, Union[int, float]]

openstef.pipeline.train_model.train_pipeline_step_split_data(data_with_features, pj, test_fraction, backtest=False, test_data_predefined=pd.DataFrame())

The default way to perform train, val, test split.

Parameters:
  • data_with_features (DataFrame) – Input data

  • pj (PredictionJobDataClass) – Prediction job

  • test_fraction (float) – fraction of data to use for testing

  • backtest (bool) – Whether to run a backtest

  • test_data_predefined (DataFrame) – Predefined test data frame to be used in the pipeline (empty data frame by default)

Return type:

tuple[DataFrame, DataFrame, DataFrame]

Returns:

  • Train dataset

  • Validation dataset

  • Test dataset

openstef.pipeline.train_model.train_pipeline_step_train_model(pj, model_specs, train_data, validation_data)

Train the model.

Parameters:
  • pj (PredictionJobDataClass) – Prediction job

  • model_specs (ModelSpecificationDataClass) – Dataclass containing model specifications

  • train_data (DataFrame) – Train data with computed features

  • validation_data (DataFrame) – Validation data with computed features

Return type:

OpenstfRegressor

Returns:

The trained model

Raises:
  • NotImplementedError – When using invalid model type in the prediction job.

  • InputDataWrongColumnOrderError – When ‘load’ column is not first and ‘horizon’ column is not last.
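
The step functions above can be chained into a small custom training flow. A hedged sketch, assuming pj, model_specs and raw input_data exist; the horizons and test fraction are illustrative values, and the split step is assumed to return the three datasets listed in its Returns section:

    from openstef.pipeline.train_model import (
        train_pipeline_step_compute_features,
        train_pipeline_step_split_data,
        train_pipeline_step_train_model,
    )

    # Compute features for the chosen horizons.
    data_with_features = train_pipeline_step_compute_features(
        pj, model_specs, input_data, horizons=[0.25, 47.0]
    )
    # Split into train / validation / test sets.
    train_data, validation_data, test_data = train_pipeline_step_split_data(
        data_with_features, pj, test_fraction=0.15
    )
    # Train the model on the train and validation sets.
    model = train_pipeline_step_train_model(pj, model_specs, train_data, validation_data)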

openstef.pipeline.utils module

openstef.pipeline.utils.generate_forecast_datetime_range(forecast_data)

Generate forecast range based on last cluster of null values in first target column of forecast data.

Example

A forecast dataset with data between 2021-11-05 and 2021-11-19, and the target column ‘load’ as its first column, is given as input to this function. The ‘load’ column has null values between 2021-11-17 04:00:00 and 2021-11-19 05:00:00. These trailing null values indicate where forecasts are needed. The function therefore sets the forecast start time to 2021-11-17 04:00:00 and the forecast end time to 2021-11-19 05:00:00.

Parameters:

forecast_data (DataFrame) – The forecast dataframe.

Return type:

tuple[datetime, datetime]

Returns:

Start and end datetimes of the forecast range.

Raises:

ValueError – If the target column does not have null values.
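
A small, self-contained illustration of the behaviour described in the example above; the timestamps and load values are made up:

    import numpy as np
    import pandas as pd

    from openstef.pipeline.utils import generate_forecast_datetime_range

    index = pd.date_range("2021-11-17 00:00:00", periods=8, freq="H")
    forecast_data = pd.DataFrame(
        {"load": [1.0, 2.0, 3.0, 4.0, 5.0, np.nan, np.nan, np.nan]},
        index=index,
    )

    # The trailing null values run from 05:00 to 07:00, so that becomes the forecast range.
    start, end = generate_forecast_datetime_range(forecast_data)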

Module contents