Forecasting with OpenSTEF 4.0 Workflow Presets
This tutorial demonstrates how to use OpenSTEF 4.0 to create energy load forecasts using the Workflow Presets pattern. You'll learn how to:
Load real-world energy data from the Liander 2024 benchmark dataset
Configure a forecasting workflow with weather features and prediction quantiles
Train a model and inspect its performance
Generate probabilistic forecasts with prediction intervals
Visualize results and explain feature importance
OpenSTEF (Short-Term Energy Forecasting) is a modular library for creating accurate energy forecasts in the power grid domain.
# --- Setup: Logging and Display Configuration ---
from typing import Any, cast
from openstef_core.testing import configure_notebook_display, setup_notebook_logging
configure_notebook_display()
logger = setup_notebook_logging(__name__)
Step 1: Download the Dataset
We'll use the Liander 2024 Energy Forecasting Benchmark dataset from Hugging Face Hub. This dataset contains:
Load measurements: historical energy consumption from various installations (MV feeders, transformers, etc.)
Weather forecasts: versioned weather predictions (temperature, radiation, wind, etc.)
EPEX prices: day-ahead electricity market prices
Profiles: typical daily/weekly load patterns
# Download and combine the Liander benchmark dataset into a single TimeSeriesDataset.
from openstef_core.testing import load_liander_dataset, prepare_tutorial_datasets
dataset = load_liander_dataset()
print(f"Dataset shape: {dataset.data.shape}")
print(f"Date range: {dataset.data.index.min()} to {dataset.data.index.max()}")
dataset.data.head()
Step 3: Split Data into Training and Forecast Periods
We'll use:
90 days of historical data for training
14 days as the forecast period (where we'll generate predictions)
# Split the dataset into training (90 days) and forecast (14 days) periods.
train_dataset, forecast_dataset = prepare_tutorial_datasets()
# Visualize the training data
# The plot shows the 'load' column (energy consumption in MW) over time
# cast() needed: pandas returns plotly Figure at runtime (backend="plotly") but typed as Axes
fig = cast(Any, train_dataset.data[["load"]].plot(title="Training Data: Energy Load over Time"))
fig.update_layout(yaxis_title="Load (MW)", xaxis_title="Time")
fig.show()
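The helper above hides the split logic. As a rough illustration of what a time-based 90/14-day split involves, here is a stdlib-only sketch. The function `split_by_time`, the hourly index, and the chosen split date are hypothetical and are not part of OpenSTEF.

```python
from datetime import datetime, timedelta


def split_by_time(timestamps, forecast_start, train_days=90, forecast_days=14):
    """Return (train, forecast) timestamps around a split point.

    The training window is the `train_days` days before `forecast_start`;
    the forecast window is the `forecast_days` days starting at it.
    """
    train_start = forecast_start - timedelta(days=train_days)
    forecast_end = forecast_start + timedelta(days=forecast_days)
    train = [t for t in timestamps if train_start <= t < forecast_start]
    forecast = [t for t in timestamps if forecast_start <= t < forecast_end]
    return train, forecast


# Hypothetical hourly index covering 120 days from 1 January 2024.
index_start = datetime(2024, 1, 1)
hourly_index = [index_start + timedelta(hours=h) for h in range(120 * 24)]

train_index, forecast_index = split_by_time(hourly_index, datetime(2024, 4, 1))
print(len(train_index), len(forecast_index))
```

Splitting on time rather than at random keeps every training timestamp strictly before the forecast period, so the model never sees the future it is asked to predict.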
Step 4: Configure the Forecasting Workflow
OpenSTEF uses a ForecastingWorkflowConfig to define all aspects of the forecasting pipeline:
Model type: gblinear (gradient boosted linear model) or xgboost
Forecast horizons: how far ahead to predict (e.g., 36 hours)
Quantiles: prediction intervals for probabilistic forecasts
Feature columns: which weather variables to use
The GBLinear model is particularly good for energy forecasting because:
It can extrapolate beyond training data (important for rare events)
It provides interpretable feature importance
It's fast to train and predict
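The extrapolation point can be illustrated without OpenSTEF. The sketch below fits a one-feature least-squares line on synthetic data and compares it with a nearest-training-point predictor. The latter is only a crude stand-in for models that can return nothing beyond the target values seen in training. The data and both helper functions are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def nearest_neighbor_predict(xs, ys, x_new):
    """Return the target of the closest training point (cannot leave the seen range)."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x_new))
    return ys[nearest]


# Synthetic training data: the target grows by 2 units per unit of the feature.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [10.0 + 2.0 * x for x in train_x]

slope, intercept = fit_line(train_x, train_y)
x_unseen = 8.0  # well outside the training range [0, 4]

linear_prediction = slope * x_unseen + intercept
step_prediction = nearest_neighbor_predict(train_x, train_y, x_unseen)
print(f"linear: {linear_prediction:.1f}, nearest-point: {step_prediction:.1f}")
```

The linear fit continues the learned trend past the training range, while the nearest-point predictor stays at the largest value it has seen.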
# Import workflow components
from openstef_core.types import LeadTime, Q # LeadTime: forecast horizon, Q: quantile
from openstef_models.presets import ForecastingWorkflowConfig, create_forecasting_workflow
from openstef_models.presets.forecasting_workflow import GBLinearForecaster
# Configure the forecasting workflow
workflow = create_forecasting_workflow(
    config=ForecastingWorkflowConfig(
        # Model identification
        model_id="gblinear_demo_v1",
        model="gblinear",  # Use gradient boosted linear model
        # Forecast settings
        horizons=[LeadTime.from_string("PT36H")],  # Predict up to 36 hours ahead
        quantiles=[Q(0.5), Q(0.1), Q(0.9)],  # Median + 80% prediction interval
        # Target column (what we're predicting)
        target_column="load",
        # Weather feature columns (from the dataset)
        temperature_column="temperature_2m",
        relative_humidity_column="relative_humidity_2m",
        wind_speed_column="wind_speed_10m",
        radiation_column="shortwave_radiation",  # Solar radiation
        pressure_column="surface_pressure",
        # Training settings
        verbosity=1,  # Show progress during training
        mlflow_storage=None,  # Disable MLflow tracking for this demo
        # Model-specific hyperparameters
        gblinear_hyperparams=GBLinearForecaster.HyperParams(
            n_steps=50  # Number of boosting iterations
        ),
    )
)
print("Workflow configured successfully!")
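The horizon string "PT36H" is an ISO 8601 duration meaning 36 hours. To show what such a string encodes, here is a minimal stdlib parser for the days/hours/minutes subset of that format. The function `parse_duration` is a hypothetical helper and is not how `LeadTime.from_string` is implemented.

```python
import re
from datetime import timedelta

# Supports only days, hours and minutes, e.g. "PT36H", "P1DT12H", "PT90M".
_DURATION_PATTERN = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?)?$"
)


def parse_duration(text):
    """Convert a simple ISO 8601 duration string into a timedelta."""
    match = _DURATION_PATTERN.match(text)
    parts = {}
    if match is not None:
        parts = {
            name: int(value)
            for name, value in match.groupdict().items()
            if value is not None
        }
    if not parts:
        raise ValueError(f"Unsupported duration: {text!r}")
    return timedelta(**parts)


horizon = parse_duration("PT36H")
print(horizon)
```

Under this reading, "PT36H" and "P1DT12H" describe the same horizon of one and a half days.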
Step 5: Train the Model
The workflow's fit() method handles the entire training pipeline:
Preprocessing: feature engineering, data validation, scaling
Training: fit the model on historical data
Evaluation: compute metrics on training data
# Train the model on historical data
logger.info("Starting model training...")
result = workflow.fit(train_dataset)
# Display training metrics
if result is not None:
    logger.info("Training complete!")
    print("\nTraining Evaluation Metrics:")
    print(result.metrics_full.to_dataframe())
    if result.metrics_test is not None:
        print("\nTest Set Metrics (held-out validation):")
        print(result.metrics_test.to_dataframe())
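Which metrics appear in these tables depends on OpenSTEF's evaluation setup. As background, a common way to score a single quantile forecast is the pinball (quantile) loss. The sketch below is a generic stdlib version with made-up numbers; it is not OpenSTEF's metric code, and it is an assumption that the printed metrics include anything comparable.

```python
def pinball_loss(actuals, predictions, quantile):
    """Mean pinball loss of a forecast for one quantile level (0 < quantile < 1)."""
    total = 0.0
    for actual, predicted in zip(actuals, predictions):
        error = actual - predicted
        # Under-forecasts are weighted by `quantile`, over-forecasts by `1 - quantile`.
        total += max(quantile * error, (quantile - 1.0) * error)
    return total / len(actuals)


# Hypothetical observed load and a P90 forecast (MW).
actual_load = [10.0, 12.0, 11.0, 13.0]
p90_forecast = [12.0, 13.0, 12.0, 12.0]

loss_p90 = pinball_loss(actual_load, p90_forecast, quantile=0.9)
print(f"Pinball loss at P90: {loss_p90:.3f}")
```

For a high quantile such as P90, the loss penalizes actual values above the forecast much more heavily than values below it, which pushes a well-trained P90 forecast toward the upper end of the distribution.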
Step 6: Generate Forecasts
Now we use the trained model to predict energy load for the next 14 days. The output is a ForecastDataset containing:
Median prediction (quantile_P50)
Lower bound (quantile_P10): 10th percentile
Upper bound (quantile_P90): 90th percentile
# Generate probabilistic forecasts for the forecast period
from openstef_core.datasets import ForecastDataset
logger.info("Generating forecasts...")
forecast: ForecastDataset = workflow.predict(forecast_dataset)
# Display forecast summary
print(f"\nForecast generated for {len(forecast.data)} timestamps")
print(f"Quantiles: {forecast.quantiles}")
print("\nLast 5 forecast values:")
print(forecast.data.tail())
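With P10 and P90 as bounds, roughly 80% of the actual values should fall inside the band if the forecast is well calibrated. Here is a minimal sketch of that coverage check. It uses hypothetical numbers rather than the forecast object above, and `interval_coverage` is an illustrative helper, not an OpenSTEF function.

```python
def interval_coverage(actuals, lower, upper):
    """Fraction of actual values that lie within [lower, upper] at each timestamp."""
    inside = sum(
        1 for actual, lo, hi in zip(actuals, lower, upper) if lo <= actual <= hi
    )
    return inside / len(actuals)


# Hypothetical measurements and P10/P90 forecasts (MW).
actuals = [10.0, 12.0, 9.0, 15.0, 11.0]
p10 = [8.0, 10.0, 9.5, 11.0, 9.0]
p90 = [12.0, 14.0, 13.0, 14.0, 13.0]

coverage = interval_coverage(actuals, p10, p90)
print(f"Empirical P10-P90 coverage: {coverage:.0%}")
```

On a realistic sample size, coverage far below 80% suggests the band is too narrow, and coverage far above it suggests the band is wider than it needs to be.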
Step 7: Visualize Forecast Results
OpenSTEF-BEAM provides ForecastTimeSeriesPlotter for interactive visualizations:
Actual measurements shown as a line
Forecast median shown as another line
Prediction intervals shown as shaded areas
# Create an interactive forecast visualization
from openstef_beam.analysis.plots import ForecastTimeSeriesPlotter
fig = (
    ForecastTimeSeriesPlotter()
    # Add actual measurements (ground truth)
    .add_measurements(measurements=forecast_dataset.data["load"])
    # Add model predictions with confidence bands
    .add_model(
        model_name="GBLinear",
        forecast=forecast.median_series,  # P50 prediction
        quantiles=forecast.quantiles_data,  # P10-P90 confidence band
    )
    .plot()
)
# Update layout for better presentation
fig.update_layout(
    title="Energy Load Forecast vs Actual",
    yaxis_title="Load (MW)",
    xaxis_title="Time",
    height=500,
)
fig.show()
Step 8: Explain Feature Importance
Understanding why the model makes certain predictions is crucial for trust and debugging. GBLinear models provide clear feature importance rankings.
# Visualize feature importance using the ExplainableForecaster interface
from typing import cast
from openstef_models.explainability import ExplainableForecaster
from openstef_models.models.forecasting_model import ForecastingModel
# The GBLinear model implements ExplainableForecaster, providing feature importance
forecaster = cast(ForecastingModel, workflow.model).forecaster
explainable_model = cast(ExplainableForecaster, forecaster)
# Create an interactive treemap of feature importances
# Larger boxes = more important features
fig = explainable_model.plot_feature_importances()
fig.update_layout(title="Feature Importance Treemap")
fig.show()
Step 9: Visualize Feature Contributions (SHAP)
While feature importance shows which features matter overall, contributions
show how each feature pushed the prediction up or down for every individual timestep.
GBLinear models expose these as SHAP values via predict_contributions().
# Compute per-timestep feature contributions for the forecast period
from openstef_models.explainability import ContributionsPlotter
contributions = workflow.model.predict_contributions(forecast_dataset)
# Heatmap: contributions over time with prediction line
ContributionsPlotter.plot_heatmap(contributions, top_n=10, show_prediction=True).show()
# Waterfall: decompose a single timestep's prediction
ContributionsPlotter.plot_waterfall(contributions, timestep=0, top_n=10).show()
# Bar chart: mean absolute contribution per feature
ContributionsPlotter.plot_bar(contributions, top_n=10).show()
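For a purely linear model, per-feature contributions have a simple additive form: each feature's weight times its value, plus a bias term, sums to the prediction. The sketch below shows that decomposition and the mean absolute contribution that a bar chart would rank. The weights, bias, and input rows are hypothetical. It does not call OpenSTEF, and the real predict_contributions() output may be structured or centered differently.

```python
# Hypothetical linear model: weights per feature plus a bias (MW).
weights = {"temperature_2m": -0.8, "shortwave_radiation": -0.01, "wind_speed_10m": 0.3}
bias = 25.0


def contributions(row):
    """Per-feature contribution for one timestep: weight * feature value."""
    return {name: weights[name] * row[name] for name in weights}


def predict(row):
    """The prediction is the bias plus the sum of all feature contributions."""
    return bias + sum(contributions(row).values())


# Two made-up timesteps: a cold night and a mild sunny afternoon.
rows = [
    {"temperature_2m": 5.0, "shortwave_radiation": 0.0, "wind_speed_10m": 4.0},
    {"temperature_2m": 15.0, "shortwave_radiation": 400.0, "wind_speed_10m": 2.0},
]

per_row = [contributions(row) for row in rows]
mean_abs = {
    name: sum(abs(c[name]) for c in per_row) / len(per_row) for name in weights
}
ranking = sorted(mean_abs, key=mean_abs.get, reverse=True)

for row, contrib in zip(rows, per_row):
    print(f"prediction={predict(row):.1f}", contrib)
print("Ranked by mean absolute contribution:", ranking)
```

The per-row dictionaries correspond to what a waterfall chart decomposes for one timestep, and `mean_abs` corresponds to the overall ranking shown by a bar chart.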
Summary
In this tutorial, you learned how to:
Load energy data from the Liander 2024 benchmark dataset
Configure a workflow with ForecastingWorkflowConfig
Train a GBLinear model for probabilistic forecasting
Generate forecasts with prediction intervals
Visualize results and feature importance
Next Steps
Try different models: "xgboost" for more complex patterns
Experiment with more quantiles for narrower prediction intervals
Use the backtesting notebook to evaluate model performance systematically
Explore MLflow integration for experiment tracking