openstef_beam.benchmarking

Runs complete model comparison studies across multiple forecasting targets.

A fair comparison of forecasting models requires testing them across many scenarios (equipment types, consumption/prosumption, solar/wind parks, regions, seasons). This module automates the whole process: training models, running backtests, calculating metrics, generating reports, and storing results for comparison.

The complete workflow:
  • Model training: Train different forecasting approaches on each target

  • Backtesting: Test all models under realistic conditions

  • Evaluation: Calculate performance metrics across different scenarios

  • Analysis: Generate comparison reports and visualizations

  • Storage: Save results for later analysis and sharing
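To make the workflow above concrete, here is a minimal, library-independent sketch of the same loop: two toy forecasters are backtested walk-forward on two targets, scored with mean absolute error, and the scores are kept in a nested dict. Every name in it (`persistence`, `moving_average`, `backtest`, `mae`, `run_benchmark`, and the target names) is hypothetical and does not come from openstef_beam; the toy models need no fitting, so the training step is trivial here. It only illustrates the idea and is not how BenchmarkPipeline is implemented.

```python
from __future__ import annotations

from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-in type: a forecaster maps past values to a next-value forecast.
Forecaster = Callable[[Sequence[float]], float]


def persistence(history: Sequence[float]) -> float:
    """Forecast the last observed value."""
    return history[-1]


def moving_average(history: Sequence[float]) -> float:
    """Forecast the mean of the last three observations."""
    return mean(history[-3:])


def backtest(model: Forecaster, series: Sequence[float], warmup: int) -> List[float]:
    """Walk forward through the series; each forecast sees past data only."""
    return [model(series[:t]) for t in range(warmup, len(series))]


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between two equally long sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def run_benchmark(
    targets: Dict[str, List[float]],
    models: Dict[str, Forecaster],
    warmup: int = 3,
) -> Dict[str, Dict[str, float]]:
    """Backtest every model on every target and collect one score per pair."""
    results: Dict[str, Dict[str, float]] = {}
    for target_name, series in targets.items():
        results[target_name] = {}
        for model_name, model in models.items():
            predictions = backtest(model, series, warmup)
            results[target_name][model_name] = mae(series[warmup:], predictions)
    return results


# Made-up toy data, purely for illustration.
targets = {
    "solar_park": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "substation": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
}
models = {"persistence": persistence, "moving_average": moving_average}

results = run_benchmark(targets, models)

# A plain-text comparison report, one line per (target, model) pair.
for target_name, scores in results.items():
    for model_name, score in sorted(scores.items(), key=lambda kv: kv[1]):
        print(f"{target_name:12s} {model_name:15s} MAE={score:.2f}")
```

Scoring every model on every target with the same walk-forward procedure is what makes the numbers comparable; a real study would add persistent storage and richer metrics on top of this loop.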

Functions

read_evaluation_reports(targets, storage, ...)

Load evaluation reports for multiple targets from storage.

Classes

BenchmarkCallback()

Base class for benchmark execution callbacks.

BenchmarkCallbackManager([callbacks])

Groups multiple callbacks so they can be used as a single callback.

BenchmarkComparisonPipeline(analysis_config, ...)

Pipeline for comparing results across multiple benchmark runs.

BenchmarkContext(**data)

Context information passed to forecaster factories during benchmark execution.

BenchmarkPipeline(backtest_config, ...[, ...])

Orchestrates forecasting model benchmarks across multiple targets.

BenchmarkStorage()

Abstract base class for storing and retrieving benchmark results.

BenchmarkTarget(**data)

Base class for benchmark targets with common properties.

InMemoryBenchmarkStorage()

In-memory implementation of BenchmarkStorage for testing and temporary use.

LocalBenchmarkStorage(base_path, *[, ...])

File system-based storage implementation for benchmark results.

S3BenchmarkStorage(local_storage, bucket_name)

S3-backed storage implementation that combines local and cloud storage.

SimpleTargetProvider(**data)

File-based target provider loading from YAML configs and Parquet datasets.

StrictExecutionCallback()

Callback to ensure strict benchmark execution with immediate error termination.

TargetProvider(**data)

Abstract interface for loading benchmark targets and their associated datasets.

TargetProviderConfig(**data)

Configuration specifying data locations and path templates for target providers.
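The callback entries above (a base class, a manager that aggregates callbacks, and a strict variant that stops on the first error) describe a common composite pattern. The sketch below shows that pattern in plain Python under stated assumptions: the class names `Callback`, `RecordingCallback`, `StrictCallback`, and `CallbackGroup`, the hook names `on_target_start` and `on_error`, and the `run_targets` helper are all hypothetical, since this summary does not list the real hook signatures of BenchmarkCallback.

```python
from __future__ import annotations

from typing import Callable, Iterable, List, Optional, Tuple


class Callback:
    """Hypothetical base class: every hook defaults to a no-op."""

    def on_target_start(self, target: str) -> None:
        pass

    def on_error(self, target: str, error: Exception) -> None:
        pass


class RecordingCallback(Callback):
    """Records each event so the run can be inspected afterwards."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str]] = []

    def on_target_start(self, target: str) -> None:
        self.events.append(("start", target))

    def on_error(self, target: str, error: Exception) -> None:
        self.events.append(("error", target))


class StrictCallback(Callback):
    """Re-raises any reported error, ending the run immediately."""

    def on_error(self, target: str, error: Exception) -> None:
        raise error


class CallbackGroup(Callback):
    """Forwards every hook to each member, in registration order."""

    def __init__(self, callbacks: Optional[Iterable[Callback]] = None) -> None:
        self.callbacks: List[Callback] = list(callbacks or [])

    def on_target_start(self, target: str) -> None:
        for callback in self.callbacks:
            callback.on_target_start(target)

    def on_error(self, target: str, error: Exception) -> None:
        for callback in self.callbacks:
            callback.on_error(target, error)


def run_targets(targets: Iterable[str], job: Callable[[str], None], callbacks: Callback) -> None:
    """Run the job per target; failures are reported to the callbacks, not raised here."""
    for target in targets:
        callbacks.on_target_start(target)
        try:
            job(target)
        except Exception as error:
            callbacks.on_error(target, error)


def flaky_job(target: str) -> None:
    """Toy job that fails for target 'b' only."""
    if target == "b":
        raise ValueError(f"no data for target {target}")


# Lenient run: the error is recorded and the remaining targets still run.
lenient = RecordingCallback()
run_targets(["a", "b", "c"], flaky_job, CallbackGroup([lenient]))

# Strict run: the same error is recorded, then re-raised, so 'c' never starts.
strict = RecordingCallback()
try:
    run_targets(["a", "b", "c"], flaky_job, CallbackGroup([strict, StrictCallback()]))
    strict_stopped = False
except ValueError:
    strict_stopped = True
```

Because the group is itself a callback, the runner only ever talks to one object, and strictness becomes a matter of which callbacks are registered rather than a flag on the runner.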