openstef_beam.benchmarking

Runs complete model comparison studies across multiple forecasting targets.

Comparing forecasting models properly requires testing them on many different forecasting scenarios (equipment types, consumption/prosumption, solar/wind parks, regions, seasons). This module automates the entire process: training models, running backtests, calculating metrics, generating reports, and storing results for comparison.

The complete workflow:

1. Model training: Train different forecasting approaches on each target
2. Backtesting: Test all models under realistic conditions
3. Evaluation: Calculate performance metrics across different scenarios
4. Analysis: Generate comparison reports and visualizations
5. Storage: Save results for later analysis and sharing
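
The five workflow steps can be sketched in plain Python. This is a minimal illustration of the train, backtest, evaluate, analyze, store loop and does not use or reproduce the `openstef_beam` API: `Target`, `run_benchmark`, `compare`, the two toy models, and the sample data are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Target:
    """Hypothetical stand-in for a benchmark target (not the library's BenchmarkTarget)."""

    name: str
    history: list[float]  # observations available for training
    holdout: list[float]  # observations reserved for backtesting


def train_mean_model(history):
    """Model training: a naive model that always predicts the historical mean."""
    level = mean(history)
    return lambda horizon: [level] * horizon


def train_last_value_model(history):
    """Model training: a persistence model that repeats the last observation."""
    last = history[-1]
    return lambda horizon: [last] * horizon


def mean_absolute_error(actual, predicted):
    """Evaluation: average absolute deviation between observation and forecast."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def run_benchmark(targets, model_factories):
    """Train, backtest, and evaluate every model on every target, then store the score."""
    storage = {}  # Storage: a plain dict keyed by (target name, model name)
    for target in targets:
        for model_name, factory in model_factories.items():
            model = factory(target.history)        # model training
            forecast = model(len(target.holdout))  # backtesting on unseen data
            score = mean_absolute_error(target.holdout, forecast)  # evaluation
            storage[(target.name, model_name)] = score             # storage
    return storage


def compare(storage):
    """Analysis: average each model's error across all targets."""
    per_model = {}
    for (_, model_name), score in storage.items():
        per_model.setdefault(model_name, []).append(score)
    return {name: mean(scores) for name, scores in per_model.items()}


# Made-up sample data, for illustration only.
targets = [
    Target("solar_park", history=[1.0, 2.0, 3.0], holdout=[3.0, 3.0]),
    Target("wind_park", history=[4.0, 4.0, 4.0], holdout=[4.0, 6.0]),
]
factories = {"mean": train_mean_model, "last_value": train_last_value_model}

results = run_benchmark(targets, factories)
summary = compare(results)
```

Keying the stored scores by target and model keeps the per-scenario results available, so the comparison step can aggregate them in different ways (per model here, but equally per target or per region) without rerunning the backtests.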

Functions

`read_evaluation_reports`(targets, storage, ...)	Load evaluation reports for multiple targets from storage.
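
To make the idea of loading reports for several targets from a storage backend concrete, here is a small sketch of the general pattern. It is not the signature or behavior of `read_evaluation_reports`: the `ReportStorage` interface, its method names, the `strict` flag, and the dict-based report format are all assumptions made for this example.

```python
from abc import ABC, abstractmethod


class ReportStorage(ABC):
    """Hypothetical storage interface; the real BenchmarkStorage API may differ."""

    @abstractmethod
    def save_report(self, target_name: str, report: dict) -> None: ...

    @abstractmethod
    def has_report(self, target_name: str) -> bool: ...

    @abstractmethod
    def load_report(self, target_name: str) -> dict: ...


class DictReportStorage(ReportStorage):
    """In-memory backend, handy for tests and throwaway runs."""

    def __init__(self) -> None:
        self._reports: dict[str, dict] = {}

    def save_report(self, target_name: str, report: dict) -> None:
        self._reports[target_name] = report

    def has_report(self, target_name: str) -> bool:
        return target_name in self._reports

    def load_report(self, target_name: str) -> dict:
        return self._reports[target_name]


def load_reports(target_names, storage, strict=False):
    """Return (target name, report) pairs for every target that has a stored report.

    With strict=True a missing report raises KeyError instead of being skipped.
    """
    loaded = []
    for name in target_names:
        if not storage.has_report(name):
            if strict:
                raise KeyError(f"no evaluation report stored for {name!r}")
            continue  # lenient mode: skip targets that were never evaluated
        loaded.append((name, storage.load_report(name)))
    return loaded


# Made-up sample data, for illustration only.
storage = DictReportStorage()
storage.save_report("solar_park", {"mae": 0.5})
storage.save_report("wind_park", {"mae": 1.0})

reports = load_reports(["solar_park", "missing_target", "wind_park"], storage)
```

Because the loader depends only on the abstract interface, the same function would work unchanged with a file-system or cloud backend that implements the three methods.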

Classes

`BenchmarkCallback`()	Base class for benchmark execution callbacks.
`BenchmarkCallbackManager`([callbacks])	Group of callbacks that can be used to aggregate multiple callbacks.
`BenchmarkComparisonPipeline`(analysis_config, ...)	Pipeline for comparing results across multiple benchmark runs.
`BenchmarkContext`(**data)	Context information passed to forecaster factories during benchmark execution.
`BenchmarkPipeline`(backtest_config, ...[, ...])	Orchestrates forecasting model benchmarks across multiple targets.
`BenchmarkStorage`()	Abstract base class for storing and retrieving benchmark results.
`BenchmarkTarget`(**data)	Base class for benchmark targets with common properties.
`InMemoryBenchmarkStorage`()	In-memory implementation of BenchmarkStorage for testing and temporary use.
`LocalBenchmarkStorage`(base_path, *[, ...])	File system-based storage implementation for benchmark results.
`S3BenchmarkStorage`(local_storage, bucket_name)	S3-backed storage implementation that combines local and cloud storage.
`SimpleTargetProvider`(**data)	File-based target provider loading from YAML configs and Parquet datasets.
`StrictExecutionCallback`()	Callback to ensure strict benchmark execution with immediate error termination.
`TargetProvider`(**data)	Abstract interface for loading benchmark targets and their associated datasets.
`TargetProviderConfig`(**data)	Configuration specifying data locations and path templates for target providers.
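
The callback classes listed above describe two common patterns: a manager that fans one event out to several callbacks, and a strict callback that stops a run at the first error. The sketch below illustrates both in plain Python. The hook names (`on_target_start`, `on_error`), the `CallbackGroup` and `StrictCallback` classes, and the `run_targets` driver are hypothetical and are not the hooks or classes defined by `openstef_beam`.

```python
class Callback:
    """Hypothetical base class: every hook is a no-op unless overridden."""

    def on_target_start(self, target_name):
        pass

    def on_error(self, target_name, error):
        pass


class RecordingCallback(Callback):
    """Keeps an ordered log of the events it receives."""

    def __init__(self):
        self.events = []

    def on_target_start(self, target_name):
        self.events.append(("start", target_name))

    def on_error(self, target_name, error):
        self.events.append(("error", target_name))


class StrictCallback(Callback):
    """Re-raises any reported error so the run terminates immediately."""

    def on_error(self, target_name, error):
        raise error


class CallbackGroup(Callback):
    """Forwards each event to every callback in the group, in order."""

    def __init__(self, callbacks=None):
        self.callbacks = list(callbacks or [])

    def on_target_start(self, target_name):
        for callback in self.callbacks:
            callback.on_target_start(target_name)

    def on_error(self, target_name, error):
        for callback in self.callbacks:
            callback.on_error(target_name, error)


def run_targets(target_names, run_one, callbacks):
    """Run each target; report failures to the callbacks instead of crashing."""
    for name in target_names:
        callbacks.on_target_start(name)
        try:
            run_one(name)
        except Exception as error:
            callbacks.on_error(name, error)  # a strict callback may re-raise here


def flaky(name):
    """Toy workload that fails for one target."""
    if name == "b":
        raise ValueError("bad data for b")


# Lenient run: the failure is recorded and the remaining targets still run.
lenient = RecordingCallback()
run_targets(["a", "b", "c"], flaky, CallbackGroup([lenient]))

# Strict run: the error propagates and target "c" is never started.
strict_log = RecordingCallback()
try:
    run_targets(["a", "b", "c"], flaky, CallbackGroup([strict_log, StrictCallback()]))
    stopped_early = False
except ValueError:
    stopped_early = True
```

Keeping the termination policy in a callback means the driver loop itself stays the same for exploratory runs, where partial results are useful, and for CI-style runs, where any failure should abort.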