benchmarking

Runs complete model comparison studies across multiple forecasting targets.

Comparing forecasting models properly requires testing them on many different forecasting scenarios (equipment types, consumption/prosumption, solar/wind parks, regions, seasons). This module automates the entire process: training models, running backtests, calculating metrics, generating reports, and storing results for comparison.

The complete workflow:

1. Model training: Train different forecasting approaches on each target

2. Backtesting: Test all models under realistic conditions

3. Evaluation: Calculate performance metrics across different scenarios

4. Analysis: Generate comparison reports and visualizations

5. Storage: Save results for later analysis and sharing
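
The steps above can be sketched as a plain loop over targets and models. This is a toy illustration only and does not use the openstef_beam API: every name in it (`backtest`, `run_benchmark`, `naive_last_value`, `mean_model`, the dict used as `storage`) is hypothetical, the data is made up for the example, and mean absolute error stands in for whatever metrics the real pipeline computes.

```python
from statistics import mean
from typing import Callable

# Hypothetical model type: takes the observed history, returns a one-step forecast.
Model = Callable[[list[float]], float]


def naive_last_value(history: list[float]) -> float:
    """Forecast the most recent observation."""
    return history[-1]


def mean_model(history: list[float]) -> float:
    """Forecast the mean of all observations seen so far."""
    return mean(history)


def backtest(model: Model, series: list[float], min_history: int = 3) -> list[float]:
    """Walk forward through the series, forecasting each point from past data only."""
    errors = []
    for t in range(min_history, len(series)):
        forecast = model(series[:t])  # "training" here is just seeing the history
        errors.append(abs(series[t] - forecast))
    return errors


def run_benchmark(
    targets: dict[str, list[float]],
    models: dict[str, Model],
    storage: dict[tuple[str, str], float],
) -> None:
    """Evaluate every model on every target and store the mean absolute error."""
    for target_name, series in targets.items():
        for model_name, model in models.items():
            errors = backtest(model, series)
            storage[(target_name, model_name)] = mean(errors)


# Made-up example data: one trending series and one flat series.
targets = {
    "solar_park": [0.0, 2.0, 4.0, 6.0, 8.0, 10.0],
    "substation": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
}
models = {"naive": naive_last_value, "mean": mean_model}
storage: dict[tuple[str, str], float] = {}

run_benchmark(targets, models, storage)

# Analysis step: print a small comparison table from the stored results.
for (target_name, model_name), mae in sorted(storage.items()):
    print(f"{target_name:12s} {model_name:6s} MAE={mae:.2f}")
```

Keeping the results keyed by (target, model) is what makes the later comparison step cheap: any report can be rebuilt from storage without rerunning the backtests.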

Submodules

`openstef_beam.benchmarking.baselines`	Benchmark baselines used by the OpenSTEF Beam benchmarking utilities.
`openstef_beam.benchmarking.benchmark_comparison_pipeline`	Multi-run benchmark comparison and analysis pipeline.
`openstef_beam.benchmarking.benchmark_pipeline`	Benchmark pipeline for systematic forecasting model evaluation.
`openstef_beam.benchmarking.benchmarks`	Built-in benchmarks to run with OpenSTEF BEAM.
`openstef_beam.benchmarking.callbacks`	Callback system for benchmark execution monitoring and event handling.
`openstef_beam.benchmarking.models`	Data models and types for benchmark targets and configurations.
`openstef_beam.benchmarking.storage`	Storage backends for benchmark results and analysis outputs.
`openstef_beam.benchmarking.target_provider`	Target provider interfaces and implementations for benchmark execution.

Functions

`read_evaluation_reports`(targets, storage, ...)	Load evaluation reports for multiple targets from storage.

Classes

`BenchmarkCallback`()	Base class for benchmark execution callbacks.
`BenchmarkCallbackManager`([callbacks])	Aggregates multiple callbacks into a single group.
`BenchmarkComparisonPipeline`(analysis_config, ...)	Pipeline for comparing results across multiple benchmark runs.
`BenchmarkContext`(**data)	Context information passed to forecaster factories during benchmark execution.
`BenchmarkPipeline`(backtest_config, ...[, ...])	Orchestrates forecasting model benchmarks across multiple targets.
`BenchmarkStorage`()	Abstract base class for storing and retrieving benchmark results.
`BenchmarkTarget`(**data)	Base class for benchmark targets with common properties.
`InMemoryBenchmarkStorage`()	In-memory implementation of BenchmarkStorage for testing and temporary use.
`LocalBenchmarkStorage`(base_path, *[, ...])	File system-based storage implementation for benchmark results.
`S3BenchmarkStorage`(local_storage, bucket_name)	S3-backed storage implementation that combines local and cloud storage.
`SimpleTargetProvider`(**data)	File-based target provider loading from YAML configs and Parquet datasets.
`StrictExecutionCallback`()	Callback to ensure strict benchmark execution with immediate error termination.
`TargetProvider`(**data)	Abstract interface for loading benchmark targets and their associated datasets.
`TargetProviderConfig`(**data)	Configuration specifying data locations and path templates for target providers.
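
To illustrate how a callback base class, a manager that aggregates callbacks, and a strict callback can fit together, here is a minimal sketch. It does not use or reproduce the openstef_beam classes: `Callback`, `CallbackManager`, `RecordingCallback`, `StrictCallback`, the `run` helper, and the hook names `on_target_start` and `on_error` are all assumptions made for illustration, and the real hook names and signatures may differ.

```python
from typing import Callable, Optional


class Callback:
    """Hypothetical base class: hooks are no-ops so subclasses override only what they need."""

    def on_target_start(self, target: str) -> None:
        pass

    def on_error(self, target: str, error: Exception) -> None:
        pass


class CallbackManager(Callback):
    """Hypothetical aggregate: forwards each event to every registered callback, in order."""

    def __init__(self, callbacks: Optional[list[Callback]] = None) -> None:
        self.callbacks = list(callbacks or [])

    def on_target_start(self, target: str) -> None:
        for callback in self.callbacks:
            callback.on_target_start(target)

    def on_error(self, target: str, error: Exception) -> None:
        for callback in self.callbacks:
            callback.on_error(target, error)


class RecordingCallback(Callback):
    """Hypothetical monitor: keeps a log of the events it receives."""

    def __init__(self) -> None:
        self.events: list[tuple[str, str]] = []

    def on_target_start(self, target: str) -> None:
        self.events.append(("start", target))

    def on_error(self, target: str, error: Exception) -> None:
        self.events.append(("error", target))


class StrictCallback(Callback):
    """Hypothetical strict mode: re-raise the error so the run stops immediately."""

    def on_error(self, target: str, error: Exception) -> None:
        raise error


def run(targets: list[str], evaluate: Callable[[str], None], callbacks: Callback) -> None:
    """Evaluate each target, routing lifecycle events and failures to the callbacks."""
    for target in targets:
        callbacks.on_target_start(target)
        try:
            evaluate(target)
        except Exception as error:
            callbacks.on_error(target, error)


def evaluate(target: str) -> None:
    """Stand-in for a benchmark step that fails for one target."""
    if target == "broken":
        raise ValueError(f"no data for {target}")


# Lenient run: the failure is recorded and the remaining targets still execute.
recorder = RecordingCallback()
run(["a", "broken", "c"], evaluate, CallbackManager([recorder]))

# Strict run: the first failure propagates and target "c" is never started.
strict_recorder = RecordingCallback()
try:
    run(["a", "broken", "c"], evaluate, CallbackManager([strict_recorder, StrictCallback()]))
    strict_stopped = False
except ValueError:
    strict_stopped = True
```

Because the manager is itself a callback, the runner only ever talks to one object, and strictness becomes a matter of which callbacks are registered rather than a flag inside the runner.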