benchmarking

Runs complete model comparison studies across multiple forecasting targets.

A fair comparison of forecasting models requires testing them across many forecasting scenarios (equipment types, consumption/prosumption, solar/wind parks, regions, seasons). This module automates the entire process: training models, running backtests, calculating metrics, generating reports, and storing results for comparison.

The complete workflow:

  • Model training: Train different forecasting approaches on each target

  • Backtesting: Test all models under realistic conditions

  • Evaluation: Calculate performance metrics across different scenarios

  • Analysis: Generate comparison reports and visualizations

  • Storage: Save results for later analysis and sharing
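The workflow above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the openstef_beam API: every name here (`persistence`, `moving_average`, `backtest`, `run_benchmark`) is hypothetical, the "models" are trivial callables that re-fit on the available history at each step, and the only metric is mean absolute error.

```python
from statistics import mean
from typing import Callable

# NOTE: all names in this block are hypothetical and for illustration only.
# They are not taken from openstef_beam.

Model = Callable[[list[float]], float]  # maps observed history to a one-step forecast


def persistence(history: list[float]) -> float:
    """Forecast the most recent observation."""
    return history[-1]


def moving_average(history: list[float]) -> float:
    """Forecast the mean of the last three observations."""
    return mean(history[-3:])


def backtest(model: Model, series: list[float], warmup: int = 3) -> list[tuple[float, float]]:
    """Walk forward through the series, forecasting each point from past data only."""
    return [(model(series[:i]), series[i]) for i in range(warmup, len(series))]


def mae(pairs: list[tuple[float, float]]) -> float:
    """Mean absolute error over (forecast, actual) pairs."""
    return mean(abs(forecast - actual) for forecast, actual in pairs)


def run_benchmark(
    models: dict[str, Model], targets: dict[str, list[float]]
) -> dict[tuple[str, str], float]:
    """Backtest every model on every target and collect one score per pair."""
    results: dict[tuple[str, str], float] = {}
    for target_name, series in targets.items():
        for model_name, model in models.items():
            results[(target_name, model_name)] = mae(backtest(model, series))
    return results


# Two toy targets: a steadily rising series and a flat one.
targets = {
    "solar_park": [0.0, 2.0, 4.0, 6.0, 8.0, 10.0],
    "substation": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
}
models = {"persistence": persistence, "moving_average": moving_average}

# "Storage" here is just an in-memory dict keyed by (target, model).
results = run_benchmark(models, targets)

# "Analysis": a plain-text comparison table.
for (target_name, model_name), score in sorted(results.items()):
    print(f"{target_name:12s} {model_name:15s} MAE={score:.2f}")
```

Keying results by (target, model) keeps per-scenario scores separate, so a model that wins on one target type but loses on another remains visible instead of being averaged away.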

Submodules

openstef_beam.benchmarking.baselines

Benchmark baselines used by the OpenSTEF BEAM benchmarking utilities.

openstef_beam.benchmarking.benchmark_comparison_pipeline

Multi-run benchmark comparison and analysis pipeline.

openstef_beam.benchmarking.benchmark_pipeline

Benchmark pipeline for systematic forecasting model evaluation.

openstef_beam.benchmarking.benchmarks

Built-in benchmarks to run with OpenSTEF BEAM.

openstef_beam.benchmarking.callbacks

Callback system for benchmark execution monitoring and event handling.

openstef_beam.benchmarking.models

Data models and types for benchmark targets and configurations.

openstef_beam.benchmarking.storage

Storage backends for benchmark results and analysis outputs.

openstef_beam.benchmarking.target_provider

Target provider interfaces and implementations for benchmark execution.

Functions

read_evaluation_reports(targets, storage, ...)

Load evaluation reports for multiple targets from storage.

Classes

BenchmarkCallback()

Base class for benchmark execution callbacks.

BenchmarkCallbackManager([callbacks])

Groups multiple callbacks so they can be registered and invoked as a single callback.

BenchmarkComparisonPipeline(analysis_config, ...)

Pipeline for comparing results across multiple benchmark runs.

BenchmarkContext(**data)

Context information passed to forecaster factories during benchmark execution.

BenchmarkPipeline(backtest_config, ...[, ...])

Orchestrates forecasting model benchmarks across multiple targets.

BenchmarkStorage()

Abstract base class for storing and retrieving benchmark results.

BenchmarkTarget(**data)

Base class for benchmark targets with common properties.

InMemoryBenchmarkStorage()

In-memory implementation of BenchmarkStorage for testing and temporary use.

LocalBenchmarkStorage(base_path, *[, ...])

File system-based storage implementation for benchmark results.

S3BenchmarkStorage(local_storage, bucket_name)

S3-backed storage implementation that combines local and cloud storage.

SimpleTargetProvider(**data)

File-based target provider loading from YAML configs and Parquet datasets.

StrictExecutionCallback()

Callback to ensure strict benchmark execution with immediate error termination.

TargetProvider(**data)

Abstract interface for loading benchmark targets and their associated datasets.

TargetProviderConfig(**data)

Configuration specifying data locations and path templates for target providers.