BenchmarkStorage#
- class openstef_beam.benchmarking.BenchmarkStorage[source]#
Bases: ABC
Abstract base class for storing and retrieving benchmark results.
Provides a unified interface for persisting all benchmark artifacts across different storage backends. The primary responsibility is ensuring consistent storage and retrieval of data while maintaining the temporal versioning semantics of forecasts.
Storage responsibilities:
- Backtest outputs: Time series predictions with temporal versioning (forecasts made at different times for the same target period, becoming more accurate closer to the actual time)
- Evaluation reports: Performance metrics and analysis results
- Analysis outputs: Visualizations and comparative analysis artifacts
Implementation requirements:
- Consistent data storage and retrieval patterns
- Preserve temporal versioning information in the stored data
- Handle data organization schemes appropriate for the storage backend
- Provide reliable error handling for missing or corrupted data
Example
Using storage in a benchmark pipeline:
>>> from openstef_beam.benchmarking.storage import LocalBenchmarkStorage
>>> from pathlib import Path
>>>
>>> # Configure storage backend (testable)
>>> storage = LocalBenchmarkStorage(
...     base_path=Path("./benchmark_results")
... )
>>>
>>> # Test storage creation
>>> isinstance(storage, LocalBenchmarkStorage)
True
>>> storage.base_path.name
'benchmark_results'
Integration with benchmark pipeline:
>>> from openstef_beam.benchmarking import BenchmarkPipeline
>>>
>>> # Use in complete benchmark setup
>>> pipeline = BenchmarkPipeline(  # doctest: +SKIP
...     backtest_config=...,
...     evaluation_config=...,
...     analysis_config=...,
...     target_provider=...,
...     storage=storage  # Handles all result persistence
... )
>>>
>>> # Storage automatically manages:
>>> # - Backtest outputs (predictions with temporal versioning)
>>> # - Evaluation reports (metrics across time windows)
>>> # - Analysis visualizations (charts and summary tables)
>>> # pipeline.run(forecaster_factory=my_factory)
Custom storage implementation:
>>> class DatabaseStorage(BenchmarkStorage):
...     def __init__(self, db_connection):
...         self.db = db_connection
...
...     def save_backtest_output(self, target, output):
...         # Store forecast data preserving temporal versioning
...         self.db.save_predictions(
...             target_id=target.name,
...             predictions=output,  # Contains timestamp + available_at columns
...             metadata=target.metadata
...         )
...
...     def load_backtest_output(self, target):
...         # Retrieve data maintaining temporal versioning structure
...         return self.db.load_predictions(target_id=target.name)
The storage interface enables seamless switching between local development, cloud deployment, and custom enterprise systems while preserving the temporal nature of forecast data across all backends.
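For example, a caller can combine has_backtest_output, load_backtest_output and save_backtest_output to avoid recomputing results. A minimal sketch, where target and run_backtest are hypothetical stand-ins for objects provided elsewhere in the pipeline:
>>> def backtest_with_cache(storage, target, run_backtest):
...     # Reuse stored predictions if this target was already backtested
...     if storage.has_backtest_output(target):
...         return storage.load_backtest_output(target)
...     # Otherwise compute, persist, and return the fresh results
...     output = run_backtest(target)
...     storage.save_backtest_output(target, output)
...     return output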
- abstractmethod save_backtest_output(target: BenchmarkTarget, output: TimeSeriesDataset) → None[source]#
Save the backtest output for a specific benchmark target.
Stores the results of a backtest execution, associating it with the target configuration. Must handle overwrites of existing data gracefully.
- Parameters:
target (BenchmarkTarget)
output (TimeSeriesDataset)
- Return type:
None
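Because overwrites must be handled gracefully, re-running a backtest for the same target simply replaces the stored predictions. A sketch, where target, first_run and second_run are hypothetical objects:
>>> storage.save_backtest_output(target, first_run)   # doctest: +SKIP
>>> # A later re-run for the same target replaces the stored output
>>> storage.save_backtest_output(target, second_run)  # doctest: +SKIP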
- abstractmethod load_backtest_output(target: BenchmarkTarget) → TimeSeriesDataset[source]#
Load previously saved backtest output for a benchmark target.
- Returns:
The stored backtest results as a TimeSeriesDataset.
- Raises:
KeyError – When no backtest output exists for the given target.
- Parameters:
target (BenchmarkTarget)
- Return type:
TimeSeriesDataset
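Callers that cannot guarantee a prior backtest should either guard with has_backtest_output or handle the KeyError. A sketch, where target is a hypothetical BenchmarkTarget:
>>> try:  # doctest: +SKIP
...     predictions = storage.load_backtest_output(target)
... except KeyError:
...     predictions = None  # no backtest output stored yet for this target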
- abstractmethod has_backtest_output(target: BenchmarkTarget) → bool[source]#
Check if backtest output exists for the given benchmark target.
- Returns:
True if backtest output is stored for the target, False otherwise.
- Parameters:
target (BenchmarkTarget)
- Return type:
bool
- abstractmethod save_evaluation_output(target: BenchmarkTarget, output: EvaluationReport) → None[source]#
Save the evaluation report for a specific benchmark target.
Stores the evaluation metrics and analysis results, associating them with the target configuration. Must handle overwrites of existing data gracefully.
- Parameters:
target (BenchmarkTarget)
output (EvaluationReport)
- Return type:
None
- abstractmethod load_evaluation_output(target: BenchmarkTarget) → EvaluationReport[source]#
Load previously saved evaluation report for a benchmark target.
- Returns:
The stored evaluation report containing metrics and analysis results.
- Raises:
KeyError – When no evaluation output exists for the given target.
- Parameters:
target (BenchmarkTarget)
- Return type:
EvaluationReport
- abstractmethod has_evaluation_output(target: BenchmarkTarget) → bool[source]#
Check if evaluation output exists for the given benchmark target.
- Returns:
True if evaluation output is stored for the target, False otherwise.
- Parameters:
target (BenchmarkTarget)
- Return type:
bool
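The evaluation methods follow the same round-trip contract as the backtest methods. A sketch, where target and report are hypothetical objects produced by the evaluation step:
>>> storage.save_evaluation_output(target, report)  # doctest: +SKIP
>>> storage.has_evaluation_output(target)           # doctest: +SKIP
True
>>> report_again = storage.load_evaluation_output(target)  # doctest: +SKIP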
- abstractmethod save_analysis_output(output: AnalysisOutput) → None[source]#
Save analysis output, optionally associated with a benchmark target.
- Parameters:
output (AnalysisOutput) – The analysis results to store, typically containing insights.
- Return type:
None
- abstractmethod has_analysis_output(scope: AnalysisScope) → bool[source]#
Check if analysis output exists for the given target or global scope.
- Parameters:
scope (AnalysisScope) – The scope of the analysis output to check.
- Returns:
True if analysis output exists for the specified scope, False otherwise.
- Return type:
bool
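A sketch of the analysis round trip, where analysis_output and scope are hypothetical AnalysisOutput and AnalysisScope objects produced by the analysis step and its configuration:
>>> storage.save_analysis_output(analysis_output)  # doctest: +SKIP
>>> storage.has_analysis_output(scope)             # doctest: +SKIP
True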