Evaluate Existing Forecasts#
Skip backtesting entirely: bring your own prediction parquets and run only the evaluation and analysis steps.
User story: “I already have forecasts from my own system. I just want to score them with BEAM’s metrics and visualizations.”
See also:
BenchmarkPipeline: auto-detects existing predictions and skips backtesting
Custom Benchmark configuration: defines which targets and metrics to use
Quantile naming convention: Quantile(x).format() → column names
Expected directory layout#
benchmark_results/MyForecasts/
└── backtest/
    └── <group_name>/              # e.g. "solar_park"
        └── <target_name>/         # e.g. "Within 15 kilometers of Opmeer_normalized"
            └── predictions.parquet
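Before running the pipeline, it can help to confirm that your files actually sit where the layout above expects them. The following is a minimal sketch using only the standard library; `find_prediction_files` is a hypothetical helper written for this page, not part of BEAM, and it assumes exactly the `backtest/<group_name>/<target_name>/predictions.parquet` nesting shown above.

```python
from pathlib import Path


def find_prediction_files(output_path: Path) -> dict[tuple[str, str], Path]:
    """Map (group_name, target_name) to each predictions.parquet under backtest/.

    Hypothetical helper: it only inspects the directory layout and does not
    open or validate the parquet files themselves.
    """
    found: dict[tuple[str, str], Path] = {}
    pattern = "*/*/predictions.parquet"  # <group_name>/<target_name>/predictions.parquet
    for parquet in sorted((output_path / "backtest").glob(pattern)):
        target_dir = parquet.parent
        found[(target_dir.parent.name, target_dir.name)] = parquet
    return found


# List what would be picked up from the example output folder.
for (group, target), path in find_prediction_files(Path("./benchmark_results/MyForecasts")).items():
    print(f"{group} / {target}: {path}")
```

An empty result usually means the `backtest/` folder is missing or the files are nested one level too deep or too shallow.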
Expected parquet format#
| Column | Type | Description |
|---|---|---|
| index | | Forecast timestamp |
| | datetime | When the prediction was generated |
| | float | 5th percentile |
| | float | Median (required) |
| | float | 95th percentile |
| … | float | One column per quantile (see the quantile naming convention) |
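A quick way to catch a mismatch between the quantiles you configure and the columns in your file is to compare the expected column names against the actual ones. The sketch below is an illustration under stated assumptions: `missing_quantile_columns` and `placeholder_name` are hypothetical helpers, and the `q05`-style names are placeholders, not BEAM's convention. In real use, pass a naming function that follows the quantile naming convention referenced under "See also".

```python
from collections.abc import Callable, Iterable


def missing_quantile_columns(
    columns: Iterable[str],
    quantiles: Iterable[float],
    name_for: Callable[[float], str],
) -> list[str]:
    """Return the expected quantile column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in map(name_for, quantiles) if name not in present]


def placeholder_name(q: float) -> str:
    """Hypothetical naming rule for illustration only, NOT BEAM's convention."""
    return f"q{round(q * 100):02d}"


# `columns` stands in for the column list of your own parquet file.
columns = ["q05", "q50"]
print(missing_quantile_columns(columns, [0.05, 0.5, 0.95], placeholder_name))
```

Keeping the naming rule as a parameter means the check does not hard-code any column format.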
Setup#
import logging
import multiprocessing
from pathlib import Path
from examples.benchmarks.custom.custom_benchmark import create_custom_benchmark_runner
from openstef_beam.backtesting.backtest_forecaster import DummyForecaster
from openstef_beam.benchmarking import BenchmarkContext, BenchmarkTarget, LocalBenchmarkStorage
from openstef_core.types import Q
_logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO, format="[%(asctime)s][%(levelname)s] %(message)s")
Configuration#
Point at the folder containing your prediction parquets and list the quantiles they were generated for.
# Path to the folder that contains the backtest/ directory with your parquets.
OUTPUT_PATH = Path("./benchmark_results/MyForecasts")
N_PROCESSES = multiprocessing.cpu_count()
# Quantiles your forecasts were generated for (must include 0.5 = median).
# Adjust this list to match whatever quantiles are in your parquet columns.
PREDICTION_QUANTILES = [Q(0.05), Q(0.1), Q(0.3), Q(0.5), Q(0.7), Q(0.9), Q(0.95)]
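Since the median is required, a small sanity check on the quantile list can fail fast before the pipeline starts. This is a minimal sketch on plain floats; `check_quantiles` is a hypothetical helper, and the range and uniqueness rules are assumptions added here for illustration (the page itself only states that 0.5 must be present).

```python
def check_quantiles(quantiles: list[float]) -> list[float]:
    """Validate a list of quantile levels and return it sorted ascending.

    Only the median requirement comes from this page; the range and
    uniqueness checks are assumptions made for this sketch.
    """
    if any(not 0.0 < q < 1.0 for q in quantiles):
        raise ValueError("quantiles must lie strictly between 0 and 1")
    if len(set(quantiles)) != len(quantiles):
        raise ValueError("quantiles must be unique")
    if 0.5 not in quantiles:
        raise ValueError("the median (0.5) is required")
    return sorted(quantiles)


print(check_quantiles([0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95]))
```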
Dummy forecaster factory#
The pipeline still needs a forecaster factory to learn which quantiles were used, but
fit() and predict() are never called because backtesting is skipped.
def stub_factory(_context: BenchmarkContext, _target: BenchmarkTarget) -> DummyForecaster:
    """Factory that returns a DummyForecaster (backtesting is skipped).

    DummyForecaster provides quantile info to the pipeline but never runs
    fit() or predict() since backtest output already exists on disk.

    Returns:
        DummyForecaster with the configured quantiles.
    """
    return DummyForecaster(predict_quantiles=PREDICTION_QUANTILES)
Run evaluation#
The pipeline reads existing parquets and runs evaluation + analysis only.
if __name__ == "__main__":
    storage = LocalBenchmarkStorage(base_path=OUTPUT_PATH)
    runner = create_custom_benchmark_runner(storage=storage)
    runner.run(
        forecaster_factory=stub_factory,
        run_name="my_forecasts",
        n_processes=N_PROCESSES,
        filter_args=["solar_park"],
    )