Evaluate Existing Forecasts#

Skip backtesting entirely: bring your own prediction parquets and run only evaluation and analysis.

User story: “I already have forecasts from my own system. I just want to score them with BEAM’s metrics and visualizations.”

See also:

BenchmarkPipeline — auto-detects existing predictions and skips backtesting
Custom Benchmark configuration — defines which targets and metrics to use
Quantile naming convention — Quantile(x).format() → column names

Expected directory layout#

benchmark_results/MyForecasts/
└── backtest/
    └── <group_name>/           # e.g. "solar_park"
        └── <target_name>/      # e.g. "Within 15 kilometers of Opmeer_normalized"
            └── predictions.parquet
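As a rough sketch of how one target's file location follows from this layout, the helper below assembles the path with `pathlib`. The function name `prediction_path` is hypothetical and not part of BEAM; the group and target names are the examples from the tree above.

```python
from pathlib import Path


def prediction_path(base: Path, group_name: str, target_name: str) -> Path:
    """Return where predictions.parquet is expected for one target (hypothetical helper)."""
    return base / "backtest" / group_name / target_name / "predictions.parquet"


path = prediction_path(
    Path("benchmark_results/MyForecasts"),
    "solar_park",
    "Within 15 kilometers of Opmeer_normalized",
)
print(path.as_posix())
```

Target names may contain spaces, as in this example, so build paths from components as above and avoid unquoted shell strings.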

Expected parquet format#

Column	Type	Description
index	`DatetimeIndex` (name="timestamp", tz-naive UTC, 15-min)	Forecast timestamp
`available_at`	datetime	When the prediction was generated
`quantile_P05`	float	5th percentile
`quantile_P50`	float	Median (required)
`quantile_P95`	float	95th percentile
…	float	One column per quantile via `Quantile(x).format()`
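A minimal sketch of a frame that matches the table above, assuming pandas is installed. The values, the start date, and the one-day lead time for `available_at` are made up for illustration; only the index name, the 15-minute spacing, and the column names come from the table.

```python
import pandas as pd

# 15-minute, timezone-naive timestamps (interpreted as UTC), index named "timestamp".
index = pd.date_range("2024-01-01 00:00", periods=4, freq="15min", name="timestamp")

predictions = pd.DataFrame(
    {
        # Illustrative assumption: every forecast was issued one day ahead.
        "available_at": index - pd.Timedelta(days=1),
        "quantile_P05": [0.10, 0.12, 0.11, 0.09],
        "quantile_P50": [0.20, 0.22, 0.21, 0.19],  # median column is required
        "quantile_P95": [0.30, 0.33, 0.31, 0.28],
    },
    index=index,
)

print(predictions.dtypes)
```

Such a frame could then be written with `predictions.to_parquet(...)`, which needs a Parquet engine such as pyarrow to be installed.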

Setup#

import logging
import multiprocessing
from pathlib import Path

from examples.benchmarks.custom.custom_benchmark import create_custom_benchmark_runner
from openstef_beam.backtesting.backtest_forecaster import DummyForecaster
from openstef_beam.benchmarking import BenchmarkContext, BenchmarkTarget, LocalBenchmarkStorage
from openstef_core.types import Q

_logger = logging.getLogger(__name__)

logging.basicConfig(level=logging.INFO, format="[%(asctime)s][%(levelname)s] %(message)s")

Configuration#

Point at the folder containing your prediction parquets and list the quantiles they were generated for.

# Path to the folder that contains the backtest/ directory with your parquets.
OUTPUT_PATH = Path("./benchmark_results/MyForecasts")
N_PROCESSES = multiprocessing.cpu_count()

# Quantiles your forecasts were generated for (must include 0.5 = median).
# Adjust this list to match whatever quantiles are in your parquet columns.
PREDICTION_QUANTILES = [Q(0.05), Q(0.1), Q(0.3), Q(0.5), Q(0.7), Q(0.9), Q(0.95)]
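Because the quantile list has to match the parquet columns, a quick consistency check can catch mismatches early. The sketch below uses only the standard library. `column_name` is a hypothetical stand-in that imitates the `quantile_P05` / `quantile_P50` / `quantile_P95` pattern from the format table for whole-percent quantiles; in real code the names should come from `Quantile(x).format()` instead.

```python
def column_name(quantile: float) -> str:
    """Imitate the naming seen in the format table (assumption, whole percents only)."""
    return f"quantile_P{round(quantile * 100):02d}"


def missing_columns(quantiles: list[float], columns: list[str]) -> list[str]:
    """Return the expected quantile columns that are absent from `columns`."""
    expected = [column_name(q) for q in quantiles]
    return [name for name in expected if name not in columns]


quantiles = [0.05, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95]
# Example column list, e.g. taken from the columns of a loaded predictions frame.
parquet_columns = ["available_at", "quantile_P05", "quantile_P50", "quantile_P95"]

print(missing_columns(quantiles, parquet_columns))
```

An empty result means every configured quantile has a matching column; otherwise either trim the quantile list or add the missing columns to the parquet.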

Dummy forecaster factory#

The pipeline still needs a factory to know which quantiles were used, but fit() and predict() are never called because backtesting is skipped when the predictions already exist on disk.

def stub_factory(_context: BenchmarkContext, _target: BenchmarkTarget) -> DummyForecaster:
    """Factory that returns a DummyForecaster (backtesting is skipped).

    DummyForecaster provides quantile info to the pipeline but never runs
    fit() or predict() since backtest output already exists on disk.

    Returns:
        DummyForecaster with the configured quantiles.
    """
    return DummyForecaster(predict_quantiles=PREDICTION_QUANTILES)

Run evaluation#

The pipeline reads existing parquets and runs evaluation + analysis only.

if __name__ == "__main__":
    storage = LocalBenchmarkStorage(base_path=OUTPUT_PATH)

    runner = create_custom_benchmark_runner(storage=storage)

    runner.run(
        forecaster_factory=stub_factory,
        run_name="my_forecasts",
        n_processes=N_PROCESSES,
        filter_args=["solar_park"],
    )
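Since the run depends on the parquets already being in place, it can help to list what is on disk before calling `runner.run`. The sketch below is a hypothetical pre-flight check, not a BEAM function: it globs for `predictions.parquet` files under `backtest/` and reports the group and target of each. It is demonstrated on a temporary directory with an empty placeholder file.

```python
import tempfile
from pathlib import Path


def find_prediction_files(base: Path) -> list[tuple[str, str]]:
    """List (group_name, target_name) pairs that have a predictions.parquet."""
    files = sorted((base / "backtest").glob("*/*/predictions.parquet"))
    return [(f.parent.parent.name, f.parent.name) for f in files]


# Demonstration on a throwaway directory; in practice pass OUTPUT_PATH instead.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    target_dir = base / "backtest" / "solar_park" / "example_target"
    target_dir.mkdir(parents=True)
    (target_dir / "predictions.parquet").touch()  # empty placeholder, not a real parquet
    found = find_prediction_files(base)

print(found)
```

An empty list for a real output folder suggests the files are not where the expected directory layout places them.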