Evaluate Existing Forecasts#

Skip backtesting entirely: bring your own prediction Parquet files and run only the evaluation and analysis steps.

User story: “I already have forecasts from my own system. I just want to score them with BEAM’s metrics and visualizations.”

Expected directory layout#

benchmark_results/MyForecasts/
└── backtest/
    └── <group_name>/           # e.g. "solar_park"
        └── <target_name>/      # e.g. "Within 15 kilometers of Opmeer_normalized"
            └── predictions.parquet
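
The layout above can be sketched with a small path helper. This is a minimal illustration using only the standard library; `prediction_path` and `find_prediction_files` are hypothetical helper names and are not part of BEAM. It assumes every target stores exactly one `predictions.parquet` two levels below `backtest/`, as the tree shows.

```python
from pathlib import Path


def prediction_path(base: Path, group_name: str, target_name: str) -> Path:
    """Return where predictions.parquet is expected for one target."""
    return base / "backtest" / group_name / target_name / "predictions.parquet"


def find_prediction_files(base: Path) -> list[Path]:
    """List every predictions.parquet found under backtest/<group>/<target>/."""
    return sorted(base.glob("backtest/*/*/predictions.parquet"))
```

Running `find_prediction_files(Path("./benchmark_results/MyForecasts"))` before starting the evaluation is a cheap way to confirm that the files sit where the layout expects them.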

Expected parquet format#

| Column | Type | Description |
|---|---|---|
| index | DatetimeIndex (name="timestamp", tz-naive UTC, 15-min) | Forecast timestamp |
| available_at | datetime | When the prediction was generated |
| quantile_P05 | float | 5th percentile |
| quantile_P50 | float | Median (required) |
| quantile_P95 | float | 95th percentile |
| … | float | One column per quantile via Quantile(x).format() |
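
A quick way to check a file against this format is to compare its column names with the ones the table implies. The sketch below uses plain floats and the standard library only. `quantile_column_name` and `missing_columns` are hypothetical helpers, and the naming pattern is inferred from the examples in the table (0.05 becomes `quantile_P05`), not taken from the implementation of `Quantile(x).format()`. It only handles whole-percent quantiles.

```python
# Columns the table marks as always present (the timestamp lives in the index).
REQUIRED_COLUMNS = ("available_at", "quantile_P50")


def quantile_column_name(quantile: float) -> str:
    """Format a quantile as a column name, e.g. 0.05 -> 'quantile_P05'.

    Assumption: pattern inferred from the table; whole-percent quantiles only.
    """
    return f"quantile_P{round(quantile * 100):02d}"


def missing_columns(columns: list[str], quantiles: list[float]) -> list[str]:
    """Return the expected column names that are absent from `columns`."""
    expected = dict.fromkeys(
        [*REQUIRED_COLUMNS, *(quantile_column_name(q) for q in quantiles)]
    )
    present = set(columns)
    return [name for name in expected if name not in present]
```

The column list can come from whatever Parquet reader you already use, for example the `columns` attribute of a pandas DataFrame loaded from the file. An empty result means every expected column is present.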

Setup#

import logging
import multiprocessing
from pathlib import Path

from examples.benchmarks.custom.custom_benchmark import create_custom_benchmark_runner
from openstef_beam.backtesting.backtest_forecaster import DummyForecaster
from openstef_beam.benchmarking import BenchmarkContext, BenchmarkTarget, LocalBenchmarkStorage
from openstef_core.types import Q

_logger = logging.getLogger(__name__)

logging.basicConfig(level=logging.INFO, format="[%(asctime)s][%(levelname)s] %(message)s")

Configuration#

Point at the folder containing your prediction parquets and list the quantiles they were generated for.

# Path to the folder that contains the backtest/ directory with your parquets.
OUTPUT_PATH = Path("./benchmark_results/MyForecasts")
N_PROCESSES = multiprocessing.cpu_count()

# Quantiles your forecasts were generated for (must include 0.5 = median).
# Adjust this list to match whatever quantiles are in your parquet columns.
PREDICTION_QUANTILES = [Q(0.05), Q(0.1), Q(0.3), Q(0.5), Q(0.7), Q(0.9), Q(0.95)]

Dummy forecaster factory#

The pipeline still needs a factory to know which quantiles were used, but fit() and predict() are never called because backtesting is skipped.



def stub_factory(_context: BenchmarkContext, _target: BenchmarkTarget) -> DummyForecaster:
    """Factory that returns a DummyForecaster (backtesting is skipped).

    DummyForecaster provides quantile info to the pipeline but never runs
    fit() or predict() since backtest output already exists on disk.

    Returns:
        DummyForecaster with the configured quantiles.
    """
    return DummyForecaster(predict_quantiles=PREDICTION_QUANTILES)

Run evaluation#

The pipeline reads existing parquets and runs evaluation + analysis only.

if __name__ == "__main__":
    storage = LocalBenchmarkStorage(base_path=OUTPUT_PATH)

    runner = create_custom_benchmark_runner(storage=storage)

    runner.run(
        forecaster_factory=stub_factory,
        run_name="my_forecasts",
        n_processes=N_PROCESSES,
        filter_args=["solar_park"],
    )