Custom Benchmark Templates#
Copy this folder as a starting point for your own BEAM benchmarks.
Which file do I start with?#
| I want to… | Start here |
|---|---|
| Benchmark my own model | |
| Benchmark on my own data | |
| Score predictions I already have | |
Files#
| File | Role |
|---|---|
| custom_forecaster.py | Template: your model. Implements the |
| custom_benchmark.py | Template: your benchmark. Defines where data lives, which metrics to use, and assembles the pipeline. |
| run_liander2024_benchmark.py | Entry point: test your forecaster on the built-in Liander 2024 dataset (auto-downloaded). |
| run_custom_benchmark.py | Entry point: run your forecaster on your own data (uses |
| evaluate_existing_forecasts.py | Entry point: bring your own prediction parquets, skip backtesting. |
| compare_benchmark_runs.py | Entry point: compare results from multiple runs side-by-side. |
Quick start#
# Install (requires uv: https://docs.astral.sh/uv/)
uv sync --all-extras --all-groups --all-packages
# Test the example forecaster on Liander 2024
uv run python -m examples.benchmarks.custom.run_liander2024_benchmark
# Run with your custom data/targets
uv run python -m examples.benchmarks.custom.run_custom_benchmark
Creating your own#
1. Write a forecaster#
Copy custom_forecaster.py and implement two methods:
- fit(data): called periodically with recent history. Train your model here.
- predict(data): called every few hours. Return a TimeSeriesDataset with a "load" column and one column per quantile (e.g. "quantile_P05", "quantile_P50").
The data argument is a RestrictedHorizonVersionedTimeSeries — it enforces no-lookahead by only exposing data available at data.horizon.
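The fit/predict contract and the no-lookahead idea can be sketched without the library. Everything below is a hypothetical stand-in, not the OpenSTEF API: `RestrictedView` only mimics the behaviour described above by hiding rows that became available after `horizon`, and `NaiveQuantileForecaster` returns plain dictionaries rather than a `TimeSeriesDataset`. The real interfaces are defined in custom_forecaster.py.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import quantiles


@dataclass(frozen=True)
class Row:
    timestamp: datetime     # when the measurement is valid
    available_at: datetime  # when the measurement became known
    load: float


class RestrictedView:
    """Hypothetical stand-in: exposes only rows known at `horizon`."""

    def __init__(self, rows: list[Row], horizon: datetime) -> None:
        self._rows = rows
        self.horizon = horizon

    def visible(self) -> list[Row]:
        return [r for r in self._rows if r.available_at <= self.horizon]


class NaiveQuantileForecaster:
    """Hypothetical model: empirical quantiles of the visible history."""

    def __init__(self) -> None:
        self._q: dict[str, float] = {}

    def fit(self, data: RestrictedView) -> None:
        loads = [r.load for r in data.visible()]
        # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
        cuts = quantiles(loads, n=20, method="inclusive")
        self._q = {
            "quantile_P05": cuts[0],
            "quantile_P50": cuts[9],
            "quantile_P95": cuts[18],
        }

    def predict(self, data: RestrictedView, steps: int = 4) -> list[dict]:
        # One output row per 15-minute step after the horizon.
        return [
            {"timestamp": data.horizon + timedelta(minutes=15 * (i + 1)), **self._q}
            for i in range(steps)
        ]


start = datetime(2023, 1, 14)
# Each measurement becomes available 15 minutes after its timestamp.
history = [
    Row(start + timedelta(minutes=15 * i), start + timedelta(minutes=15 * (i + 1)), float(i))
    for i in range(96)
]
view = RestrictedView(history, horizon=start + timedelta(hours=6))
model = NaiveQuantileForecaster()
model.fit(view)
forecast = model.predict(view)
```

Only the first 24 rows are visible at the 06:00 horizon, so the fitted quantiles never depend on data the model could not have known at that time.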
2. Define a benchmark (optional)#
Copy custom_benchmark.py if you want to use your own data. Override _get_measurements_path_for_target() and _get_weather_path_for_target() to point to your parquet files.
If you’re fine with the Liander 2024 dataset, skip this step and use create_liander2024_benchmark_runner() directly.
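As a rough illustration of the two path hooks, the sketch below maps a target to parquet locations. The method names come from this README, but their signatures, the `Target` class, the `MyDataLayout` class, and the directory layout are all assumptions made for the example; check custom_benchmark.py for the real base class and arguments.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Target:
    # Hypothetical stand-in; attribute names mirror t.group_name / t.name.
    group_name: str
    name: str


class MyDataLayout:
    """Hypothetical stand-in for the benchmark template's path hooks."""

    def __init__(self, data_dir: Path) -> None:
        self.data_dir = data_dir

    def _get_measurements_path_for_target(self, target: Target) -> Path:
        # Assumed layout: <data_dir>/measurements/<group>/<target>.parquet
        return self.data_dir / "measurements" / target.group_name / f"{target.name}.parquet"

    def _get_weather_path_for_target(self, target: Target) -> Path:
        # Assumed layout: <data_dir>/weather/<group>/<target>.parquet
        return self.data_dir / "weather" / target.group_name / f"{target.name}.parquet"


layout = MyDataLayout(Path("data"))
target = Target(group_name="solar_park", name="park_a")
measurements_path = layout._get_measurements_path_for_target(target)
weather_path = layout._get_weather_path_for_target(target)
```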
3. Run it#
Copy run_custom_benchmark.py. Register your models as forecaster factories and call pipeline.run().
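The factory idea can be shown in isolation. In this sketch a factory is assumed to be a zero-argument callable that builds a new model, and `run_all` is a hypothetical stand-in for the pipeline's run step; the real factory signature and `pipeline.run()` arguments are in run_custom_benchmark.py and may differ.

```python
from typing import Callable


class ConstantForecaster:
    """Hypothetical model: always predicts the same median."""

    def __init__(self, value: float) -> None:
        self.value = value

    def predict(self) -> dict[str, float]:
        return {"quantile_P50": self.value}


# Assumption: a factory takes no arguments and returns a new model instance.
ForecasterFactory = Callable[[], ConstantForecaster]

factories: dict[str, ForecasterFactory] = {
    "constant_low": lambda: ConstantForecaster(1.0),
    "constant_high": lambda: ConstantForecaster(5.0),
}


def run_all(
    factories: dict[str, ForecasterFactory], targets: list[str]
) -> dict[tuple[str, str], float]:
    """Stand-in for the pipeline: build one fresh model per (model, target)."""
    results = {}
    for model_name, make_forecaster in factories.items():
        for target in targets:
            forecaster = make_forecaster()  # no state shared between targets
            results[(model_name, target)] = forecaster.predict()["quantile_P50"]
    return results


results = run_all(factories, ["target_a", "target_b"])
```

Registering factories rather than instances lets the runner create a separate model for every target, which is one plausible reason for the design.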
Evaluating pre-existing forecasts#
If you already have predictions, place them in this layout:
benchmark_results/MyForecasts/
└── backtest/
    └── <group_name>/              # e.g. "solar_park"
        └── <target_name>/         # e.g. "Within 15 kilometers of Opmeer_normalized"
            └── predictions.parquet
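The layout above can be assembled with `pathlib`. The helper name `predictions_path` and its parameters are hypothetical; only the directory structure is taken from the tree.

```python
from pathlib import Path


def predictions_path(results_dir: Path, run_name: str, group_name: str, target_name: str) -> Path:
    """Build <results_dir>/<run_name>/backtest/<group>/<target>/predictions.parquet."""
    return results_dir / run_name / "backtest" / group_name / target_name / "predictions.parquet"


path = predictions_path(
    Path("benchmark_results"),
    "MyForecasts",
    "solar_park",
    "Within 15 kilometers of Opmeer_normalized",
)
# Create the parent folders before writing the parquet file, e.g.:
# path.parent.mkdir(parents=True, exist_ok=True)
```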
group_name and target_name must match the values from your targets YAML. You can list them:
uv run python -c "
from examples.benchmarks.custom.custom_benchmark import create_custom_benchmark_runner
from openstef_beam.benchmarking import LocalBenchmarkStorage
from pathlib import Path
runner = create_custom_benchmark_runner(storage=LocalBenchmarkStorage(base_path=Path('./tmp')))
for t in runner.target_provider.get_targets(['solar_park']):
    print(t.group_name, '/', t.name)
"
Each predictions.parquet must have:
| Column | Type | Description |
|---|---|---|
| timestamp (index) | | When each prediction is valid for. 15-min intervals, tz-naive UTC. |
| available_at | | When the prediction was generated (enables D-1 / lead-time filtering). |
| quantile_P05 | | 5th percentile prediction. |
| quantile_P50 | | Median prediction (required). |
| quantile_P95 | | 95th percentile prediction. |
| … | | One column per quantile, named with |
Example rows:
timestamp (index) available_at quantile_P05 quantile_P50 quantile_P95
2023-01-15 12:00:00 2023-01-14 06:00:00 0.5 1.2 2.0
2023-01-15 12:15:00 2023-01-14 06:00:00 0.6 1.3 2.1
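Before writing the parquet file, it can help to sanity-check rows against the requirements above. The sketch below works on plain dictionaries and uses only the standard library; `validate_rows` and `with_min_lead_time` are hypothetical helpers, not part of the benchmark package. It assumes two-digit quantile names such as `quantile_P05`, and it treats D-1 as "generated at least 24 hours ahead", which is an assumption about the definition.

```python
from datetime import datetime, timedelta

REQUIRED = ("timestamp", "available_at", "quantile_P50")


def validate_rows(rows: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the rows look consistent."""
    problems = []
    for i, row in enumerate(rows):
        missing = [c for c in REQUIRED if c not in row]
        if missing:
            problems.append(f"row {i}: missing {missing}")
            continue
        ts = row["timestamp"]
        if ts.tzinfo is not None:
            problems.append(f"row {i}: timestamp must be tz-naive")
        if ts.minute % 15 or ts.second or ts.microsecond:
            problems.append(f"row {i}: timestamp not on a 15-minute boundary")
        # Sorting names orders two-digit quantile levels from low to high.
        names = sorted(k for k in row if k.startswith("quantile_P"))
        values = [row[k] for k in names]
        if values != sorted(values):
            problems.append(f"row {i}: quantiles cross")
    return problems


def with_min_lead_time(rows: list[dict], lead: timedelta) -> list[dict]:
    """Keep rows generated at least `lead` before they are valid."""
    return [r for r in rows if r["timestamp"] - r["available_at"] >= lead]


rows = [
    {"timestamp": datetime(2023, 1, 15, 12, 0), "available_at": datetime(2023, 1, 14, 6, 0),
     "quantile_P05": 0.5, "quantile_P50": 1.2, "quantile_P95": 2.0},
    {"timestamp": datetime(2023, 1, 15, 12, 15), "available_at": datetime(2023, 1, 14, 6, 0),
     "quantile_P05": 0.6, "quantile_P50": 1.3, "quantile_P95": 2.1},
]
problems = validate_rows(rows)
day_ahead = with_min_lead_time(rows, timedelta(hours=24))
```

Both example rows have a lead time of about 30 hours, so they pass the 24-hour filter.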
Then run:
uv run python -m examples.benchmarks.custom.evaluate_existing_forecasts
Results are written to ./benchmark_results/. Each model gets its own subfolder with backtest predictions, evaluation scores, and analysis plots.
Comparing results#
After running at least two models, generate side-by-side comparison plots (global, per-group, per-target). The scripts automatically detect which targets are available in all runs:
uv run python -m examples.benchmarks.custom.compare_benchmark_runs
Output (HTML plots) is saved to ./benchmark_results_comparison/.
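To see which targets two or more runs have in common before comparing them, a set intersection over the results folders is enough. This is a sketch under the assumption that every run follows the `backtest/<group_name>/<target_name>/predictions.parquet` layout shown earlier; it is not the comparison script's own logic, and `targets_in_run` and `common_targets` are hypothetical names.

```python
import tempfile
from pathlib import Path


def targets_in_run(run_dir: Path) -> set[tuple[str, str]]:
    """Collect (group_name, target_name) pairs that have a predictions file."""
    return {
        (p.parent.parent.name, p.parent.name)
        for p in (run_dir / "backtest").glob("*/*/predictions.parquet")
    }


def common_targets(run_dirs: list[Path]) -> set[tuple[str, str]]:
    """Targets present in every run; empty if no runs are given."""
    per_run = [targets_in_run(d) for d in run_dirs]
    return set.intersection(*per_run) if per_run else set()


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for run_name, targets in {"ModelA": ["t1", "t2"], "ModelB": ["t2"]}.items():
        for target_name in targets:
            folder = root / run_name / "backtest" / "solar_park" / target_name
            folder.mkdir(parents=True)
            (folder / "predictions.parquet").touch()
    shared = common_targets([root / "ModelA", root / "ModelB"])
```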