Compare Benchmark Runs
Generate side-by-side comparison plots from multiple benchmark runs on the Liander 2024 dataset.
Prerequisites: Run at least two models first (e.g. XGBoost + GBLinear via the XGBoost & GBLinear notebook).
What this does:
Loads results from multiple model runs (each stored in its own directory)
Computes metrics across all targets using BenchmarkComparisonPipeline
Produces comparison visualizations (boxplots, ranking tables, per-target breakdowns)
Setup
Point the paths below at the result directories from your benchmark runs.
from pathlib import Path
from openstef_beam.analysis.models import RunName
from openstef_beam.benchmarking import BenchmarkComparisonPipeline, LocalBenchmarkStorage
from openstef_beam.benchmarking.benchmarks import create_liander2024_benchmark_runner
from openstef_beam.benchmarking.benchmarks.liander2024 import LIANDER2024_ANALYSIS_CONFIG
from openstef_beam.benchmarking.storage import BenchmarkStorage
# Path() resolves to the current working directory; adjust it if your results live elsewhere.
BASE_DIR = Path()
OUTPUT_PATH = BASE_DIR / "benchmark_results_comparison"
BENCHMARK_DIR_GBLINEAR = BASE_DIR / "benchmark_results" / "GBLinear"
BENCHMARK_DIR_XGBOOST = BASE_DIR / "benchmark_results" / "XGBoost"
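If you have more than two runs, listing each directory by hand becomes tedious. The following is a minimal sketch, not part of openstef_beam, that assumes each run lives in its own subdirectory of a common results folder, as in the layout above. The helper name `discover_runs` is hypothetical.

```python
from pathlib import Path


def discover_runs(results_root: Path) -> dict[str, Path]:
    """Map a lowercase run name to each immediate subdirectory of results_root.

    Hypothetical helper: assumes one subdirectory per model run.
    Returns an empty dict if results_root does not exist.
    """
    if not results_root.is_dir():
        return {}
    return {
        child.name.lower(): child
        for child in sorted(results_root.iterdir())
        if child.is_dir()  # skip stray files next to the run directories
    }
```

With the directories used in this guide, `discover_runs(Path("benchmark_results"))` would map `"gblinear"` and `"xgboost"` to their respective paths, provided both directories exist.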
Load run results
Each run is identified by a name and backed by a LocalBenchmarkStorage that
points at the directory where that model’s results were saved.
check_dirs = [
    BENCHMARK_DIR_GBLINEAR,
    BENCHMARK_DIR_XGBOOST,
]
for dir_path in check_dirs:
    if not dir_path.exists():
        msg = f"Benchmark directory not found: {dir_path}. Make sure to run the benchmarks first."
        raise FileNotFoundError(msg)
run_storages: dict[RunName, BenchmarkStorage] = {
    "gblinear": LocalBenchmarkStorage(base_path=BENCHMARK_DIR_GBLINEAR),
    "xgboost": LocalBenchmarkStorage(base_path=BENCHMARK_DIR_XGBOOST),
}
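The directory check above stops at the first missing directory. When comparing many runs it can be more convenient to report every missing directory at once. A minimal sketch using only the standard library follows; the helper names `find_missing_dirs` and `require_dirs` are hypothetical and not part of openstef_beam.

```python
from pathlib import Path


def find_missing_dirs(dirs: list[Path]) -> list[Path]:
    """Return the entries of dirs that are not existing directories."""
    return [d for d in dirs if not d.is_dir()]


def require_dirs(dirs: list[Path]) -> None:
    """Raise a single FileNotFoundError naming every missing directory."""
    missing = find_missing_dirs(dirs)
    if missing:
        listing = ", ".join(str(d) for d in missing)
        msg = f"Benchmark directories not found: {listing}. Make sure to run the benchmarks first."
        raise FileNotFoundError(msg)
```

Calling `require_dirs(check_dirs)` before building the storages would then surface all missing runs in one error message.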
Run comparison
The pipeline loads predictions from each run, re-evaluates them with the Liander 2024 analysis config, and produces comparison visualizations.
target_provider = create_liander2024_benchmark_runner(
    storage=LocalBenchmarkStorage(base_path=OUTPUT_PATH),
).target_provider
comparison_pipeline = BenchmarkComparisonPipeline(
    analysis_config=LIANDER2024_ANALYSIS_CONFIG,
    storage=LocalBenchmarkStorage(base_path=OUTPUT_PATH),
    target_provider=target_provider,
)
comparison_pipeline.run(run_data=run_storages)
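Once the pipeline has finished, its results are written under the output path passed to the storage. The exact file layout is not described here, so the following sketch makes no assumption about it and simply lists whatever files exist below a directory. The helper name `list_output_files` is hypothetical.

```python
from pathlib import Path


def list_output_files(output_path: Path) -> list[Path]:
    """Return all files below output_path, as sorted paths relative to it."""
    return sorted(
        p.relative_to(output_path)
        for p in output_path.rglob("*")
        if p.is_file()  # directories themselves are not listed
    )
```

For example, `for f in list_output_files(OUTPUT_PATH): print(f)` would print each generated file, which is a quick way to find the comparison plots.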