EvaluationPipeline#
- class openstef_beam.evaluation.EvaluationPipeline(config: EvaluationConfig, quantiles: list[Quantile], window_metric_providers: list[MetricProvider], global_metric_providers: list[MetricProvider]) None[source]#
Bases: object
Pipeline for evaluating probabilistic forecasting models.
Computes metrics across various dimensions:
- Prediction availability times
- Lead times
- Time windows
- Global and windowed metrics
Always includes observed probability as a calibration metric.
- Parameters:
  - config (EvaluationConfig)
  - quantiles (list[Quantile])
  - window_metric_providers (list[MetricProvider])
  - global_metric_providers (list[MetricProvider])
- __init__(config: EvaluationConfig, quantiles: list[Quantile], window_metric_providers: list[MetricProvider], global_metric_providers: list[MetricProvider]) None[source]#
Initializes the pipeline with configuration and metric providers.
Automatically adds ObservedProbabilityProvider to global metrics to ensure calibration is always evaluated.
- Parameters:
  - config (EvaluationConfig) – Configuration for the evaluation pipeline with time dimensions.
  - quantiles (list[Quantile]) – List of quantiles to evaluate; must include 0.5 (the median).
  - window_metric_providers (list[MetricProvider]) – Metrics to compute for time windows.
  - global_metric_providers (list[MetricProvider]) – Metrics to compute for the entire dataset.
- Raises:
ValueError – If the quantiles list does not include 0.5 (the median quantile).
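A minimal construction sketch. The config object, quantile values, and metric providers below are placeholders; substitute the EvaluationConfig and MetricProvider instances used in your project:

from openstef_beam.evaluation import EvaluationPipeline

# `my_evaluation_config`, `my_window_metric`, and `my_global_metric` are
# hypothetical placeholders defined elsewhere.
pipeline = EvaluationPipeline(
    config=my_evaluation_config,                 # EvaluationConfig with the time-dimension settings
    quantiles=[0.1, 0.5, 0.9],                   # must include the 0.5 median quantile
    window_metric_providers=[my_window_metric],  # metrics computed per time window
    global_metric_providers=[my_global_metric],  # metrics computed over the entire dataset
)
# ObservedProbabilityProvider is added to the global metrics automatically,
# so calibration is always evaluated.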
- run(predictions: TimeSeriesMixin, ground_truth: TimeSeriesMixin, target_column: str, evaluation_mask: DatetimeIndex | None = None) EvaluationReport[source]#
Evaluates predictions against ground truth.
Segments data by available_at and lead_time configurations, then computes metrics for each subset.
- Parameters:
  - predictions (TimeSeriesMixin) – Forecasted values with versioning information.
  - ground_truth (TimeSeriesMixin) – Actual observed values for comparison.
  - target_column (str) – Name of the target column in the ground truth dataset.
  - evaluation_mask (DatetimeIndex | None) – Optional datetime index to limit the evaluation period.
- Returns:
EvaluationReport containing metrics for each subset, organized by filtering criteria such as lead time windows and availability timestamps.
- Raises:
ValueError – If predictions and ground truth have different sample intervals.
MissingColumnsError – If any configured quantile columns are missing from predictions.
- Return type:
EvaluationReport
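A hedged usage sketch for run; `predictions`, `ground_truth`, and the "load" column name are placeholders for the TimeSeriesMixin datasets and target column used in your project:

import pandas as pd

# `predictions` and `ground_truth` are assumed to be TimeSeriesMixin datasets
# prepared elsewhere; "load" is a placeholder target column name.
report = pipeline.run(
    predictions=predictions,
    ground_truth=ground_truth,
    target_column="load",
)

# Optionally restrict the evaluated period with a DatetimeIndex mask.
january = pd.date_range("2024-01-01", "2024-01-31 23:45", freq="15min")
january_report = pipeline.run(
    predictions=predictions,
    ground_truth=ground_truth,
    target_column="load",
    evaluation_mask=january,
)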
- run_for_subset(filtering: Filtering, predictions: ForecastDataset, evaluation_mask: DatetimeIndex | None = None) EvaluationSubsetReport[source]#
Evaluates a single evaluation subset.
Computes metrics for the provided subset without additional filtering.
- Parameters:
  - filtering (Filtering) – The filtering criteria describing this subset.
  - predictions (ForecastDataset) – TimeSeriesDataset containing the predicted values.
  - evaluation_mask (DatetimeIndex | None) – Optional datetime index to limit the evaluation period.
- Returns:
EvaluationSubsetReport containing computed metrics for the subset.
- Return type:
EvaluationSubsetReport
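A short sketch of evaluating one pre-built subset directly; the `subset_filtering` value and the `subset_predictions` dataset are placeholders constructed elsewhere:

# `subset_filtering` is a Filtering value (for example, a lead-time window) and
# `subset_predictions` is the ForecastDataset already limited to that subset;
# both are assumed to be constructed elsewhere.
subset_report = pipeline.run_for_subset(
    filtering=subset_filtering,
    predictions=subset_predictions,
)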