EnsembleForecastDataset

class openstef_core.datasets.validated_datasets.EnsembleForecastDataset(data: DataFrame, sample_interval: timedelta = timedelta(minutes=15), forecast_start: datetime | None = None, target_column: str = 'load', *, horizon_column: str = 'horizon', available_at_column: str = 'available_at') → None

Bases: TimeSeriesDataset

First-stage output format for ensemble forecasters.

Parameters:
  • data (pd.DataFrame)

  • horizon_column (str)

  • available_at_column (str)

__init__(data: DataFrame, sample_interval: timedelta = timedelta(minutes=15), forecast_start: datetime | None = None, target_column: str = 'load', *, horizon_column: str = 'horizon', available_at_column: str = 'available_at') → None

Initialize a time series dataset.

The dataset automatically detects whether it’s versioned based on column presence:

  • If horizon_column exists: versioned by forecast horizon

  • If available_at_column exists: versioned by availability time

  • Otherwise: regular time series
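
The detection rule above can be sketched in plain Python. This is only an illustration: the helper name `detect_versioning` and its return labels are hypothetical and not part of openstef_core, and the precedence (horizon column checked first) is an assumption based on the order of the list.

```python
def detect_versioning(
    columns: list[str],
    horizon_column: str = "horizon",
    available_at_column: str = "available_at",
) -> str:
    """Classify a dataset by which versioning column it contains.

    Hypothetical helper; checking the horizon column first is an assumption.
    """
    if horizon_column in columns:
        return "horizon"
    if available_at_column in columns:
        return "available_at"
    return "regular"


print(detect_versioning(["load", "horizon"]))
print(detect_versioning(["load", "available_at"]))
print(detect_versioning(["load"]))
```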

Parameters:
  • data (DataFrame) – DataFrame with DatetimeIndex containing the time series data.

  • sample_interval (timedelta) – Fixed interval between consecutive data points.

  • horizon_column (str) – Name of the column storing forecast horizons.

  • available_at_column (str) – Name of the column storing availability times.

  • is_sorted – Whether the data is sorted by timestamp.

  • check_frequency – Whether to check that the data frequency matches sample_interval.

  • forecast_start (datetime | None)

  • target_column (str)

Raises:
  • TypeError – If data index is not a pandas DatetimeIndex or if versioning columns have incorrect types.

  • ValueError – If data frequency does not match sample_interval.
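
The two documented error conditions can be checked before constructing the dataset. The sketch below uses only pandas. The function `check_time_index` is a hypothetical pre-check, and the strict rule it applies (every consecutive step must equal sample_interval) is an assumption. The library's own validation may differ.

```python
from datetime import timedelta

import pandas as pd


def check_time_index(data: pd.DataFrame, sample_interval: timedelta) -> None:
    """Hypothetical pre-check mirroring the documented error conditions."""
    if not isinstance(data.index, pd.DatetimeIndex):
        raise TypeError("data index must be a pandas DatetimeIndex")
    # Differences between consecutive timestamps; the leading NaT is dropped.
    steps = data.index.to_series().diff().dropna()
    if not (steps == pd.Timedelta(sample_interval)).all():
        raise ValueError("data frequency does not match sample_interval")


index = pd.date_range("2025-01-01", periods=4, freq="15min", tz="UTC")
frame = pd.DataFrame({"load": [1.0, 2.0, 3.0, 4.0]}, index=index)
check_time_index(frame, timedelta(minutes=15))  # passes silently
```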

forecast_start: datetime
target_column: str
forecaster_names: list[str]
quantiles: list[Quantile]
property target_series: Series | None

Return the target series if available.

static get_learner_and_quantile(feature_names: Index) → tuple[list[str], list[Quantile]]

Extract base forecaster names and quantiles from feature names.

Column format is {learner}{ENSEMBLE_COLUMN_SEP}{quantile.format()}, e.g. lgbm__quantile_P50.

Parameters:
  • feature_names (Index) – Index of feature names in the dataset.

Returns:

Tuple containing a list of base forecaster names and a list of quantiles.

Raises:

ValueError – If a column cannot be parsed or has an invalid quantile string.

Return type:

tuple[list[str], list[Quantile]]
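
The column format can be illustrated with a small stand-alone parser. It does not call openstef_core. The separator `"__"` and the `quantile_P` prefix are assumptions inferred from the example `lgbm__quantile_P50`, quantiles are represented as plain floats instead of `Quantile` objects, and returning each name and quantile once, in order of first appearance, is also an assumption.

```python
SEP = "__"  # assumed value of ENSEMBLE_COLUMN_SEP, inferred from the example
PREFIX = "quantile_P"  # assumed quantile label prefix, inferred from the example


def parse_ensemble_columns(feature_names: list[str]) -> tuple[list[str], list[float]]:
    """Split '{learner}{SEP}{quantile label}' column names into their parts."""
    learners: list[str] = []
    quantiles: list[float] = []
    for name in feature_names:
        # Split on the last separator so learner names may contain it.
        learner, sep, label = name.rpartition(SEP)
        if not sep or not learner or not label.startswith(PREFIX):
            raise ValueError(f"Cannot parse ensemble column: {name!r}")
        try:
            quantile = float(label[len(PREFIX):]) / 100
        except ValueError as exc:
            raise ValueError(f"Invalid quantile in column: {name!r}") from exc
        if learner not in learners:
            learners.append(learner)
        if quantile not in quantiles:
            quantiles.append(quantile)
    return learners, quantiles


learners, quantiles = parse_ensemble_columns(
    ["lgbm__quantile_P10", "lgbm__quantile_P50", "xgb__quantile_P10", "xgb__quantile_P50"]
)
print(learners, quantiles)
```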

classmethod from_forecast_datasets(datasets: dict[str, ForecastDataset], target_series: Series | None = None, sample_weights: Series | None = None) → Self

Create an EnsembleForecastDataset from multiple ForecastDatasets.

Parameters:
  • datasets (dict[str, ForecastDataset]) – Dict of ForecastDatasets to combine.

  • target_series (Series | None) – Optional target series to include in the dataset.

  • sample_weights (Series | None) – Optional sample weights series to include in the dataset.

Returns:

EnsembleForecastDataset combining all input datasets.

Return type:

Self
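
Conceptually, combining several forecasts yields one wide table with a column per forecaster and quantile. The sketch below shows that idea with plain pandas DataFrames standing in for ForecastDataset objects. The function `combine_forecasts`, the `"__"` separator, and the `quantile_P…` column names are assumptions for illustration, not the library's implementation.

```python
from __future__ import annotations

import pandas as pd

SEP = "__"  # assumed separator between forecaster name and quantile label


def combine_forecasts(
    forecasts: dict[str, pd.DataFrame],
    target: pd.Series | None = None,
    target_column: str = "load",
) -> pd.DataFrame:
    """Prefix each forecaster's columns with its name and join on the index."""
    renamed = [frame.add_prefix(f"{name}{SEP}") for name, frame in forecasts.items()]
    combined = pd.concat(renamed, axis=1)
    if target is not None:
        # Assignment aligns the target on the shared DatetimeIndex.
        combined[target_column] = target
    return combined


index = pd.date_range("2025-01-01", periods=2, freq="15min")
lgbm = pd.DataFrame({"quantile_P10": [1.0, 2.0], "quantile_P50": [3.0, 4.0]}, index=index)
xgb = pd.DataFrame({"quantile_P10": [1.5, 2.5], "quantile_P50": [3.5, 4.5]}, index=index)
combined = combine_forecasts(
    {"lgbm": lgbm, "xgb": xgb},
    target=pd.Series([3.2, 4.1], index=index),
)
print(list(combined.columns))
```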

get_base_predictions_for_quantile(quantile: Quantile) → ForecastInputDataset

Get base forecaster predictions for a specific quantile.

Parameters:
  • quantile (Quantile) – Quantile to select.

Returns:

ForecastInputDataset containing predictions from all base forecasters at the specified quantile.

Return type:

ForecastInputDataset
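
Selecting one quantile amounts to keeping the columns whose name ends in that quantile's label. The following pandas sketch illustrates the selection on a plain DataFrame. The helper `select_quantile`, the `"__"` separator, and the renaming of the result columns to bare forecaster names are assumptions, and the real method returns a ForecastInputDataset, not a DataFrame.

```python
import pandas as pd

SEP = "__"  # assumed separator between forecaster name and quantile label


def select_quantile(data: pd.DataFrame, quantile_label: str) -> pd.DataFrame:
    """Keep the columns for one quantile, one column per base forecaster."""
    suffix = f"{SEP}{quantile_label}"
    columns = [column for column in data.columns if column.endswith(suffix)]
    # Strip the suffix so each remaining column is named after its forecaster.
    return data[columns].rename(columns=lambda column: column[: -len(suffix)])


index = pd.date_range("2025-01-01", periods=2, freq="15min")
ensemble = pd.DataFrame(
    {
        "lgbm__quantile_P10": [1.0, 2.0],
        "lgbm__quantile_P50": [3.0, 4.0],
        "xgb__quantile_P10": [1.5, 2.5],
        "xgb__quantile_P50": [3.5, 4.5],
    },
    index=index,
)
median = select_quantile(ensemble, "quantile_P50")
print(list(median.columns))
```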