EnsembleForecastDataset

class openstef_core.datasets.EnsembleForecastDataset(data: DataFrame, sample_interval: timedelta = timedelta(minutes=15), forecast_start: datetime | None = None, target_column: str = 'load', *, horizon_column: str = 'horizon', available_at_column: str = 'available_at') -> None

Bases: TimeSeriesDataset

First-stage output format for ensemble forecasters. It holds the predictions of the base forecasters, with one column per forecaster and quantile.

Parameters:
  • data (DataFrame)

  • sample_interval (timedelta)

  • forecast_start (datetime | None)

  • target_column (str)

  • horizon_column (str)

  • available_at_column (str)

__init__(data: DataFrame, sample_interval: timedelta = timedelta(minutes=15), forecast_start: datetime | None = None, target_column: str = 'load', *, horizon_column: str = 'horizon', available_at_column: str = 'available_at') -> None

Initialize a time series dataset.

The dataset automatically detects whether it is versioned based on column presence:
  • If horizon_column exists: versioned by forecast horizon

  • If available_at_column exists: versioned by availability time

  • Otherwise: regular time series
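
The detection rules above can be sketched in plain Python. This is only an illustration, not the library's implementation: detect_versioning and the Versioning enum are hypothetical names, and giving horizon_column precedence when both columns are present is an assumption based on the order in which the rules are listed.

```python
from enum import Enum


class Versioning(Enum):
    """Hypothetical labels for the three cases described above."""

    HORIZON = "horizon"
    AVAILABLE_AT = "available_at"
    NONE = "none"


def detect_versioning(
    columns,
    horizon_column: str = "horizon",
    available_at_column: str = "available_at",
) -> Versioning:
    """Classify a dataset by which versioning column it contains."""
    # Assumption: the horizon column wins if both columns are present.
    if horizon_column in columns:
        return Versioning.HORIZON
    if available_at_column in columns:
        return Versioning.AVAILABLE_AT
    return Versioning.NONE


print(detect_versioning(["load", "horizon"]))
print(detect_versioning(["load", "available_at"]))
print(detect_versioning(["load"]))
```

The default column names match the defaults in the constructor signature; any iterable of column names (for example a DataFrame's columns) can be passed in.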

Parameters:
  • data (DataFrame) – DataFrame with DatetimeIndex containing the time series data.

  • sample_interval (timedelta) – Fixed interval between consecutive data points.

  • horizon_column (str) – Name of the column storing forecast horizons.

  • available_at_column (str) – Name of the column storing availability times.

  • is_sorted – Whether the data is sorted by timestamp.

  • check_frequency – Whether to check that the data frequency matches sample_interval.

  • forecast_start (datetime | None)

  • target_column (str)

Raises:
  • TypeError – If data index is not a pandas DatetimeIndex or if versioning columns have incorrect types.

  • ValueError – If data frequency does not match sample_interval.

forecast_start: datetime
target_column: str
forecaster_names: list[str]
quantiles: list[Quantile]
property target_series: Series | None

Return the target series if available.

static get_learner_and_quantile(feature_names: Index) -> tuple[list[str], list[Quantile]]

Extract base forecaster names and quantiles from feature names.

Column format is {learner}{ENSEMBLE_COLUMN_SEP}{quantile.format()}, e.g. lgbm__quantile_P50.

Parameters:
  • feature_names (Index) – Index of feature names in the dataset.

Returns:

Tuple containing a list of base forecaster names and a list of quantiles.

Raises:

ValueError – If a column cannot be parsed or has an invalid quantile string.

Return type:

tuple[list[str], list[Quantile]]
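
As a rough illustration of the column format, the sketch below parses names such as lgbm__quantile_P50 using only the standard library. It does not call the library: the separator "__" and the "quantile_P" prefix are inferred from the example column name, quantiles are represented as plain floats rather than Quantile objects, and both helper functions are hypothetical.

```python
ENSEMBLE_COLUMN_SEP = "__"  # assumed from the example "lgbm__quantile_P50"
QUANTILE_PREFIX = "quantile_P"  # assumed from the same example


def split_ensemble_column(column: str) -> tuple[str, float]:
    """Split one column name into (learner, quantile level)."""
    # Split at the last separator so learner names may contain "__".
    learner, sep, quantile_text = column.rpartition(ENSEMBLE_COLUMN_SEP)
    if not sep or not learner or not quantile_text.startswith(QUANTILE_PREFIX):
        raise ValueError(f"Cannot parse ensemble column: {column!r}")
    try:
        level = float(quantile_text[len(QUANTILE_PREFIX):]) / 100
    except ValueError:
        raise ValueError(f"Invalid quantile in column: {column!r}") from None
    if not 0 < level < 1:
        raise ValueError(f"Quantile out of range in column: {column!r}")
    return learner, level


def learners_and_quantiles(columns) -> tuple[list[str], list[float]]:
    """Collect unique learner names (in order of appearance) and sorted quantiles."""
    learners: list[str] = []
    quantiles: list[float] = []
    for column in columns:
        learner, level = split_ensemble_column(column)
        if learner not in learners:
            learners.append(learner)
        if level not in quantiles:
            quantiles.append(level)
    return learners, sorted(quantiles)


names = ["lgbm__quantile_P10", "lgbm__quantile_P50", "xgb__quantile_P10", "xgb__quantile_P50"]
print(learners_and_quantiles(names))
```

Like the documented method, the sketch raises ValueError for a column it cannot parse; whether the real method sorts or deduplicates its results in the same way is not stated in this page.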

classmethod from_forecast_datasets(datasets: dict[str, ForecastDataset], target_series: Series | None = None, sample_weights: Series | None = None) -> Self

Create an EnsembleForecastDataset from multiple ForecastDatasets.

Parameters:
  • datasets (dict[str, ForecastDataset]) – Dict of ForecastDatasets to combine.

  • target_series (Series | None) – Optional target series to include in the dataset.

  • sample_weights (Series | None) – Optional sample weights series to include in the dataset.

Returns:

EnsembleForecastDataset combining all input datasets.

Return type:

Self
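
To make the combining step concrete, here is a minimal sketch of how per-forecaster quantile columns could be merged under the {learner}{ENSEMBLE_COLUMN_SEP}{quantile} naming scheme. It uses plain dicts in place of ForecastDataset objects; combine_forecasts is a hypothetical helper, the "__" separator is inferred from the example column name, and storing the target under the default target_column name 'load' is an assumption.

```python
ENSEMBLE_COLUMN_SEP = "__"  # assumed from the example "lgbm__quantile_P50"


def combine_forecasts(forecasts, target=None, target_column="load"):
    """Merge {learner: {quantile_name: values}} into one flat column mapping."""
    combined = {}
    for learner, columns in forecasts.items():
        for quantile_name, values in columns.items():
            # Prefix each quantile column with the name of its forecaster.
            combined[f"{learner}{ENSEMBLE_COLUMN_SEP}{quantile_name}"] = list(values)
    if target is not None:
        # Assumption: the optional target is stored as one extra column.
        combined[target_column] = list(target)
    return combined


forecasts = {
    "lgbm": {"quantile_P10": [1.0, 2.0], "quantile_P50": [1.5, 2.5]},
    "xgb": {"quantile_P10": [0.9, 2.1], "quantile_P50": [1.4, 2.6]},
}
print(list(combine_forecasts(forecasts, target=[1.6, 2.4])))
```

The dict keys play the role of the forecaster names, which is why the datasets argument is a dict rather than a list.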

get_base_predictions_for_quantile(quantile: Quantile) -> ForecastInputDataset

Get base forecaster predictions for a specific quantile.

Parameters:
  • quantile (Quantile) – Quantile to select.

Returns:

ForecastInputDataset containing predictions from all base forecasters at the specified quantile.

Return type:

ForecastInputDataset
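
Selecting one quantile amounts to picking, for every base forecaster, the column whose name ends in that quantile. The sketch below shows the idea on plain column names; columns_for_quantile is a hypothetical helper, the "__" separator is inferred from the example column name, and the real method returns a ForecastInputDataset rather than a dict.

```python
ENSEMBLE_COLUMN_SEP = "__"  # assumed from the example "lgbm__quantile_P50"


def columns_for_quantile(columns, quantile_name: str) -> dict[str, str]:
    """Map each learner to its column for the requested quantile."""
    suffix = f"{ENSEMBLE_COLUMN_SEP}{quantile_name}"
    return {
        column[: -len(suffix)]: column
        for column in columns
        # Require a non-empty learner name in front of the suffix.
        if column.endswith(suffix) and len(column) > len(suffix)
    }


columns = ["lgbm__quantile_P10", "lgbm__quantile_P50", "xgb__quantile_P50", "load"]
print(columns_for_quantile(columns, "quantile_P50"))
```

Columns that do not follow the naming scheme, such as the target column, are simply skipped by the suffix check.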