ForecastInputDataset

class openstef_core.datasets.ForecastInputDataset(data: DataFrame, sample_interval: timedelta = timedelta(minutes=15), forecast_start: datetime | None = None, *, horizon_column: str = 'horizon', available_at_column: str = 'available_at', sample_weight_column: str = 'sample_weight', target_column: str = 'load') -> None

Bases: TimeSeriesDataset

Time series dataset for forecasting with validated target column.

Used for training and prediction data where a specific target column must exist. The target column represents the value being forecasted.

Invariants

  • Target column must exist in the dataset

  • Inherits all TimeSeriesDataset guarantees (sorted timestamps, consistent intervals)

Attributes

  • target_column: Name of the target column to forecast.

  • sample_weight_column: Name of the column containing sample weights.

  • forecast_start: Optional timestamp indicating when the forecast period starts.

Example

>>> import pandas as pd
>>> from datetime import timedelta
>>> from openstef_core.datasets import ForecastInputDataset
>>> data = pd.DataFrame({
...     'load': [100, 120, 110],
...     'temperature': [20, 22, 21],
...     'weights': [1.0, 0.5, 1.0],
... }, index=pd.date_range('2025-01-01', periods=3, freq='h'))
>>> dataset = ForecastInputDataset(
...     data=data,
...     sample_interval=timedelta(hours=1),
...     target_column='load',
...     sample_weight_column='weights',
... )
>>> dataset.target_column
'load'
>>> dataset.sample_weight_column
'weights'
>>> len(dataset.target_series)
3
>>> len(dataset.sample_weight_series)
3

See also

  • TimeSeriesDataset: Base class for time series datasets.

  • ForecastDataset: For storing probabilistic forecast results.

  • TimeSeriesEnergyComponentDataset: For energy component analysis.

Parameters:
  • data (DataFrame)

  • sample_interval (timedelta)

  • forecast_start (datetime | None)

  • horizon_column (str)

  • available_at_column (str)

  • sample_weight_column (str)

  • target_column (str)

__init__(data: DataFrame, sample_interval: timedelta = timedelta(minutes=15), forecast_start: datetime | None = None, *, horizon_column: str = 'horizon', available_at_column: str = 'available_at', sample_weight_column: str = 'sample_weight', target_column: str = 'load') -> None

Initialize a time series dataset.

The dataset automatically detects whether it’s versioned based on column presence:

  • If horizon_column exists: versioned by forecast horizon

  • If available_at_column exists: versioned by availability time

  • Otherwise: regular time series
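The detection rule above can be sketched in a few lines. This is only an illustration, not the library's implementation: detect_versioning is a hypothetical helper, the returned labels are made up for the example, and checking the horizon column first when both columns are present is an assumption taken from the order in which the rules are listed.

```python
from collections.abc import Iterable


def detect_versioning(
    columns: Iterable[str],
    horizon_column: str = "horizon",
    available_at_column: str = "available_at",
) -> str:
    """Hypothetical helper: classify a dataset by which versioning column it has."""
    names = set(columns)
    # Assumption: the horizon column takes precedence if both columns exist.
    if horizon_column in names:
        return "horizon"
    if available_at_column in names:
        return "available_at"
    # Neither column present: treat the data as a regular time series.
    return "none"


by_horizon = detect_versioning(["load", "temperature", "horizon"])
by_availability = detect_versioning(["load", "available_at"])
regular = detect_versioning(["load", "temperature"])
```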

Parameters:
  • data (DataFrame) – DataFrame with DatetimeIndex containing the time series data.

  • sample_interval (timedelta) – Fixed interval between consecutive data points.

  • horizon_column (str) – Name of the column storing forecast horizons.

  • available_at_column (str) – Name of the column storing availability times.

  • is_sorted – Whether the data is sorted by timestamp.

  • forecast_start (datetime | None) – Optional timestamp indicating when the forecast period starts.

  • sample_weight_column (str) – Name of the column containing sample weights.

  • target_column (str) – Name of the target column to forecast.

Raises:

TypeError – If data index is not a pandas DatetimeIndex or if versioning columns have incorrect types.
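The index requirement behind this TypeError can be checked on a plain DataFrame before constructing the dataset. The sketch below is an assumption-laden illustration: require_datetime_index is a hypothetical helper and its error message is invented; it only mirrors the documented condition that the index must be a pandas DatetimeIndex.

```python
from datetime import timedelta

import pandas as pd


def require_datetime_index(frame: pd.DataFrame) -> None:
    """Hypothetical pre-check mirroring the documented index requirement."""
    if not isinstance(frame.index, pd.DatetimeIndex):
        raise TypeError(
            f"Expected a DatetimeIndex, got {type(frame.index).__name__}"
        )


# A frame indexed by timestamps passes the check silently.
valid = pd.DataFrame(
    {"load": [100.0, 120.0]},
    index=pd.date_range("2025-01-01", periods=2, freq=timedelta(minutes=15)),
)
require_datetime_index(valid)

# A frame with the default integer index is rejected.
invalid = pd.DataFrame({"load": [100.0, 120.0]})
try:
    require_datetime_index(invalid)
    rejected = False
except TypeError:
    rejected = True
```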

target_column: str
sample_weight_column: str
property forecast_start: datetime

Get the forecast start timestamp.

Returns:

Datetime indicating when the forecast period starts.

property target_series: Series

Extract the target time series from the dataset.

Returns:

Time series containing target values with original datetime index.

property sample_weight_series: Series

Extract the sample weight time series from the dataset, if it exists.

Returns:

Time series containing sample weights with original datetime index, or None if the sample weight column does not exist.

input_data(start: datetime | None = None) -> DataFrame

Extract the input features excluding the target column.

Parameters:
  • start (datetime | None) – Optional datetime to filter data from. If provided, only includes data points with timestamps at or after this date.

Returns:

DataFrame containing input features with original datetime index.

Return type:

DataFrame
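The behaviour described for input_data can be approximated on a plain pandas DataFrame as follows. This is a minimal sketch under stated assumptions: input_features is a hypothetical stand-in, not the library method, and it drops only the target column. Whether the real method also removes other bookkeeping columns (for example the sample weight column) is not stated in this reference.

```python
from __future__ import annotations

from datetime import datetime, timedelta

import pandas as pd


def input_features(
    frame: pd.DataFrame,
    target_column: str = "load",
    start: datetime | None = None,
) -> pd.DataFrame:
    """Hypothetical stand-in: drop the target and optionally filter by start."""
    features = frame.drop(columns=[target_column])
    if start is not None:
        # Keep rows whose timestamp is at or after the requested start.
        features = features[features.index >= pd.Timestamp(start)]
    return features


frame = pd.DataFrame(
    {"load": [100, 120, 110], "temperature": [20, 22, 21]},
    index=pd.date_range("2025-01-01", periods=3, freq=timedelta(hours=1)),
)
all_features = input_features(frame)
late_features = input_features(frame, start=datetime(2025, 1, 1, 1))
```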

classmethod from_timeseries(dataset: TimeSeriesDataset, target_column: str = 'load', forecast_start: datetime | None = None) -> Self

Create ForecastInputDataset from a generic TimeSeriesDataset.

Parameters:
  • dataset (TimeSeriesDataset) – Input TimeSeriesDataset to convert.

  • target_column (str) – Name of the target column to forecast.

  • forecast_start (datetime | None) – Optional timestamp indicating forecast start.

Returns:

Instance of ForecastInputDataset with specified target column.

Return type:

Self

create_forecast_range(horizon: LeadTime) -> DatetimeIndex

Create forecast index for given horizon starting from forecast_start.

Parameters:
  • horizon (LeadTime) – Lead time horizon for the forecast.

Returns:

DatetimeIndex representing the forecast timestamps.

Return type:

DatetimeIndex
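A forecast index of this kind can be built with pandas.date_range. The sketch below is not the library's implementation: forecast_range is a hypothetical helper, a plain timedelta stands in for LeadTime, and including both forecast_start and forecast_start + horizon in the result is an assumption that the real method may not share.

```python
from datetime import datetime, timedelta

import pandas as pd


def forecast_range(
    forecast_start: datetime,
    horizon: timedelta,
    sample_interval: timedelta,
) -> pd.DatetimeIndex:
    """Hypothetical helper: evenly spaced timestamps covering the horizon."""
    # Assumption: both endpoints are included in the returned index.
    return pd.date_range(
        start=forecast_start,
        end=forecast_start + horizon,
        freq=sample_interval,
    )


index = forecast_range(
    forecast_start=datetime(2025, 1, 1),
    horizon=timedelta(hours=1),
    sample_interval=timedelta(minutes=15),
)
```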

to_pandas() -> DataFrame

Convert the dataset to a pandas DataFrame with metadata stored in attrs.

Stores sample_interval, available_at_column, and horizon_column in the DataFrame’s attrs dictionary for later reconstruction.

Returns:

DataFrame with dataset data and metadata in attrs.

Return type:

DataFrame
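pandas exposes DataFrame.attrs as a plain dictionary for this kind of metadata. The sketch below shows how the three documented values could be stored and read back. The key names and the choice to store sample_interval as a timedelta object are assumptions for illustration; the library may use different keys or a serialized form.

```python
from datetime import timedelta

import pandas as pd

frame = pd.DataFrame(
    {"load": [100.0, 120.0]},
    index=pd.date_range("2025-01-01", periods=2, freq=timedelta(minutes=15)),
)

# Attach the metadata to a copy so the source frame stays untouched.
exported = frame.copy()
exported.attrs["sample_interval"] = timedelta(minutes=15)  # assumed key name
exported.attrs["available_at_column"] = "available_at"     # assumed key name
exported.attrs["horizon_column"] = "horizon"               # assumed key name

# Later, the metadata can be read back to rebuild a dataset.
restored_interval = exported.attrs["sample_interval"]
restored_horizon_column = exported.attrs["horizon_column"]
```

Note that pandas documents attrs as experimental, and not every DataFrame operation is guaranteed to carry it over to the result.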