ForecastInputDataset
- class openstef_core.datasets.ForecastInputDataset(data: DataFrame, sample_interval: timedelta = timedelta(minutes=15), forecast_start: datetime | None = None, *, horizon_column: str = 'horizon', available_at_column: str = 'available_at', sample_weight_column: str = 'sample_weight', target_column: str = 'load') -> None
Bases: TimeSeriesDataset
Time series dataset for forecasting with validated target column.
Used for training and prediction data where a specific target column must exist. The target column represents the value being forecasted.
Invariants
Target column must exist in the dataset
Inherits all TimeSeriesDataset guarantees (sorted timestamps, consistent intervals)
- Attrs:
target_column: Name of the target column to forecast.
sample_weight_column: Name of the column containing sample weights.
forecast_start: Optional timestamp indicating when the forecast period starts.
Example
>>> import pandas as pd
>>> from datetime import timedelta
>>> from openstef_core.datasets import ForecastInputDataset
>>> data = pd.DataFrame({
...     'load': [100, 120, 110],
...     'temperature': [20, 22, 21],
...     'weights': [1.0, 0.5, 1.0],
... }, index=pd.date_range('2025-01-01', periods=3, freq='h'))
>>> dataset = ForecastInputDataset(
...     data=data,
...     sample_interval=timedelta(hours=1),
...     target_column='load',
...     sample_weight_column='weights',
... )
>>> dataset.target_column
'load'
>>> dataset.sample_weight_column
'weights'
>>> len(dataset.target_series)
3
>>> len(dataset.sample_weight_series)
3
See also
TimeSeriesDataset: Base class for time series datasets.
ForecastDataset: For storing probabilistic forecast results.
TimeSeriesEnergyComponentDataset: For energy component analysis.
- Parameters:
data (DataFrame)
sample_interval (timedelta)
forecast_start (datetime | None)
horizon_column (str)
available_at_column (str)
sample_weight_column (str)
target_column (str)
- __init__(data: DataFrame, sample_interval: timedelta = timedelta(minutes=15), forecast_start: datetime | None = None, *, horizon_column: str = 'horizon', available_at_column: str = 'available_at', sample_weight_column: str = 'sample_weight', target_column: str = 'load') -> None
Initialize a time series dataset.
The dataset automatically detects whether it’s versioned based on column presence:
- If horizon_column exists: versioned by forecast horizon
- If available_at_column exists: versioned by availability time
- Otherwise: regular time series
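The detection rules above can be sketched in plain Python. This is only an illustration, not the library's implementation: `detect_versioning` is a hypothetical helper, the returned labels are invented for this example, and giving horizon_column precedence when both columns are present is an assumption based on the order of the rules.

```python
def detect_versioning(columns, horizon_column="horizon", available_at_column="available_at"):
    """Classify a dataset by which versioning column is present (hypothetical helper)."""
    # Assumption: the horizon column takes precedence, following the order of the rules.
    if horizon_column in columns:
        return "horizon"
    if available_at_column in columns:
        return "available_at"
    # Neither versioning column is present: a regular time series.
    return "regular"


print(detect_versioning(["load", "temperature"]))
print(detect_versioning(["load", "horizon"]))
print(detect_versioning(["load", "available_at"]))
```

Because the column names are parameters, a dataset that stores horizons under a different name is classified the same way once that name is passed in.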
- Parameters:
data (DataFrame) – DataFrame with DatetimeIndex containing the time series data.
sample_interval (timedelta) – Fixed interval between consecutive data points.
horizon_column (str) – Name of the column storing forecast horizons.
available_at_column (str) – Name of the column storing availability times.
is_sorted – Whether the data is sorted by timestamp.
forecast_start (datetime | None)
sample_weight_column (str)
target_column (str)
- Raises:
TypeError – If data index is not a pandas DatetimeIndex or if versioning columns have incorrect types.
- target_column: str
- sample_weight_column: str
- property forecast_start: datetime
Get the forecast start timestamp.
- Returns:
Datetime indicating when the forecast period starts.
- property target_series: Series
Extract the target time series from the dataset.
- Returns:
Time series containing target values with original datetime index.
- property sample_weight_series: Series
Extract the sample weight time series from the dataset, if it exists.
- Returns:
Time series containing sample weights with original datetime index, or None if the sample weight column does not exist.
- input_data(start: datetime | None = None) -> DataFrame
Extract the input features excluding the target column.
- Parameters:
start (datetime | None) – Optional datetime to filter data from. If provided, only includes data points with timestamps at or after this date.
- Returns:
DataFrame containing input features with original datetime index.
- Return type:
DataFrame
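The behaviour described for input_data can be approximated with plain pandas. This is a minimal sketch under stated assumptions: `input_features` is a hypothetical stand-in, not the library method, and it drops only the target column. Whether the real method also removes other columns (such as sample weights) is not stated here.

```python
from datetime import datetime

import pandas as pd


def input_features(data, target_column="load", start=None):
    """Return the feature columns, optionally filtered from `start` (hypothetical helper)."""
    # Exclude the target column so that only input features remain.
    features = data.drop(columns=[target_column])
    if start is not None:
        # Keep rows whose timestamp is at or after `start`.
        features = features[features.index >= start]
    return features


data = pd.DataFrame(
    {"load": [100, 120, 110], "temperature": [20, 22, 21]},
    index=pd.date_range("2025-01-01", periods=3, freq="h"),
)
features = input_features(data, start=datetime(2025, 1, 1, 1))
print(features)
```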
- classmethod from_timeseries(dataset: TimeSeriesDataset, target_column: str = 'load', forecast_start: datetime | None = None) -> Self
Create ForecastInputDataset from a generic TimeSeriesDataset.
- Parameters:
dataset (TimeSeriesDataset) – Input TimeSeriesDataset to convert.
target_column (str) – Name of the target column to forecast.
forecast_start (datetime | None) – Optional timestamp indicating forecast start.
- Returns:
Instance of ForecastInputDataset with specified target column.
- Return type:
Self
- create_forecast_range(horizon: LeadTime) -> DatetimeIndex
Create forecast index for given horizon starting from forecast_start.
- Parameters:
horizon (LeadTime) – Lead time horizon for the forecast.
- Returns:
DatetimeIndex representing the forecast timestamps.
- Return type:
DatetimeIndex
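One way to picture the resulting index is with pandas' date_range. The sketch below is an assumption-laden illustration rather than the library's code: `forecast_range` is a hypothetical helper, the horizon is modelled as a plain timedelta instead of a LeadTime, and including the end point at forecast_start + horizon is a guess that this page does not confirm.

```python
from datetime import datetime, timedelta

import pandas as pd


def forecast_range(forecast_start, horizon, sample_interval):
    """Build timestamps from forecast_start up to forecast_start + horizon (hypothetical helper)."""
    # Assumption: both end points are included, and the spacing equals the sample interval.
    return pd.date_range(
        start=forecast_start,
        end=forecast_start + horizon,
        freq=sample_interval,
    )


index = forecast_range(
    forecast_start=datetime(2025, 1, 1),
    horizon=timedelta(hours=1),
    sample_interval=timedelta(minutes=15),
)
print(index)
```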
- to_pandas() -> DataFrame
Convert the dataset to a pandas DataFrame with metadata stored in attrs.
Stores sample_interval, available_at_column, and horizon_column in the DataFrame’s attrs dictionary for later reconstruction.
- Returns:
DataFrame with dataset data and metadata in attrs.
- Return type:
DataFrame
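Storing metadata in a DataFrame's attrs dictionary can be sketched as follows. This is only an illustration: `with_metadata` is a hypothetical helper, the key names mirror the three fields listed above, and the exact value format the library writes (for example a timedelta versus a string) is an assumption.

```python
from datetime import timedelta

import pandas as pd


def with_metadata(data, sample_interval, horizon_column="horizon", available_at_column="available_at"):
    """Return a copy of `data` carrying dataset metadata in attrs (hypothetical helper)."""
    frame = data.copy()
    # attrs is a plain dict on the DataFrame intended for user-defined metadata.
    frame.attrs["sample_interval"] = sample_interval
    frame.attrs["horizon_column"] = horizon_column
    frame.attrs["available_at_column"] = available_at_column
    return frame


data = pd.DataFrame(
    {"load": [100, 120, 110]},
    index=pd.date_range("2025-01-01", periods=3, freq="h"),
)
frame = with_metadata(data, sample_interval=timedelta(hours=1))
print(frame.attrs)
```

Reading the same keys back from attrs is what would make later reconstruction of the dataset possible.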