EnergyComponentDataset

class openstef_core.datasets.EnergyComponentDataset(data: DataFrame, sample_interval: timedelta = timedelta(minutes=15), *, horizon_column: str = 'horizon', available_at_column: str = 'available_at') -> None

Bases: TimeSeriesDataset

Time series dataset for energy generation by component type.

Validates that all required energy component columns (wind, solar, other) are present. Used for energy sector analysis and component-specific forecasting.

Invariants

  • Must contain columns for all energy component types

  • Inherits all TimeSeriesDataset guarantees (sorted timestamps, consistent intervals)
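The column invariant can be illustrated without the library. The sketch below is not the class's actual validation code. It assumes the required names are exactly the three listed above (wind, solar, other), and `missing_components` is a hypothetical helper introduced only for illustration.

```python
from collections.abc import Iterable

# Assumption: the required component columns are the three named in the description.
REQUIRED_COMPONENTS = ("wind", "solar", "other")


def missing_components(columns: Iterable[str]) -> list[str]:
    """Return the required component names that are absent from `columns`."""
    present = set(columns)
    # Preserve the order of REQUIRED_COMPONENTS so the result is deterministic.
    return [name for name in REQUIRED_COMPONENTS if name not in present]


# A complete set of columns yields no missing names.
complete = missing_components(["wind", "solar", "other"])

# A frame lacking 'solar' and 'other' would be reported as incomplete.
incomplete = missing_components(["wind", "load"])
```

Passing `df.columns` from a pandas DataFrame works the same way, since the helper accepts any iterable of column names.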

Example

>>> import pandas as pd
>>> from datetime import timedelta
>>> from openstef_core.datasets import EnergyComponentDataset
>>> energy_data = pd.DataFrame({
...     'wind': [50, 60],
...     'solar': [30, 40],
...     'other': [20, 25]
... }, index=pd.date_range('2025-01-01', periods=2, freq='h'))
>>> dataset = EnergyComponentDataset(energy_data, timedelta(hours=1))
>>> 'wind' in dataset.feature_names
True
>>> len(dataset.feature_names)
3

See also

  • TimeSeriesDataset: Base class for time series datasets.

  • ForecastInputDataset: For general forecasting input data.

  • EnergyComponentType: Enum defining required energy component types.

Parameters:
  • data (DataFrame)

  • sample_interval (timedelta)

  • horizon_column (str)

  • available_at_column (str)

__init__(data: DataFrame, sample_interval: timedelta = timedelta(minutes=15), *, horizon_column: str = 'horizon', available_at_column: str = 'available_at') -> None

Initialize a time series dataset.

The dataset automatically detects whether it’s versioned based on column presence:

  • If horizon_column exists: versioned by forecast horizon

  • If available_at_column exists: versioned by availability time

  • Otherwise: regular time series
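The detection rule described above can be sketched as a small function over column names. This is an illustration, not the library's implementation: `detect_versioning` and its string return values are hypothetical. Checking the horizon column first is also an assumption, since the text does not say what happens when both columns are present.

```python
from collections.abc import Iterable


def detect_versioning(
    columns: Iterable[str],
    horizon_column: str = "horizon",
    available_at_column: str = "available_at",
) -> str:
    """Classify a dataset by which versioning column it carries, if any."""
    names = set(columns)
    # Assumption: the horizon column takes precedence when both are present.
    if horizon_column in names:
        return "horizon"
    if available_at_column in names:
        return "available_at"
    # Neither versioning column exists: treat it as a regular time series.
    return "regular"


by_horizon = detect_versioning(["wind", "solar", "other", "horizon"])
by_availability = detect_versioning(["wind", "solar", "other", "available_at"])
plain = detect_versioning(["wind", "solar", "other"])
```

The keyword defaults mirror the defaults in the signature ('horizon' and 'available_at'), so custom column names can be passed in the same way as in the constructor.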

Parameters:
  • data (DataFrame) – DataFrame with DatetimeIndex containing the time series data.

  • sample_interval (timedelta) – Fixed interval between consecutive data points.

  • horizon_column (str) – Name of the column storing forecast horizons.

  • available_at_column (str) – Name of the column storing availability times.

  • is_sorted – Whether the data is sorted by timestamp.

Raises:

TypeError – If data index is not a pandas DatetimeIndex or if versioning columns have incorrect types.
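A caller can avoid the index-related TypeError by converting the index before constructing the dataset. The sketch below uses only pandas. `require_datetime_index` is a hypothetical stand-in for the kind of check the constructor is described as performing, not the library's own code.

```python
import pandas as pd


def require_datetime_index(data: pd.DataFrame) -> pd.DataFrame:
    """Raise TypeError unless `data` is indexed by a pandas DatetimeIndex."""
    if not isinstance(data.index, pd.DatetimeIndex):
        raise TypeError(
            f"expected a DatetimeIndex, got {type(data.index).__name__}"
        )
    return data


# Timestamps stored as strings produce a plain Index, not a DatetimeIndex.
frame = pd.DataFrame(
    {"wind": [50, 60], "solar": [30, 40], "other": [20, 25]},
    index=["2025-01-01 00:00", "2025-01-01 01:00"],
)

try:
    require_datetime_index(frame)
    rejected = False
except TypeError:
    rejected = True

# Converting the index up front makes the same frame acceptable.
frame.index = pd.to_datetime(frame.index)
accepted = require_datetime_index(frame)
```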