VersionedLagsAdder

class openstef_models.transforms.time_domain.VersionedLagsAdder(**data: Any) -> None

Bases: BaseConfig, VersionedTimeSeriesTransform

Create lag features while preserving data availability constraints.

This transform creates lagged versions of a column for capturing temporal dependencies in energy forecasting. Unlike traditional lag transforms, this preserves data availability constraints, ensuring lag features only use data that would have been available at prediction time.

Energy consumption has strong temporal patterns: yesterday’s peak predicts today’s, the previous hours influence the next hour’s demand, and energy use lags behind weather changes. In production forecasting, however, you cannot use future data to predict the present, so each lag feature must respect the time at which its underlying value became available.

For each lag, the transform:

  • Shifts timestamps forward (e.g., -2h lag moves 10:00 data to 12:00)

  • Preserves availability constraints (data available at 15:00 stays available at 15:00)

  • Creates new feature columns (e.g., load becomes load_lag_-PT2H)

  • Maintains the versioned structure so multiple data versions are preserved independently
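
The steps above can be sketched in plain Python. This is an illustration only, not the library's implementation: the `Row` record and the `add_lag` and `lag_column_name` helpers are hypothetical, the name formatter handles whole-hour lags only, and the column naming simply mimics the `load_lag_-PT2H` pattern shown above.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Row(NamedTuple):
    """One versioned observation (hypothetical structure for illustration)."""

    timestamp: datetime      # time the value refers to
    available_at: datetime   # time the value became known
    value: float


def lag_column_name(feature: str, lag: timedelta) -> str:
    """Build a name such as 'load_lag_-PT2H' (whole-hour lags only)."""
    hours = int(lag.total_seconds() // 3600)
    sign = "-" if hours < 0 else ""
    return f"{feature}_lag_{sign}PT{abs(hours)}H"


def add_lag(rows: list[Row], lag: timedelta) -> list[Row]:
    """Shift timestamps by -lag and leave available_at untouched."""
    return [Row(r.timestamp - lag, r.available_at, r.value) for r in rows]


rows = [
    Row(datetime(2025, 1, 1, 10), datetime(2025, 1, 1, 10), 100.0),
    Row(datetime(2025, 1, 1, 11), datetime(2025, 1, 1, 11), 110.0),
]
lag = timedelta(hours=-2)

lagged = add_lag(rows, lag)           # 10:00 data now sits at 12:00
name = lag_column_name("load", lag)   # name of the new feature column
```

Because `available_at` is copied unchanged, the lagged value at 12:00 is still marked as known from 10:00, which is what lets a later version selection decide whether it may be used.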

In versioned datasets with different availability times, this allows automatic selection of appropriate data versions:

  • Short lags + long lead times: Use high-quality data (available later)

  • Long lags + short lead times: Use lower-quality data (available sooner)

Example

Create lag features for energy forecasting

>>> from datetime import timedelta
>>> import pandas as pd
>>> from openstef_core.datasets import VersionedTimeSeriesDataset
>>> from openstef_models.transforms.time_domain import VersionedLagsAdder
>>> # Create sample energy data
>>> data = pd.DataFrame({
...     'available_at': pd.date_range('2025-01-01 10:00', periods=4, freq='h'),
...     'load': [100.0, 110.0, 120.0, 130.0]
... }, index=pd.date_range('2025-01-01 10:00', periods=4, freq='h'))
>>> dataset = VersionedTimeSeriesDataset.from_dataframe(data, timedelta(hours=1))
>>> # Create 1-hour and 2-hour lag features
>>> transform = VersionedLagsAdder(
...     feature='load',
...     lags=[timedelta(hours=-1), timedelta(hours=-2)]
... )
>>> result = transform.transform(dataset)
>>> snapshot = result.select_version()
>>> # Check lag feature names
>>> lag_features = [col for col in snapshot.feature_names if 'lag' in col]
>>> sorted(lag_features)
['load_lag_-PT1H', 'load_lag_-PT2H']
>>> # Verify lag values (1-hour lag shifts 100->11:00, 110->12:00, etc.)
>>> snapshot.data['load_lag_-PT1H'].dropna().tolist()
[100.0, 110.0, 120.0]

Note

Lag features are constrained to the original dataset’s time range. A dataset covering 10:00-13:00 with a -2h lag will have features available only from 12:00-13:00, not extending to 15:00. This prevents creating timepoints outside the forecasting range.
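
The range constraint in this note can be checked with a small stand-alone sketch. It uses only the standard library and assumes, for illustration, that shifted timestamps outside the original range are simply dropped; how the library applies the constraint internally is not shown here.

```python
from datetime import datetime, timedelta

# Original hourly timestamps, 10:00 through 13:00
original = [datetime(2025, 1, 1, 10) + timedelta(hours=h) for h in range(4)]
lag = timedelta(hours=-2)

# A -2h lag moves every timestamp two hours later: 12:00 through 15:00
shifted = [ts - lag for ts in original]

# Keep only the shifted timestamps that fall inside the original range
start, end = original[0], original[-1]
kept = [ts for ts in shifted if start <= ts <= end]
```

Here `kept` contains only 12:00 and 13:00; the shifted points at 14:00 and 15:00 fall outside the original range and are discarded.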

Parameters:

data (Any)

feature: str
lags: list[timedelta]
property is_fitted: bool

Check if the transform has been fitted.

fit(data: VersionedTimeSeriesDataset) -> None

Fit the transform to the input data.

This method should be called before applying the transform to the data. It allows the transform to learn any necessary parameters from the data.

Parameters:

data (VersionedTimeSeriesDataset)

Return type:

None

transform(data: VersionedTimeSeriesDataset) -> VersionedTimeSeriesDataset

Transform the input data.

This method should apply a transformation to the input data and return a new instance.

Parameters:

data (VersionedTimeSeriesDataset)

Returns:

A new instance of the transformed data.

Raises:

NotFittedError – If the transform has not been fitted yet.

Return type:

VersionedTimeSeriesDataset

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': False, 'extra': 'ignore', 'protected_namespaces': ()}

Configuration for the model. It should be a dictionary conforming to pydantic.config.ConfigDict.