TimeSeriesMixin

class openstef_core.datasets.mixins.TimeSeriesMixin(*args, **kwargs)

Bases: Protocol

Protocol defining the interface for time series datasets.

This interface defines the essential operations that all time series datasets must implement. It provides access to feature metadata, temporal properties, the dataset’s temporal index, and filtering/versioning capabilities.

Classes implementing this interface must provide:

Access to the datetime index
Sample interval information
Feature names list
Versioning status indicator
Filtering methods for time ranges, availability, and lead times
Version selection for point-in-time data reconstruction
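
The read-only part of this contract can be illustrated with a small structural-typing sketch. It is a simplified stand-in rather than the library's definition: `TimeSeriesLike` and `ToyDataset` are hypothetical names, the index is a plain list of `datetime` objects instead of a `DatetimeIndex`, and the filtering and versioning methods are left out.

```python
from datetime import datetime, timedelta
from typing import Protocol, runtime_checkable


@runtime_checkable
class TimeSeriesLike(Protocol):
    """Hypothetical, simplified stand-in for the four properties listed above."""

    @property
    def index(self) -> list[datetime]: ...

    @property
    def sample_interval(self) -> timedelta: ...

    @property
    def feature_names(self) -> list[str]: ...

    @property
    def is_versioned(self) -> bool: ...


class ToyDataset:
    """Toy implementation storing rows as (timestamp, {feature_name: value})."""

    def __init__(self, rows, sample_interval):
        self._rows = sorted(rows, key=lambda row: row[0])
        self._sample_interval = sample_interval

    @property
    def index(self):
        return [timestamp for timestamp, _ in self._rows]

    @property
    def sample_interval(self):
        return self._sample_interval

    @property
    def feature_names(self):
        # Only feature columns are reported, never the timestamp itself.
        names = set()
        for _, features in self._rows:
            names.update(features)
        return sorted(names)

    @property
    def is_versioned(self):
        # This toy dataset carries no availability information.
        return False


dataset = ToyDataset(
    rows=[
        (datetime(2025, 1, 1, 0, 0), {"load": 1.0}),
        (datetime(2025, 1, 1, 0, 15), {"load": 1.2}),
    ],
    sample_interval=timedelta(minutes=15),
)
# Structural check: ToyDataset never inherits from TimeSeriesLike.
print(isinstance(dataset, TimeSeriesLike))
```

Because the check is structural, any class exposing these four properties satisfies the sketch without subclassing it.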

abstract property index: DatetimeIndex

Get the datetime index of the dataset.

Returns: DatetimeIndex representing all timestamps in the dataset.

abstract property sample_interval: timedelta

Get the fixed time interval between consecutive data points.

Returns: The sample interval as a timedelta.

abstract property feature_names: list[str]

Get the names of all available features in the dataset.

Returns: List of feature names, excluding metadata columns such as timestamp, available_at, or horizon.

abstract property is_versioned: bool

Check if the dataset tracks data availability over time.

Returns: True if the dataset is versioned (tracks availability via horizon or available_at columns), False for regular time series.

abstractmethod filter_by_range(start: datetime | None = None, end: datetime | None = None) → Self

Filter the dataset to include only data within the specified time range.

Parameters:

start (datetime | None) – Inclusive start time of the range. If None, no start boundary is applied.
end (datetime | None) – Exclusive end time of the range. If None, no end boundary is applied.

Returns:

New instance containing only data within [start, end).

Return type:

Self
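
The half-open interval [start, end) can be illustrated with plain Python. This is a sketch of the boundary semantics only, not the library's implementation: `filter_rows_by_range` is a hypothetical helper and rows are simple `(timestamp, value)` tuples.

```python
from datetime import datetime


def filter_rows_by_range(rows, start=None, end=None):
    """Keep rows whose timestamp t satisfies start <= t < end.

    A bound of None means that side of the range is left open.
    """
    return [
        (timestamp, value)
        for timestamp, value in rows
        if (start is None or timestamp >= start)
        and (end is None or timestamp < end)
    ]


# Four hourly samples at 00:00, 01:00, 02:00 and 03:00.
rows = [(datetime(2025, 1, 1, hour), float(hour)) for hour in range(4)]

subset = filter_rows_by_range(
    rows,
    start=datetime(2025, 1, 1, 1),
    end=datetime(2025, 1, 1, 3),
)
# The 01:00 sample is kept (inclusive start), the 03:00 sample is dropped
# (exclusive end).
print([timestamp.hour for timestamp, _ in subset])
```

An exclusive end lets consecutive ranges such as [00:00, 12:00) and [12:00, 24:00) be filtered without any sample appearing in both.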

abstractmethod filter_by_available_before(available_before: datetime) → Self

Filter to include only data available before the specified timestamp.

Parameters:

available_before (datetime) – Cutoff time for data availability.

Returns:

New instance containing only data available before the cutoff.

Return type:

Self
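
As a rough illustration, an availability cutoff on versioned rows might look like the sketch below. The helper `filter_rows_available_before` and the `(timestamp, available_at, value)` row layout are assumptions made for the example. The strict `<` comparison is also an assumption, since the description above does not say whether data available exactly at the cutoff is included.

```python
from datetime import datetime

# Versioned rows: (timestamp, available_at, value).
# The same timestamp can appear several times with different availability.
rows = [
    (datetime(2025, 1, 2, 12), datetime(2025, 1, 1, 6), 10.0),
    (datetime(2025, 1, 2, 12), datetime(2025, 1, 2, 6), 11.0),
    (datetime(2025, 1, 3, 12), datetime(2025, 1, 2, 18), 12.0),
]


def filter_rows_available_before(rows, available_before):
    """Keep rows whose available_at lies strictly before the cutoff.

    Assumption: strict comparison; the real method may treat the cutoff
    as inclusive.
    """
    return [row for row in rows if row[1] < available_before]


# Only the version published on 2025-01-01 06:00 is known at midnight.
kept = filter_rows_available_before(rows, datetime(2025, 1, 2, 0))
print(kept)
```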

abstractmethod filter_by_available_at(available_at: AvailableAt) → Self

Filter based on realistic daily data availability constraints.

Parameters:

available_at (AvailableAt) – Specification defining when data becomes available.

Returns:

New instance with data filtered by availability pattern.

Return type:

Self

abstractmethod filter_by_lead_time(lead_time: LeadTime) → Self

Filter to include only data that was available at least the specified lead time before its timestamp.

Parameters:

lead_time (LeadTime) – Minimum time gap required between data availability and timestamp.

Returns:

New instance containing only data available with the required lead time.

Return type:

Self
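
The lead-time condition can be read as `timestamp - available_at >= lead_time`. The sketch below illustrates that reading under stated assumptions: `filter_rows_by_lead_time` is a hypothetical helper, rows are `(timestamp, available_at, value)` tuples, and a plain `timedelta` stands in for the library's `LeadTime` type.

```python
from datetime import datetime, timedelta

# Versioned rows: (timestamp, available_at, value).
rows = [
    (datetime(2025, 1, 2, 12), datetime(2025, 1, 1, 6), 10.0),   # known 30 h ahead
    (datetime(2025, 1, 2, 12), datetime(2025, 1, 2, 6), 11.0),   # known 6 h ahead
    (datetime(2025, 1, 3, 12), datetime(2025, 1, 2, 18), 12.0),  # known 18 h ahead
]


def filter_rows_by_lead_time(rows, lead_time):
    """Keep rows that were available at least `lead_time` before their timestamp."""
    return [
        (timestamp, available_at, value)
        for timestamp, available_at, value in rows
        if timestamp - available_at >= lead_time
    ]


# With a 24 h requirement only the row known 30 h ahead qualifies.
day_ahead = filter_rows_by_lead_time(rows, timedelta(hours=24))
print(day_ahead)
```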

abstractmethod select_version() → TimeSeriesDataset

Select a specific version of the dataset based on data availability.

Creates a point-in-time snapshot by selecting the latest available version for each timestamp. Essential for preventing lookahead bias in backtesting.

Returns: TimeSeriesDataset containing the selected version of the data.
Return type: TimeSeriesDataset
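
Selecting the latest available version per timestamp could be sketched as follows. This is an illustration of the idea rather than the library's algorithm: `select_latest_version` is a hypothetical helper, and rows are `(timestamp, available_at, value)` tuples.

```python
from datetime import datetime

# Versioned rows: (timestamp, available_at, value).
rows = [
    (datetime(2025, 1, 2, 12), datetime(2025, 1, 1, 6), 10.0),
    (datetime(2025, 1, 2, 12), datetime(2025, 1, 2, 6), 11.0),
    (datetime(2025, 1, 3, 12), datetime(2025, 1, 2, 18), 12.0),
]


def select_latest_version(rows):
    """Collapse versioned rows to one (timestamp, value) pair per timestamp.

    For each timestamp, the row with the most recent available_at wins.
    """
    latest = {}
    for timestamp, available_at, value in rows:
        if timestamp not in latest or available_at > latest[timestamp][0]:
            latest[timestamp] = (available_at, value)
    return [(timestamp, latest[timestamp][1]) for timestamp in sorted(latest)]


snapshot = select_latest_version(rows)
# 2025-01-02 12:00 resolves to 11.0, the more recently published version.
print(snapshot)
```

In a backtest, a selection like this would typically follow an availability filter, so that "latest" means the latest version known at the simulated moment rather than the latest ever published.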

calculate_time_coverage() → timedelta

Calculate the total time span covered by the dataset.

This method computes the total duration represented by the dataset based on its unique timestamps and sample interval.

Returns: Total time coverage of the dataset.
Return type: timedelta
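
One plausible reading of this description is that coverage equals the number of unique timestamps multiplied by the sample interval. The sketch below follows that reading. The formula is an assumption, not confirmed by the text above, and `time_coverage` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def time_coverage(timestamps, sample_interval):
    """Return (number of unique timestamps) * sample_interval.

    Assumption: each unique timestamp represents one full sample interval.
    Duplicates (for example, several versions of one timestamp) count once,
    and gaps between timestamps add nothing.
    """
    return len(set(timestamps)) * sample_interval


timestamps = [
    datetime(2025, 1, 1, 0, 0),
    datetime(2025, 1, 1, 0, 15),
    datetime(2025, 1, 1, 0, 15),  # duplicate, e.g. a second version
    datetime(2025, 1, 1, 0, 45),  # the 00:30 sample is missing
]

# Three unique timestamps at a 15-minute interval.
coverage = time_coverage(timestamps, timedelta(minutes=15))
print(coverage)
```

Under this reading the result differs from `max(index) - min(index)`: the missing sample shortens the coverage instead of being silently included.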

__init__(*args, **kwargs)