TimeSeriesMixin
- class openstef_core.datasets.mixins.TimeSeriesMixin(*args, **kwargs)
Bases: Protocol
Interface for time series datasets.
This interface defines the essential operations that all time series datasets must implement. It provides access to feature metadata, temporal properties, the dataset’s temporal index, and filtering/versioning capabilities.
- Classes implementing this interface must provide:
Access to the datetime index
Sample interval information
Feature names list
Versioning status indicator
Filtering methods for time ranges, availability, and lead times
Version selection for point-in-time data reconstruction
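The sketch below illustrates only the property half of this contract. It defines a trimmed-down typing.Protocol and a toy in-memory class that satisfies it structurally. TimeSeriesLike, ToyDataset and METADATA_COLUMNS are hypothetical names. Plain lists of datetime stand in for the pandas DatetimeIndex, and the filtering and versioning methods are left out.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Protocol, runtime_checkable


@runtime_checkable
class TimeSeriesLike(Protocol):
    """Hypothetical, trimmed-down stand-in for TimeSeriesMixin (properties only)."""

    @property
    def index(self) -> list[datetime]: ...

    @property
    def sample_interval(self) -> timedelta: ...

    @property
    def feature_names(self) -> list[str]: ...

    @property
    def is_versioned(self) -> bool: ...


# Assumed metadata column names, taken from the feature_names description.
METADATA_COLUMNS = {"timestamp", "available_at", "horizon"}


@dataclass
class ToyDataset:
    """Hypothetical in-memory dataset: column name -> list of values."""

    timestamps: list[datetime]
    interval: timedelta
    columns: dict[str, list[object]] = field(default_factory=dict)

    @property
    def index(self) -> list[datetime]:
        return self.timestamps

    @property
    def sample_interval(self) -> timedelta:
        return self.interval

    @property
    def feature_names(self) -> list[str]:
        # Exclude metadata columns so only real features are reported.
        return [name for name in self.columns if name not in METADATA_COLUMNS]

    @property
    def is_versioned(self) -> bool:
        # Treat the data as versioned when it carries availability information.
        return "available_at" in self.columns or "horizon" in self.columns


start = datetime(2025, 1, 1)
ds = ToyDataset(
    timestamps=[start + i * timedelta(minutes=15) for i in range(4)],
    interval=timedelta(minutes=15),
    columns={"load": [1.0, 2.0, 3.0, 4.0], "temperature": [5.0, 5.5, 6.0, 6.5]},
)
print(isinstance(ds, TimeSeriesLike))  # True
print(ds.feature_names)  # ['load', 'temperature']
print(ds.is_versioned)  # False
```

A runtime isinstance check against a Protocol only confirms that the members exist. It does not check their types, so a static type checker is the stronger guarantee.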
- abstract property index: DatetimeIndex
Get the datetime index of the dataset.
- Returns:
DatetimeIndex representing all timestamps in the dataset.
- abstract property sample_interval: timedelta
Get the fixed time interval between consecutive data points.
- Returns:
The sample interval as a timedelta.
- abstract property feature_names: list[str]
Get the names of all available features in the dataset.
- Returns:
List of feature names, excluding metadata columns such as timestamp, available_at, or horizon.
- abstract property is_versioned: bool
Check if the dataset tracks data availability over time.
- Returns:
True if the dataset is versioned (tracks availability via horizon or available_at columns), False for regular time series.
- abstractmethod filter_by_range(start: datetime | None = None, end: datetime | None = None) -> Self
Filter the dataset to include only data within the specified time range.
- Parameters:
start (datetime | None) – Inclusive start time of the range. If None, no start boundary is applied.
end (datetime | None) – Exclusive end time of the range. If None, no end boundary is applied.
- Returns:
New instance containing only data within [start, end).
- Return type:
Self
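The half-open [start, end) semantics can be sketched on a plain list of timestamps. The standalone filter_by_range function below is a hypothetical simplification. The real method returns a new dataset instance, not a list.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def filter_by_range(
    timestamps: list[datetime],
    start: datetime | None = None,
    end: datetime | None = None,
) -> list[datetime]:
    """Keep timestamps in the half-open interval [start, end); None means unbounded."""
    return [
        ts
        for ts in timestamps
        if (start is None or ts >= start) and (end is None or ts < end)
    ]


base = datetime(2025, 1, 1)
hourly = [base + timedelta(hours=h) for h in range(6)]  # 00:00 .. 05:00

# start is inclusive, end is exclusive: 01:00 and 02:00 are kept, 03:00 is not.
window = filter_by_range(hourly, start=base + timedelta(hours=1), end=base + timedelta(hours=3))
print([ts.hour for ts in window])  # [1, 2]

# Omitting start keeps everything before end.
print([ts.hour for ts in filter_by_range(hourly, end=base + timedelta(hours=2))])  # [0, 1]
```

Because the end bound is exclusive, adjacent windows such as [00:00, 03:00) and [03:00, 06:00) partition the data without overlap.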
- abstractmethod filter_by_available_before(available_before: datetime) -> Self
Filter to include only data available before the specified timestamp.
- Parameters:
available_before (datetime) – Cutoff time for data availability.
- Returns:
New instance containing only data available before the cutoff.
- Return type:
Self
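Here is a minimal sketch of availability filtering on versioned records. Row is a hypothetical record type holding one value for a timestamp together with the moment it became known. Treating the cutoff as inclusive (<=) is an assumption, and the actual implementation may handle the boundary differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Row:
    """Hypothetical versioned record: a value for `timestamp`, known from `available_at`."""

    timestamp: datetime
    available_at: datetime
    value: float


def filter_by_available_before(rows: list[Row], available_before: datetime) -> list[Row]:
    # Assumption: a row counts as available if it was published at or before the cutoff.
    return [row for row in rows if row.available_at <= available_before]


t = datetime(2025, 1, 2, 12, 0)
rows = [
    Row(t, available_at=t - timedelta(days=1), value=10.0),     # day-ahead forecast
    Row(t, available_at=t - timedelta(hours=1), value=11.0),    # intraday update
    Row(t, available_at=t + timedelta(minutes=5), value=11.4),  # measurement, after the fact
]

cutoff = t - timedelta(hours=6)
print([row.value for row in filter_by_available_before(rows, cutoff)])  # [10.0]
```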
- abstractmethod filter_by_available_at(available_at: AvailableAt) -> Self
Filter based on realistic daily data availability constraints.
- Parameters:
available_at (AvailableAt) – Specification defining when data becomes available.
- Returns:
New instance with data filtered by availability pattern.
- Return type:
Self
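The exact structure of AvailableAt is not described here. The sketch below assumes one plausible reading of a "daily availability constraint": data for any timestamp on day D counts only if it was published by a fixed wall-clock time on day D minus some number of days, for example "D-1 at 06:00". AvailableAtSpec, Row and cutoff_for are hypothetical names and are not the library's API.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class AvailableAtSpec:
    """Hypothetical stand-in for AvailableAt: available by `time_of_day` on day D - `lag_days`."""

    lag_days: int
    time_of_day: time


@dataclass(frozen=True)
class Row:
    timestamp: datetime
    available_at: datetime
    value: float


def cutoff_for(timestamp: datetime, spec: AvailableAtSpec) -> datetime:
    # Calendar day of the target, shifted back by lag_days, at the given wall-clock time.
    day = timestamp.date() - timedelta(days=spec.lag_days)
    return datetime.combine(day, spec.time_of_day)


def filter_by_available_at(rows: list[Row], spec: AvailableAtSpec) -> list[Row]:
    # Keep only versions already published at each row's own daily cutoff.
    return [row for row in rows if row.available_at <= cutoff_for(row.timestamp, spec)]


# "D-1 at 06:00": data for any hour of 3 Jan must be known by 06:00 on 2 Jan.
spec = AvailableAtSpec(lag_days=1, time_of_day=time(6, 0))
target = datetime(2025, 1, 3, 14, 0)
rows = [
    Row(target, available_at=datetime(2025, 1, 2, 5, 0), value=20.0),  # in time
    Row(target, available_at=datetime(2025, 1, 2, 9, 0), value=21.0),  # too late
]
print(cutoff_for(target, spec))  # 2025-01-02 06:00:00
print([row.value for row in filter_by_available_at(rows, spec)])  # [20.0]
```

filter_by_available_before applies one global cutoff. This variant computes a separate cutoff for each target day, which is how a forecast issued daily at a fixed time would see the data.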
- abstractmethod filter_by_lead_time(lead_time: LeadTime) -> Self
Filter to include only data that became available at least the specified lead time before its timestamp.
- Parameters:
lead_time (LeadTime) – Minimum time gap required between data availability and timestamp.
- Returns:
New instance containing only data available with the required lead time.
- Return type:
Self
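Lead-time filtering can be sketched by comparing each version's gap, timestamp minus available_at, against a minimum. This simplified version substitutes a plain timedelta for the LeadTime type and uses a hypothetical Row record.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Row:
    timestamp: datetime
    available_at: datetime
    value: float

    @property
    def lead_time(self) -> timedelta:
        # How far ahead of the target timestamp this value was known.
        return self.timestamp - self.available_at


def filter_by_lead_time(rows: list[Row], lead_time: timedelta) -> list[Row]:
    # Keep versions known at least `lead_time` before their timestamp.
    return [row for row in rows if row.lead_time >= lead_time]


t = datetime(2025, 1, 2, 12, 0)
rows = [
    Row(t, available_at=t - timedelta(hours=36), value=10.0),
    Row(t, available_at=t - timedelta(hours=24), value=10.5),
    Row(t, available_at=t - timedelta(hours=2), value=11.0),
]
print([row.value for row in filter_by_lead_time(rows, timedelta(hours=24))])  # [10.0, 10.5]
```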
- abstractmethod select_version() -> TimeSeriesDataset
Select a specific version of the dataset based on data availability.
Creates a point-in-time snapshot by selecting the latest available version for each timestamp. Essential for preventing lookahead bias in backtesting.
- Returns:
TimeSeriesDataset containing the selected version of the data.
- Return type:
TimeSeriesDataset
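A sketch of "latest available version per timestamp": versioned rows are collapsed into a plain mapping from timestamp to value, keeping the row with the greatest available_at for each timestamp. Row is a hypothetical record type, and a dict stands in for the returned TimeSeriesDataset.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Row:
    timestamp: datetime
    available_at: datetime
    value: float


def select_version(rows: list[Row]) -> dict[datetime, float]:
    """Collapse versioned rows into one value per timestamp, keeping the latest version."""
    latest: dict[datetime, Row] = {}
    for row in rows:
        current = latest.get(row.timestamp)
        if current is None or row.available_at > current.available_at:
            latest[row.timestamp] = row
    # Return a plain (unversioned) series ordered by timestamp.
    return {ts: latest[ts].value for ts in sorted(latest)}


t0 = datetime(2025, 1, 2, 12, 0)
t1 = t0 + timedelta(hours=1)
rows = [
    Row(t0, available_at=t0 - timedelta(days=1), value=10.0),
    Row(t0, available_at=t0 - timedelta(hours=1), value=11.0),  # newer version wins
    Row(t1, available_at=t1 - timedelta(days=1), value=12.0),
]
snapshot = select_version(rows)
print(list(snapshot.values()))  # [11.0, 12.0]
```

Selecting the latest version only prevents lookahead if the rows were first restricted to what was known at the chosen point in time, for example with an availability filter.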
- calculate_time_coverage() -> timedelta
Calculate the total time span covered by the dataset.
This method computes the total duration represented by the dataset from its unique timestamps and sample interval.
- Returns:
Total time coverage of the dataset.
- Return type:
timedelta
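One reading of "based on its unique timestamps and sample interval" is the number of distinct timestamps multiplied by the sample interval. The standalone function below is a sketch under that assumption. It works on a list of datetime values in place of the DatetimeIndex.

```python
from datetime import datetime, timedelta


def calculate_time_coverage(index: list[datetime], sample_interval: timedelta) -> timedelta:
    # Each unique timestamp stands for one sample_interval of data; duplicates
    # (e.g. several versions of the same timestamp) are counted once.
    return len(set(index)) * sample_interval


base = datetime(2025, 1, 1)
step = timedelta(minutes=15)
# Four distinct quarter-hours, one duplicated, with a one-hour gap before the last sample.
index = [base, base + step, base + step, base + 2 * step, base + 6 * step]
print(calculate_time_coverage(index, step))  # 1:00:00
```

Under this definition, gaps in the data do not count toward coverage. A naive "last minus first timestamp" span over the same index would report 1:30:00.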
- __init__(*args, **kwargs)