DatasetMixin
- class openstef_core.datasets.mixins.DatasetMixin(*args, **kwargs)
Bases: Protocol
Abstract base class for dataset persistence operations.
This mixin defines the interface for saving and loading datasets to/from parquet files. It ensures datasets can be persisted with all their metadata and reconstructed exactly as they were saved.
Classes implementing this mixin must:
- Save all data and metadata necessary for complete reconstruction
- Store metadata in parquet file attributes using attrs
- Handle missing metadata gracefully with sensible defaults when loading
See also
TimeSeriesDataset: Implementation for standard time series datasets.
VersionedTimeSeriesDataset: Implementation for versioned dataset segments.
- abstractmethod to_parquet(path: Annotated[Path, PathType(path_type=file)]) → None
Save the dataset to a parquet file.
Stores both the dataset’s data and all necessary metadata for complete reconstruction. Metadata should be stored in the parquet file’s attrs dictionary.
- Parameters:
path (Annotated[Path, PathType(path_type=file)]) – File path where the dataset should be saved.
See also
read_parquet: Counterpart method for loading datasets.
- Return type:
None
- abstractmethod classmethod read_parquet(path: Annotated[Path, PathType(path_type=file)]) → Self
Load a dataset from a parquet file.
Reconstructs a dataset from a parquet file created with to_parquet, including all data and metadata. Should handle missing metadata gracefully with sensible defaults.
- Parameters:
path (Annotated[Path, PathType(path_type=file)]) – Path to the parquet file to load.
- Returns:
New dataset instance reconstructed from the file.
- Return type:
Self
See also
to_parquet: Counterpart method for saving datasets.
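The requirement to store metadata in attrs and to fall back to sensible defaults when it is missing can be illustrated without touching parquet at all. The following is a minimal sketch, not the library's implementation: `ToyMetadata`, its fields, and `DEFAULT_SAMPLE_INTERVAL` are hypothetical names, and a plain `dict` stands in for the attrs dictionary of a parquet file.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical default, used when a file carries no interval metadata.
DEFAULT_SAMPLE_INTERVAL = timedelta(minutes=15)


@dataclass
class ToyMetadata:
    """Hypothetical metadata container; not part of openstef_core."""

    sample_interval: timedelta = DEFAULT_SAMPLE_INTERVAL
    is_sorted: bool = False

    def to_attrs(self) -> dict[str, str]:
        # Encode every value as a JSON string so it survives file storage.
        return {
            "sample_interval": json.dumps(self.sample_interval.total_seconds()),
            "is_sorted": json.dumps(self.is_sorted),
        }

    @classmethod
    def from_attrs(cls, attrs: dict[str, str]) -> ToyMetadata:
        # Missing keys fall back to the dataclass defaults instead of raising.
        defaults = cls()
        seconds = attrs.get("sample_interval")
        is_sorted = attrs.get("is_sorted")
        return cls(
            sample_interval=(
                timedelta(seconds=json.loads(seconds))
                if seconds is not None
                else defaults.sample_interval
            ),
            is_sorted=(
                json.loads(is_sorted) if is_sorted is not None else defaults.is_sorted
            ),
        )


original = ToyMetadata(sample_interval=timedelta(hours=1), is_sorted=True)
restored = ToyMetadata.from_attrs(original.to_attrs())  # full round trip
fallback = ToyMetadata.from_attrs({})  # file written without metadata
```

In a real implementation, `to_parquet` would write the encoded dictionary alongside the data and `read_parquet` would decode it the same way, so older files without metadata still load.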
- pipe(func: Callable[[Concatenate[Self, P]], T], *args: P, **kwargs: P) → T
Applies a function to the dataset and returns the result.
This method allows for functional-style transformations and operations on the dataset, enabling method chaining and cleaner code.
- Parameters:
func (Callable[[Concatenate[Self, ParamSpec(P)]], TypeVar(T)]) – A callable that takes the dataset instance and returns a value of type T.
*args (P) – Positional arguments to pass to the function.
**kwargs (P) – Keyword arguments to pass to the function.
- Returns:
The result of applying the function to the dataset.
- Return type:
TypeVar(T)
- __init__(*args, **kwargs)