DatasetMixin

class openstef_core.datasets.mixins.DatasetMixin(*args, **kwargs)

Bases: Protocol

Abstract base class for dataset persistence operations.

This mixin defines the interface for saving and loading datasets to/from parquet files. It ensures datasets can be persisted with all their metadata and reconstructed exactly as they were saved.

Classes implementing this mixin must:

  • Save all data and metadata necessary for complete reconstruction.

  • Store metadata in parquet file attributes using attrs.

  • Handle missing metadata gracefully with sensible defaults when loading.

See also

TimeSeriesDataset: Implementation for standard time series datasets.

VersionedTimeSeriesDataset: Implementation for versioned dataset segments.
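The persistence contract described above can be illustrated with a small round trip. This is a minimal sketch and not the library's implementation: `Persistable`, `ToyDataset`, `round_trip`, and the `sample_interval` key are hypothetical names, and the file body is JSON rather than real parquet so that the example needs only the standard library.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Protocol


class Persistable(Protocol):
    """Hypothetical stand-in for the save/load interface described above."""

    def to_parquet(self, path: Path) -> None: ...

    @classmethod
    def read_parquet(cls, path: Path) -> "Persistable": ...


@dataclass
class ToyDataset:
    """Toy dataset holding rows plus a metadata dict (both hypothetical)."""

    rows: List[Dict[str, Any]]
    attrs: Dict[str, Any] = field(default_factory=dict)

    def to_parquet(self, path: Path) -> None:
        # Stand-in only: writes JSON, not parquet. Data and metadata are
        # stored together so the dataset can be rebuilt from one file.
        path.write_text(json.dumps({"rows": self.rows, "attrs": self.attrs}))

    @classmethod
    def read_parquet(cls, path: Path) -> "ToyDataset":
        payload = json.loads(path.read_text())
        return cls(rows=payload["rows"], attrs=payload["attrs"])


def round_trip(dataset: Persistable) -> Persistable:
    """Save a dataset to a temporary file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "dataset.json"
        dataset.to_parquet(path)
        return type(dataset).read_parquet(path)


original = ToyDataset(
    rows=[{"t": 0, "load": 1.5}],
    attrs={"sample_interval": "PT15M"},
)
restored = round_trip(original)
```

Comparing `restored` with `original` is a convenient property to check in tests for any class that implements the interface.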

abstractmethod to_parquet(path: Annotated[Path, PathType(path_type=file)]) → None

Save the dataset to a parquet file.

Stores both the dataset’s data and all necessary metadata for complete reconstruction. Metadata should be stored in the parquet file’s attrs dictionary.

Parameters:

path (Annotated[Path, PathType(path_type=file)]) – File path where the dataset should be saved.

See also

read_parquet: Counterpart method for loading datasets.

Return type:

None

abstractmethod classmethod read_parquet(path: Annotated[Path, PathType(path_type=file)]) → Self

Load a dataset from a parquet file.

Reconstructs a dataset from a parquet file created with to_parquet, including all data and metadata. Should handle missing metadata gracefully with sensible defaults.

Parameters:

path (Annotated[Path, PathType(path_type=file)]) – Path to the parquet file to load.

Returns:

New dataset instance reconstructed from the file.

Return type:

Self

See also

to_parquet: Counterpart method for saving datasets.
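One way to handle missing metadata when loading is to merge whatever was stored over a table of defaults. The sketch below assumes the metadata has already been read into a plain dictionary; `DEFAULT_ATTRS`, `resolve_attrs`, and the key names are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Mapping, Optional

# Hypothetical metadata keys and fallback values, for illustration only.
DEFAULT_ATTRS: Dict[str, Any] = {
    "sample_interval": "PT15M",
    "is_sorted": False,
}


def resolve_attrs(stored: Optional[Mapping[str, Any]]) -> Dict[str, Any]:
    """Return stored metadata with defaults filled in for missing keys."""
    resolved = dict(DEFAULT_ATTRS)  # copy so the defaults are never mutated
    resolved.update(stored or {})   # stored values take precedence
    return resolved


# A file written without any metadata falls back to the defaults entirely.
no_metadata = resolve_attrs(None)

# A file with partial metadata keeps its own values and fills in the rest.
partial_metadata = resolve_attrs({"is_sorted": True})
```

Keeping the defaults in one place makes it easier to read older files that predate a newly added metadata key.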

pipe(func: Callable[[Concatenate[Self, P]], T], *args: P, **kwargs: P) → T

Applies a function to the dataset and returns the result.

This method allows for functional-style transformations and operations on the dataset, enabling method chaining and cleaner code.

Parameters:
  • func (Callable[[Concatenate[Self, ParamSpec(P)]], TypeVar(T)]) – A callable that takes the dataset instance as its first argument, followed by any additional arguments, and returns a value of type T.

  • *args (P) – Positional arguments to pass to the function.

  • **kwargs (P) – Keyword arguments to pass to the function.

Returns:

The result of applying the function to the dataset.

Return type:

TypeVar(T)
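The chaining style that pipe enables can be sketched with a toy class. This assumes, based on the signature, that pipe passes the dataset as the first argument to func and forwards the remaining arguments; `ToyDataset`, `scale`, and `total` are hypothetical names and not part of the library.

```python
from typing import Any, Callable, List


class ToyDataset:
    """Toy stand-in for a dataset, holding a list of values."""

    def __init__(self, values: List[float]) -> None:
        self.values = values

    def pipe(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        # Assumed behaviour: the dataset itself is the first argument,
        # and any extra arguments are forwarded unchanged.
        return func(self, *args, **kwargs)


def scale(dataset: ToyDataset, factor: float) -> ToyDataset:
    """Return a new dataset with every value multiplied by factor."""
    return ToyDataset([value * factor for value in dataset.values])


def total(dataset: ToyDataset) -> float:
    """Sum all values in the dataset."""
    return sum(dataset.values)


# Reads left to right, instead of the nested total(scale(dataset, factor=2.0)).
result = ToyDataset([1.0, 2.0, 3.0]).pipe(scale, factor=2.0).pipe(total)
```

Because each step returns whatever func returns, a chain can continue only as long as the intermediate results are themselves datasets with a pipe method.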

__init__(*args, **kwargs)