DataSplitter

class openstef_models.utils.data_split.DataSplitter(**data: Any) -> None

Bases: BaseConfig

Handles splitting of time series data into train, validation, and test sets.

Supports stratified splitting to ensure representative data distribution across splits, particularly for extreme values in forecasting scenarios.
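To make the idea of stratified splitting concrete, the following is a minimal, standard-library-only sketch. It is not the DataSplitter implementation: the function name stratified_day_split, the choice of "lowest and highest days by value" as the extreme strata, the default values, and the rounding rule are all assumptions made for illustration. Only the parameter names val_fraction, test_fraction, stratification_fraction, and random_state are taken from the fields documented on this page.

```python
import random
from datetime import date, timedelta


def stratified_day_split(
    daily_values: dict[date, float],
    val_fraction: float = 0.15,
    test_fraction: float = 0.0,
    stratification_fraction: float = 0.15,
    random_state: int = 42,
) -> tuple[list[date], list[date], list[date]]:
    """Hypothetical helper: split days so every split receives extreme days."""
    if not 0.0 <= stratification_fraction <= 0.5:
        raise ValueError("stratification_fraction must be in [0, 0.5]")
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("val_fraction + test_fraction must be in [0, 1)")

    rng = random.Random(random_state)
    days_by_value = sorted(daily_values, key=daily_values.__getitem__)
    n_days = len(days_by_value)
    n_extreme = int(n_days * stratification_fraction)

    # Assumed strata: lowest days, ordinary days, highest days.
    strata = [
        days_by_value[:n_extreme],
        days_by_value[n_extreme : n_days - n_extreme],
        days_by_value[n_days - n_extreme :],
    ]

    train: list[date] = []
    val: list[date] = []
    test: list[date] = []
    for stratum in strata:
        shuffled = stratum[:]
        rng.shuffle(shuffled)
        # Apply the same fractions inside each stratum.
        n_val = round(len(shuffled) * val_fraction)
        n_test = round(len(shuffled) * test_fraction)
        val.extend(shuffled[:n_val])
        test.extend(shuffled[n_val : n_val + n_test])
        train.extend(shuffled[n_val + n_test :])
    return sorted(train), sorted(val), sorted(test)


# Synthetic example data: 100 days with a repeating value pattern 0..9.
start = date(2024, 1, 1)
daily_peaks = {start + timedelta(days=i): float(i % 10) for i in range(100)}
train_days, val_days, test_days = stratified_day_split(
    daily_peaks,
    val_fraction=0.2,
    test_fraction=0.1,
    stratification_fraction=0.2,
)
```

Because the fractions are applied within each stratum rather than across the whole series, the validation and test sets in this sketch always contain some of the lowest and highest days, which a plain random or chronological split does not guarantee.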

Parameters:

data (Any)

val_fraction: float
test_fraction: float
stratification_fraction: float
min_days_for_stratification: int
random_state: int
split_dataset(data: T, data_val: T | None = None, data_test: T | None = None, target_column: str = 'load') -> tuple[T, T | None, T | None]

Prepare and split input data into train, validation, and test sets.

Parameters:

data (T)
data_val (T | None)
data_test (T | None)
target_column (str)

Returns:

Tuple of (train_data, val_data, test_data) where val_data and test_data may be None.

Return type:

tuple[T, T | None, T | None], where T is a TypeVar bound to TimeSeriesDataset

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': False, 'extra': 'ignore', 'protected_namespaces': ()}

Configuration for the model. Should be a dictionary conforming to pydantic.config.ConfigDict.