DataSplitter
- class openstef_models.utils.data_split.DataSplitter(**data: Any) → None
Bases: BaseConfig
Handles splitting of time series data into train, validation, and test sets.
Supports stratified splitting to ensure representative data distribution across splits, particularly for extreme values in forecasting scenarios.
- Parameters:
data (Any)
- val_fraction: float
- test_fraction: float
- stratification_fraction: float
- min_days_for_stratification: int
- random_state: int
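The class summary mentions stratified splitting, but this page does not spell out the procedure. As a rough illustration only, the sketch below shows one way such a split could work on plain (timestamp, value) pairs: whole days are ranked by daily peak and trough, the most extreme days form their own stratum, and test days are drawn at the same rate from both strata. The function `stratified_day_split` and its logic are hypothetical and are not taken from openstef_models. Only the parameter names mirror the fields listed above, and the fallback for short histories is a guess at what `min_days_for_stratification` is for.

```python
import random
from collections import defaultdict
from datetime import datetime, timedelta


def stratified_day_split(samples, test_fraction, stratification_fraction,
                         min_days_for_stratification, random_state):
    """Hypothetical helper (not the openstef_models implementation)."""
    # Group values per calendar day so that a day never straddles two sets.
    by_day = defaultdict(list)
    for ts, value in samples:
        by_day[ts.date()].append(value)
    days = sorted(by_day)

    if len(days) < min_days_for_stratification:
        # Assumed fallback: too few days to stratify, so hold out the latest days.
        k = round(len(days) * test_fraction)
        test_days = set(days[len(days) - k:])
    else:
        # Days with the highest daily maxima or lowest daily minima are "extreme".
        n_extreme = int(len(days) * stratification_fraction)
        by_peak = sorted(days, key=lambda d: max(by_day[d]))
        by_trough = sorted(days, key=lambda d: min(by_day[d]))
        extreme = set(by_peak[len(days) - n_extreme:]) | set(by_trough[:n_extreme])
        normal = [d for d in days if d not in extreme]

        # Sample each stratum at the same rate with a seeded generator.
        rng = random.Random(random_state)
        test_days = set()
        for stratum in (sorted(extreme), normal):
            k = round(len(stratum) * test_fraction)
            test_days.update(rng.sample(stratum, k))

    train = [(ts, v) for ts, v in samples if ts.date() not in test_days]
    test = [(ts, v) for ts, v in samples if ts.date() in test_days]
    return train, test


# Synthetic data: 40 days, 4 readings per day, load rising over time.
start = datetime(2024, 1, 1)
samples = [
    (start + timedelta(days=d, hours=6 * h), 100.0 + 10 * d + h)
    for d in range(40)
    for h in range(4)
]
train_set, test_set = stratified_day_split(
    samples, test_fraction=0.2, stratification_fraction=0.25,
    min_days_for_stratification=14, random_state=42,
)
```

Sampling whole days rather than individual rows keeps each day's profile intact, and fixing the seed makes the split reproducible, which is presumably the role of `random_state`.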
- split_dataset(data: T, data_val: T | None = None, data_test: T | None = None, target_column: str = 'load') → tuple[T, T | None, T | None]
Prepare and split input data into train, validation, and test sets.
- Parameters:
data (TypeVar(T, bound=TimeSeriesDataset)) – Full dataset to split.
data_val (Optional[TypeVar(T, bound=TimeSeriesDataset)]) – Optional pre-split validation data.
data_test (Optional[TypeVar(T, bound=TimeSeriesDataset)]) – Optional pre-split test data.
target_column (str) – Column name containing the target variable for stratification.
- Returns:
Tuple of (train_data, val_data, test_data) where val_data and test_data may be None.
- Return type:
tuple[TypeVar(T, bound=TimeSeriesDataset), Optional[TypeVar(T, bound=TimeSeriesDataset)], Optional[TypeVar(T, bound=TimeSeriesDataset)]]
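Because the second and third elements of the returned tuple may be None, callers need to check them before use. The stand-in below mirrors only the documented signature shape on plain lists to show that handling. `split_dataset_sketch` is a hypothetical function, and its behaviour (pre-split sets passed through unchanged, a set left as None when its fraction is zero, chronological tail slices otherwise) is an assumption for illustration, not something this page states about `DataSplitter.split_dataset`.

```python
from typing import List, Optional, Tuple


def split_dataset_sketch(
    data: List[float],
    data_val: Optional[List[float]] = None,
    data_test: Optional[List[float]] = None,
    val_fraction: float = 0.2,
    test_fraction: float = 0.1,
) -> Tuple[List[float], Optional[List[float]], Optional[List[float]]]:
    """Hypothetical stand-in mirroring the documented return shape."""
    # Assumes val_fraction + test_fraction < 1 so the training part is non-empty.
    n_total = len(data)
    if data_test is None and test_fraction > 0:
        n_test = round(n_total * test_fraction)
        if n_test:
            # Hold out the most recent rows as the test set.
            data, data_test = data[:-n_test], data[-n_test:]
    if data_val is None and val_fraction > 0:
        n_val = round(n_total * val_fraction)
        if n_val:
            # Hold out the rows just before the test set for validation.
            data, data_val = data[:-n_val], data[-n_val:]
    return data, data_val, data_test


series = [float(i) for i in range(100)]

# Case 1: nothing pre-split, so all three parts are produced.
train_part, val_part, test_part = split_dataset_sketch(series)

# Case 2: validation data supplied by the caller and no test fraction requested.
train_only, given_val, missing_test = split_dataset_sketch(
    series, data_val=[1.0, 2.0], test_fraction=0.0
)
if missing_test is None:
    print("no test set was produced; skip test-set evaluation")
```

Guarding on `is None` rather than truthiness keeps an empty but present set distinguishable from an absent one.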
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': False, 'extra': 'ignore', 'protected_namespaces': ()}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.