OutlierHandler

class openstef_models.transforms.general.OutlierHandler(**data: Any) → None

Bases: BaseConfig, TimeSeriesTransform

Transform that handles out-of-range values for selected features.

During fitting, this transform learns feature bounds from the training data. During transformation, values outside those bounds are either:

clipped to the learned bounds, or
replaced with NaN
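
The two outlier actions can be illustrated without the library. The following is a minimal plain-Python sketch, not the OpenSTEF implementation: `apply_bounds`, its parameters, the action name "clip", and the choice to treat the bounds as inclusive are assumptions made for illustration.

```python
import math


def apply_bounds(values, lower, upper, action="clip"):
    """Hypothetical helper (not part of OpenSTEF) that handles out-of-range values."""
    if action not in ("clip", "nan"):
        raise ValueError(f"unknown action: {action!r}")
    result = []
    for value in values:
        if lower <= value <= upper:
            # In range (bounds assumed inclusive): keep the value unchanged.
            result.append(value)
        elif action == "clip":
            # Out of range: move the value to the nearest bound.
            result.append(min(max(value, lower), upper))
        else:
            # Out of range: mark the value as missing.
            result.append(math.nan)
    return result


clipped = apply_bounds([90, 140, 115], lower=100, upper=130, action="clip")
masked = apply_bounds([90, 140, 115], lower=100, upper=130, action="nan")
```

Clipping keeps every row usable at the cost of hiding that a value was extreme, whereas NaN replacement keeps that information visible and leaves the decision to a later imputation or filtering step.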

The bound definition depends on the selected mode:

minmax: uses the observed minimum and maximum values
standard: uses mean ± n_std * std
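
As a rough illustration of the two bound definitions, the sketch below computes them with the Python standard library. It is not the library's code: `learn_bounds` is a hypothetical name, the sample standard deviation is an assumption (the page does not say which estimator is used), and `n_std=2.0` is an arbitrary value rather than the documented default.

```python
import statistics


def learn_bounds(values, mode="minmax", n_std=2.0):
    """Hypothetical helper (not part of OpenSTEF) returning (lower, upper)."""
    if mode == "minmax":
        # Bounds are the observed extremes of the training values.
        return min(values), max(values)
    if mode == "standard":
        mean = statistics.fmean(values)
        # Assumption: sample standard deviation (n - 1 in the denominator).
        std = statistics.stdev(values)
        return mean - n_std * std, mean + n_std * std
    raise ValueError(f"unknown mode: {mode!r}")


load = [100, 120, 110, 130, 125]
minmax_bounds = learn_bounds(load, mode="minmax")
standard_bounds = learn_bounds(load, mode="standard", n_std=2.0)
```

With minmax, any value beyond the training range counts as out of range. With standard, the width of the accepted interval scales with n_std, so the bounds can be wider or narrower than the observed range.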

Examples

>>> import pandas as pd
>>> from datetime import timedelta
>>> from openstef_core.datasets import TimeSeriesDataset
>>> from openstef_models.transforms.general.outlier_handler import OutlierHandler
>>> from openstef_models.utils.feature_selection import FeatureSelection
>>>
>>> training_data = pd.DataFrame({
...     "load": [100, 120, 110, 130, 125],
...     "temperature": [20, 22, 21, 23, 24],
... }, index=pd.date_range("2025-01-01", periods=5, freq="1h"))
>>> training_dataset = TimeSeriesDataset(training_data, timedelta(hours=1))
>>>
>>> test_data = pd.DataFrame({
...     "load": [90, 140, 115],
...     "temperature": [19, 25, 22],
... }, index=pd.date_range("2025-01-06", periods=3, freq="1h"))
>>> test_dataset = TimeSeriesDataset(test_data, timedelta(hours=1))
>>>
>>> # Default behavior: clip outliers
>>> handler = OutlierHandler(
...     selection=FeatureSelection(include=["load", "temperature"]),
...     mode="minmax",
... )
>>> handler.fit(training_dataset)
>>> transformed_dataset = handler.transform(test_dataset)
>>> transformed_dataset.data["load"].tolist()
[100, 130, 115]
>>> transformed_dataset.data["temperature"].tolist()
[20, 24, 22]
>>>
>>> # Optional behavior: replace outliers with NaN
>>> handler_nan = OutlierHandler(
...     selection=FeatureSelection(include=["load", "temperature"]),
...     mode="minmax",
...     outlier_action="nan",
... )
>>> handler_nan.fit(training_dataset)
>>> transformed_nan_dataset = handler_nan.transform(test_dataset)
>>> transformed_nan_dataset.data["load"].tolist()
[nan, nan, 115.0]
>>> transformed_nan_dataset.data["temperature"].tolist()
[nan, nan, 22.0]

Parameters:

data (Any)

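selection: FeatureSelection

The features whose values are checked against the learned bounds.

mode: ClipMode

How the bounds are defined: minmax or standard.

outlier_action: OutlierAction

What happens to out-of-range values: they are clipped to the bounds or replaced with NaN.

n_std: float

The number of standard deviations around the mean used for the bounds in standard mode.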

property is_fitted: bool

Check if the transform has been fitted.

fit(data: TimeSeriesDataset) → None

Fit the transform to the input data.

Call this method before applying the transform. For OutlierHandler, fitting learns the lower and upper bound of each selected feature from the data, according to mode.

Parameters:

data (TimeSeriesDataset) – The input data to fit the transform on.

Return type:

None

transform(data: TimeSeriesDataset) → TimeSeriesDataset

Transform the input data.

This method returns a new instance in which out-of-range values of the selected features are clipped to the learned bounds or replaced with NaN, depending on outlier_action.

Parameters:

data (TimeSeriesDataset) – The input data to be transformed.

Returns:

A new instance of the transformed data.

Raises:

NotFittedError – If the transform has not been fitted yet.

Return type:

TimeSeriesDataset
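
The fit-then-transform contract described above can be sketched with a toy class. Everything here is hypothetical and independent of OpenSTEF: `ToyMinMaxClipper` works on plain lists, and `NotFittedError` is a local stand-in, because this page does not show where the library's exception is imported from.

```python
class NotFittedError(RuntimeError):
    """Local stand-in for the library's exception (assumption)."""


class ToyMinMaxClipper:
    """Hypothetical toy transform that mirrors the fit/transform contract."""

    def __init__(self):
        self._bounds = None  # learned in fit()

    @property
    def is_fitted(self):
        return self._bounds is not None

    def fit(self, values):
        # Learn the bounds from the training values.
        self._bounds = (min(values), max(values))

    def transform(self, values):
        if not self.is_fitted:
            raise NotFittedError("call fit() before transform()")
        lower, upper = self._bounds
        # Build a new list; the input is left untouched.
        return [min(max(value, lower), upper) for value in values]


clipper = ToyMinMaxClipper()
try:
    clipper.transform([90, 140, 115])
except NotFittedError:
    raised_before_fit = True
else:
    raised_before_fit = False

clipper.fit([100, 120, 110, 130, 125])
result = clipper.transform([90, 140, 115])
```

Checking is_fitted before calling transform avoids the exception when a transform may or may not have been fitted earlier.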

features_added() → list[str]

List of feature names added by this transform.

Returns:

A list of strings representing the names of features added to the dataset by this transform. Default is an empty list.

Return type:

list[str]

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': False, 'extra': 'ignore', 'protected_namespaces': ()}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self (BaseModel) – The BaseModel instance.
context (Any) – The context.

Return type:

None