SampleWeighter

class openstef_models.transforms.general.SampleWeighter(**data: Any) → None

Bases: BaseConfig, TimeSeriesTransform

Transform that adds sample weights based on target variable distribution.

Supports two weighting methods:

exponential (default): Scales weights by target magnitude relative to a high percentile. Useful for emphasizing high-value samples like peak loads.
inverse_frequency: Weights samples inversely proportional to their frequency in the target distribution. Rare values receive higher weights.

Both methods clip weights to [weight_floor, 1.0] to ensure all samples contribute.
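The two methods can be sketched in NumPy as follows. This is an illustrative reconstruction, not the library's implementation: the function names, the `percentile=95.0` and `exponent=1.0` defaults of the exponential sketch, the use of the absolute value, and the histogram binning (`n_bins`) of the inverse-frequency sketch are all assumptions. With these assumed defaults, the exponential sketch reproduces the weights shown in the Example below.

```python
import numpy as np


def exponential_weights(y, percentile=95.0, exponent=1.0, weight_floor=0.1):
    """Hypothetical sketch: weight by |y| relative to a high percentile of |y|.

    Assumes y contains no NaN values (NaN rows are handled separately).
    """
    magnitude = np.abs(np.asarray(y, dtype=float))
    scale = np.percentile(magnitude, percentile)  # e.g. the 95th percentile
    # Values at or above the percentile are capped at 1.0; small values hit the floor.
    return np.clip((magnitude / scale) ** exponent, weight_floor, 1.0)


def inverse_frequency_weights(y, n_bins=10, weight_floor=0.1):
    """Hypothetical sketch: weight each sample by 1 / (count of its histogram bin).

    Assumes y contains no NaN values.
    """
    y = np.asarray(y, dtype=float)
    counts, edges = np.histogram(y, bins=n_bins)
    # Map each value to the bin np.histogram used (compare against interior edges only).
    bin_idx = np.digitize(y, edges[1:-1])
    raw = 1.0 / counts[bin_idx]
    raw = raw / raw.max()  # the rarest occupied bin gets weight 1.0
    return np.clip(raw, weight_floor, 1.0)
```

Because the assumed exponent is 1.0, the exponential sketch is effectively linear in the target; a larger exponent would push mid-range samples further toward the floor.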

Invariants

Target column must exist in the dataset for weighting to be applied
Sample weights are always in the range [weight_floor, 1.0]
Rows with NaN target values receive default weight of 1.0
Transform is stateless and does not require fit()
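The weight-related invariants can be checked on any output with a small helper like the one below. The helper name `sample_weight_violations` is hypothetical, and `weight_floor=0.1` is an assumed default, chosen because it is the smallest weight produced in the Example.

```python
import math


def sample_weight_violations(target, weights, weight_floor=0.1):
    """Return the row positions whose weight breaks the documented invariants.

    Hypothetical helper: rows with a NaN target must carry weight 1.0, and
    every other row must have a weight in [weight_floor, 1.0].
    """
    violations = []
    for i, (y, w) in enumerate(zip(target, weights)):
        if math.isnan(y):
            ok = w == 1.0  # NaN target -> default weight 1.0
        else:
            ok = weight_floor <= w <= 1.0  # clipped range
        if not ok:
            violations.append(i)
    return violations


nan = float("nan")
print(sample_weight_violations([10.0, nan, 200.0], [0.1, 1.0, 1.0]))  # []
print(sample_weight_violations([10.0, nan], [0.05, 0.5]))  # [0, 1]
```

Applied to the Example's result, for instance `sample_weight_violations(result.data["load"], result.data["sample_weight"])`, the documented invariants imply an empty list.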

Example

>>> from datetime import timedelta
>>> import pandas as pd
>>> from openstef_core.testing import create_timeseries_dataset
>>> from openstef_models.transforms.general import SampleWeighter
>>> dataset = create_timeseries_dataset(
...     index=pd.date_range("2025-01-01", periods=5, freq="1h"),
...     load=[10.0, 50.0, 100.0, 200.0, 150.0],
...     sample_interval=timedelta(hours=1),
... )
>>> transform = SampleWeighter()
>>> result = transform.fit_transform(dataset)
>>> result.data[["load", "sample_weight"]]
                      load  sample_weight
timestamp
2025-01-01 00:00:00   10.0       0.100000
2025-01-01 01:00:00   50.0       0.263158
2025-01-01 02:00:00  100.0       0.526316
2025-01-01 03:00:00  200.0       1.000000
2025-01-01 04:00:00  150.0       0.789474
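The numbers in this output are consistent with weight = clip(load / p95, 0.1, 1.0), where p95 is the 95th percentile of the load computed with linear interpolation: 150 + 0.8 * (200 - 150) = 190. This rule is inferred from the output above, not taken from the implementation, and the floor of 0.1 is likewise inferred. The sketch below only checks the arithmetic.

```python
import numpy as np

load = np.array([10.0, 50.0, 100.0, 200.0, 150.0])

# Linear-interpolation percentile: position 0.95 * (5 - 1) = 3.8 in the sorted
# array [10, 50, 100, 150, 200], i.e. 150 + 0.8 * (200 - 150) = 190.
p95 = np.percentile(load, 95)

# Inferred rule: divide by p95, then clip to [weight_floor=0.1, 1.0].
weights = np.clip(load / p95, 0.1, 1.0)

# 10/190 = 0.053 is raised to the floor 0.1; 200/190 = 1.053 is capped at 1.0.
print(p95)
print(weights.round(6))
```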

Parameters: data (Any)

config: SampleWeightConfig

target_column: str

sample_weight_column: str

normalize_target: bool

property is_fitted: bool: Check if the transform has been fitted.

fit(data: TimeSeriesDataset) → None

Fit the transform to the input data.

This method should be called before applying the transform to the data. It allows the transform to learn any necessary parameters from the data.

Parameters:

data (TimeSeriesDataset) – The input data to fit the transform on.

Return type:

None

transform(data: TimeSeriesDataset) → TimeSeriesDataset

Transform the input data.

This method should apply a transformation to the input data and return a new instance.

Parameters:

data (TimeSeriesDataset) – The input data to be transformed.

Returns:

A new instance of the transformed data.

Raises:

NotFittedError – If the transform has not been fitted yet.

Return type:

TimeSeriesDataset

features_added() → list[str]

List of feature names added by this transform.

Return type: list[str]
Returns: A list of strings representing the names of features added to the dataset by this transform. Default is an empty list.

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': False, 'extra': 'ignore', 'protected_namespaces': ()}: Configuration for the model, should be a dictionary conforming to ConfigDict (pydantic.config.ConfigDict).
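With 'extra': 'ignore', unknown keyword arguments passed to the constructor are silently dropped instead of raising a validation error, so a misspelled option will not be reported. A minimal pydantic v2 sketch with a hypothetical DemoConfig model (not part of openstef) shows the effect, and contrasts it with extra="forbid":

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class DemoConfig(BaseModel):
    # Same settings as SampleWeighter.model_config
    model_config = ConfigDict(
        arbitrary_types_allowed=False, extra="ignore", protected_namespaces=()
    )
    target_column: str = "load"


cfg = DemoConfig(target_column="load", typo_column="x")  # unknown field is dropped
print(cfg.model_dump())  # {'target_column': 'load'}


class StrictDemoConfig(DemoConfig):
    model_config = ConfigDict(extra="forbid")  # merged with the parent's settings


try:
    StrictDemoConfig(typo_column="x")
except ValidationError as exc:
    print("rejected:", exc.errors()[0]["type"])  # extra_forbidden
```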

model_post_init(context: Any, /) → None

This function is meant to behave like a BaseModel method to initialize private attributes.

It takes context as an argument since that’s what pydantic-core passes when calling it.

Parameters:

self (BaseModel) – The BaseModel instance.
context (Any) – The context.

Return type:

None
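In pydantic v2, private attributes (underscore-prefixed names declared with PrivateAttr) are not validated fields; pydantic sets their defaults in model_post_init, after field validation, and still does so before running a subclass's own model_post_init override. The sketch below uses a hypothetical DemoTransform with an _is_fitted flag to illustrate the hook; whether SampleWeighter keeps any private state this way is not stated in this reference.

```python
from typing import Any

from pydantic import BaseModel, PrivateAttr


class DemoTransform(BaseModel):
    """Hypothetical model, not part of openstef_models."""

    target_column: str = "load"
    _is_fitted: bool = PrivateAttr(default=False)  # private: not a validated field
    _seen_columns: list[str] = PrivateAttr(default_factory=list)

    def model_post_init(self, context: Any, /) -> None:
        # Private defaults are already in place when this runs, so we can extend them.
        self._seen_columns.append(self.target_column)


t = DemoTransform()
print(t._is_fitted, t._seen_columns)  # False ['load']
print(t.model_dump())  # {'target_column': 'load'} (private attributes are excluded)
```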