DimensionalityReducer

class openstef_models.transforms.general.dimensionality_reducer.DimensionalityReducer(**data: Any) → None

Bases: BaseConfig, TimeSeriesTransform

Reduce the dimensionality of a given set of features.

Available methods include:

PCA: linear dimensionality reduction into orthogonal components.

Factor analysis: linear dimensionality reduction that models observed variables as latent factors plus Gaussian noise.

FastICA: linear dimensionality reduction that maximizes statistical independence among components.

KernelPCA: non-linear dimensionality reduction using an RBF kernel.
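
To make "orthogonal components" concrete, the following is a minimal NumPy sketch of PCA via singular value decomposition. It is an illustration of the technique only, not this class's implementation; the helper name `pca_project` is hypothetical, and the component signs and scaling may differ from what the transform produces.

```python
import numpy as np


def pca_project(features: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """Project the rows of `features` onto the leading principal components.

    Hypothetical helper for illustration; not part of openstef_models.
    """
    # PCA operates on mean-centered data.
    centered = features - features.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Scores are the coordinates of each row in the component basis.
    return centered @ components.T, components


features = np.array([
    [1.0, 1.0, 5.0],
    [2.0, 2.0, 11.0],
    [1.5, 1.5, 8.0],
    [2.5, 2.5, 2.0],
    [2.0, 2.0, 11.0],
])
scores, components = pca_project(features, n_components=2)

print(scores.shape)  # (5, 2)
# The component vectors are mutually orthogonal and have unit length.
print(np.allclose(components @ components.T, np.eye(2)))  # True
```

Three input features are reduced to two columns of scores, one row per timestamp, which mirrors the shape change from feature columns to component columns.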

Example

>>> import pandas as pd
>>> from datetime import timedelta
>>> from openstef_core.datasets import TimeSeriesDataset
>>> from openstef_models.transforms.general import DimensionalityReducer
>>> # Create sample dataset
>>> data = pd.DataFrame({
...     'load': [100, 120, 110, 130, 125],
...     'feature1': [1.0, 2.0, 1.5, 2.5, 2.0],
...     'feature2': [1.0, 2.0, 1.5, 2.5, 2.0],
...     'feature3': [5.0, 11.0, 8.0, 2.0, 11.0]
... }, index=pd.date_range('2025-01-01', periods=5, freq='1h'))
>>> dataset = TimeSeriesDataset(data, timedelta(hours=1))
>>> # Initialize and apply transform
>>> from openstef_models.utils.feature_selection import FeatureSelection
>>> dim_reducer = DimensionalityReducer(
...     selection=FeatureSelection(include={'feature1', 'feature2', 'feature3'}),
...     method="pca",
...     n_components=2,
...     random_state=1234
... )
>>> dim_reducer.fit(dataset)
>>> transformed_dataset = dim_reducer.transform(dataset)
>>> transformed_dataset.data.head().round(3)
                     component_1  component_2  load
timestamp
2025-01-01 00:00:00       -2.383       -1.166   100
2025-01-01 01:00:00        3.596        0.335   120
2025-01-01 02:00:00        0.606       -0.416   110
2025-01-01 03:00:00       -5.414        0.912   130
2025-01-01 04:00:00        3.596        0.335   125

Parameters: data (Any)

selection: FeatureSelection

method: Literal['pca', 'factor_analysis', 'fastica', 'kernel_pca']

n_components: int

random_state: int | None

max_iter: int

property is_fitted: bool

Check if the transform has been fitted.

fit(data: TimeSeriesDataset) → None

Fit the transform to the input data.

This method should be called before applying the transform to the data. It allows the transform to learn any necessary parameters from the data.

Parameters:

data (TimeSeriesDataset) – The input data to fit the transform on.

Return type:

None

transform(data: TimeSeriesDataset) → TimeSeriesDataset

Transform the input data.

This method should apply a transformation to the input data and return a new instance.

Parameters:

data (TimeSeriesDataset) – The input data to be transformed.

Returns:

A new instance of the transformed data.

Raises:

NotFittedError – If the transform has not been fitted yet.

Return type:

TimeSeriesDataset
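
The fit-then-transform contract described above can be sketched with a small stand-in. Everything in this block is hypothetical and for illustration only: `MeanCenterer` is not part of the library, it works on plain NumPy arrays rather than `TimeSeriesDataset`, and the `NotFittedError` defined here is a local substitute for the library's exception.

```python
from typing import Optional

import numpy as np


class NotFittedError(RuntimeError):
    """Local stand-in for the library's NotFittedError (hypothetical)."""


class MeanCenterer:
    """Hypothetical transform that learns column means in fit()."""

    def __init__(self) -> None:
        self._means: Optional[np.ndarray] = None

    @property
    def is_fitted(self) -> bool:
        return self._means is not None

    def fit(self, data: np.ndarray) -> None:
        # Learn the parameters the transform needs from the data.
        self._means = data.mean(axis=0)

    def transform(self, data: np.ndarray) -> np.ndarray:
        if self._means is None:
            raise NotFittedError("Call fit() before transform().")
        # Return a new array; the input is left unchanged.
        return data - self._means


centerer = MeanCenterer()
data = np.array([[1.0, 5.0], [2.0, 11.0], [1.5, 8.0]])

try:
    centerer.transform(data)  # not fitted yet
except NotFittedError as error:
    print(f"raised: {error}")

centerer.fit(data)
centered = centerer.transform(data)
```

Calling `transform` before `fit` raises, and a successful `transform` returns a new object instead of modifying its input.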

model_post_init(context: Any) → None

Override this method to perform additional initialization after __init__ and model_construct. This is useful if you want to do some validation that requires the entire model to be initialized.

Parameters: context (Any)
Return type: None

features_added() → list[str]

List of feature names added by this transform.

Return type: list[str]
Returns: A list of strings representing the names of features added to the dataset by this transform. Default is an empty list.
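
The example output earlier on this page shows new columns named `component_1` and `component_2`. Assuming the added features follow that 1-based naming pattern (an inference from the example, not a documented guarantee), the names for a given `n_components` could be generated as in this sketch. The helper `component_names` is hypothetical.

```python
def component_names(n_components: int) -> list[str]:
    """Build 1-based component column names (hypothetical helper)."""
    return [f"component_{index}" for index in range(1, n_components + 1)]


names = component_names(2)
print(names)  # ['component_1', 'component_2']
```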

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': False, 'extra': 'ignore', 'protected_namespaces': ()}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.