EmptyFeatureRemover
- class openstef_models.transforms.general.EmptyFeatureRemover(**data: Any) → None
Bases: BaseConfig, TimeSeriesTransform
Transform that removes columns which are completely empty (all values are missing).
This transform identifies columns that contain only missing values and removes them from the dataset. It respects both NaN values and custom missing value placeholders.
- Parameters:
selection (FeatureSelection) – Features to check for emptiness. By default, all columns are checked.
missing_value (float) – The placeholder for missing values. Defaults to np.nan.
Example
>>> import pandas as pd
>>> import numpy as np
>>> from datetime import timedelta
>>> from openstef_core.datasets import TimeSeriesDataset
>>> from openstef_models.transforms.general import (
...     EmptyFeatureRemover,
... )
>>> from openstef_models.utils.feature_selection import FeatureSelection
>>> # Create dataset with some empty columns
>>> data = pd.DataFrame(
...     {
...         "radiation": [100.0, 110.0, 120.0],
...         "temperature": [20.0, 21.0, 22.0],
...         "empty_col1": [np.nan, np.nan, np.nan],
...         "empty_col2": [np.nan, np.nan, np.nan],
...     },
...     index=pd.date_range("2025-01-01", periods=3, freq="1h"),
... )
>>> dataset = TimeSeriesDataset(data, timedelta(hours=1))
>>> # Remove all empty columns
>>> transform = EmptyFeatureRemover()
>>> transform.fit(dataset)
>>> result = transform.transform(dataset)
>>> list(result.data.columns)
['radiation', 'temperature']
>>> # Only check specific columns
>>> transform_selective = EmptyFeatureRemover(
...     selection=FeatureSelection(include={"empty_col1", "radiation"})
... )
>>> transform_selective.fit(dataset)
>>> result_selective = transform_selective.transform(dataset)
>>> "empty_col1" in result_selective.data.columns
False
>>> "empty_col2" in result_selective.data.columns  # Not checked, so not removed
True
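The emptiness check described above, which treats both NaN and a custom placeholder as missing, can be sketched in plain Python without the library. This is only an illustration under stated assumptions: the helper names is_missing and find_empty_columns are hypothetical, the data is a plain dict of lists rather than a TimeSeriesDataset, and the actual implementation in openstef_models may differ.

```python
import math


def is_missing(value, missing_value=float("nan")):
    """Return True if value is NaN or equals the custom placeholder."""
    if isinstance(value, float) and math.isnan(value):
        return True
    # NaN never compares equal to anything, so this comparison only
    # matters for non-NaN placeholders such as -999.0.
    return value == missing_value


def find_empty_columns(table, columns=None, missing_value=float("nan")):
    """List the checked columns in which every value is missing.

    `table` maps column names to lists of values (hypothetical stand-in
    for a dataset). `columns` restricts the check; None checks all columns.
    """
    candidates = table.keys() if columns is None else columns
    return [
        name
        for name in candidates
        if name in table and all(is_missing(v, missing_value) for v in table[name])
    ]


table = {
    "radiation": [100.0, 110.0, 120.0],
    "empty_col1": [float("nan")] * 3,
    "sentinel_col": [-999.0, -999.0, -999.0],
}

# With the default placeholder only the all-NaN column counts as empty.
empty_default = find_empty_columns(table)
# With -999.0 as the placeholder, the sentinel-only column is empty as well.
empty_sentinel = find_empty_columns(table, missing_value=-999.0)
```

In this sketch a column outside the checked set is never reported, which mirrors the selective behaviour shown in the example above.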
- Parameters:
data (Any)
- selection: FeatureSelection
- missing_value: float
- property is_fitted: bool
Check if the transform has been fitted.
- fit(data: TimeSeriesDataset) → None
Fit the transform to the input data.
This method should be called before applying the transform to the data. It allows the transform to learn any necessary parameters from the data.
- Parameters:
data (TimeSeriesDataset) – The input data to fit the transform on.
- Return type:
None
- transform(data: TimeSeriesDataset) → TimeSeriesDataset
Transform the input data.
This method should apply a transformation to the input data and return a new instance.
- Parameters:
data (TimeSeriesDataset) – The input data to be transformed.
- Returns:
A new instance of the transformed data.
- Raises:
NotFittedError – If the transform has not been fitted yet.
- Return type:
TimeSeriesDataset
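The fit-then-transform contract documented above (is_fitted, fit, transform returning a new instance, and NotFittedError when unfitted) can be illustrated with a toy class. This is a sketch under assumptions, not the library's code: EmptyColumnDropper and the local NotFittedError are hypothetical stand-ins, the data is a dict of lists, and the idea that the empty columns are determined during fit is an assumption the documentation does not confirm.

```python
import math


class NotFittedError(Exception):
    """Hypothetical local stand-in for the library's NotFittedError."""


class EmptyColumnDropper:
    """Toy transform over a dict mapping column names to lists of floats."""

    def __init__(self):
        self._empty = None  # populated by fit()

    @property
    def is_fitted(self):
        return self._empty is not None

    def fit(self, table):
        # Assumption: the all-NaN columns are learned at fit time.
        self._empty = {
            name
            for name, values in table.items()
            if all(math.isnan(v) for v in values)
        }

    def transform(self, table):
        if not self.is_fitted:
            raise NotFittedError("call fit() before transform()")
        # Build a new dict so the input is left untouched.
        return {k: list(v) for k, v in table.items() if k not in self._empty}


table = {
    "temperature": [20.0, 21.0],
    "empty_col": [float("nan"), float("nan")],
}
dropper = EmptyColumnDropper()

# Transforming before fitting is rejected.
try:
    dropper.transform(table)
    raised = False
except NotFittedError:
    raised = True

dropper.fit(table)
result = dropper.transform(table)
```

Returning a fresh dict rather than mutating the input reflects the documented behaviour that transform returns a new instance of the data.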
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': False, 'extra': 'ignore', 'protected_namespaces': ()}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.