openstef.model.metamodels package

Submodules

openstef.model.metamodels.feature_clipper module

class openstef.model.metamodels.feature_clipper.FeatureClipper(columns)

Bases: BaseEstimator, TransformerMixin

A transformer that clips the values of specified columns to the minimum and maximum values observed during training. This prevents the model from extrapolating beyond these values during prediction.

fit(X, y=None)

Fits the transformer on the training data by calculating the min and max values for the specified columns.

Parameters:

X : pd.DataFrame

The input DataFrame containing training data.

y : Optional[pd.Series]

Ignored. This parameter exists for compatibility with scikit-learn’s pipeline.

Returns:

self : FeatureClipper

Fitted transformer.

Raises:

ValueError:

If the input is not a pandas DataFrame.

Return type:

FeatureClipper

transform(X)

Transforms new data by clipping the specified columns’ values to be within the min and max range observed during fitting.

Parameters:

X : pd.DataFrame

The input DataFrame containing new data to be transformed.

Returns:

X_ : pd.DataFrame

A copy of the input DataFrame with clipped values in the specified columns.

Raises:

ValueError:

If the input is not a pandas DataFrame.

Return type:

DataFrame
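
The fit/transform behaviour described above can be sketched with plain pandas. This is a minimal illustration, not the FeatureClipper source: the frames, the column names ("load", "temp") and the variable names are all hypothetical.

```python
import pandas as pd

# Hypothetical training and prediction frames; column names are made up.
train = pd.DataFrame({"load": [10.0, 20.0, 30.0], "temp": [5.0, 15.0, 25.0]})
new = pd.DataFrame({"load": [0.0, 25.0, 99.0], "temp": [-40.0, 10.0, 50.0]})

columns = ["load"]  # only these columns are clipped

# "fit": remember the range observed in the training data for each column.
bounds = {col: (train[col].min(), train[col].max()) for col in columns}

# "transform": work on a copy so the caller's frame is left untouched.
clipped = new.copy()
for col, (low, high) in bounds.items():
    clipped[col] = clipped[col].clip(lower=low, upper=high)

print(clipped)
```

In this sketch the "load" values 0.0 and 99.0 are pulled back to the training range, while "temp" is not listed in columns and passes through unchanged.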

openstef.model.metamodels.grouped_regressor module

This module defines the grouped regressor.

class openstef.model.metamodels.grouped_regressor.GroupedRegressor(base_estimator, group_columns, n_jobs=1)

Bases: BaseEstimator, RegressorMixin, MetaEstimatorMixin

Meta-model that trains an instance of the base estimator for each key of a groupby operation applied on the data.

The base estimator is a scikit-learn regressor, and the groupby is performed on the columns given by group_columns. The fit and predict methods can run in parallel across group keys using joblib.

Example:

data =  | index | group | x0 | x1 | x3 | y |
        |   0   |   1   | .. | .. | .. | . |
        |   1   |   2   | .. | .. | .. | . |
        |   2   |   1   | .. | .. | .. | . |
        |   3   |   2   | .. | .. | .. | . |

        [              X              ][ Y ]

With group_columns='group', the GroupedRegressor fits 2 models on this data:
  • Model 1 uses rows 0 and 2, with columns x0, x1 and x3 as the features and column y as the target.

  • Model 2 uses rows 1 and 3, with columns x0, x1 and x3 as the features and column y as the target.
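
The one-model-per-group idea can be sketched as follows. This is a hedged illustration that uses pandas and scikit-learn directly rather than GroupedRegressor itself; the data values, the single feature x0 and the choice of LinearRegression are assumptions made for the example.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data with the same layout as the table above (one feature only).
data = pd.DataFrame({
    "group": [1, 2, 1, 2],
    "x0": [0.0, 0.0, 1.0, 1.0],
    "y": [1.0, 10.0, 3.0, 30.0],
})
feature_names = [c for c in data.columns if c not in ("group", "y")]

# Fit one regressor per group key and keep it in a dict keyed by that key.
estimators = {}
for key, part in data.groupby("group"):
    estimators[key] = LinearRegression().fit(part[feature_names], part["y"])

# Predict for a new row by looking up the model of its group.
new_row = pd.DataFrame({"x0": [2.0]})
pred_group_1 = estimators[1].predict(new_row)[0]
pred_group_2 = estimators[2].predict(new_row)[0]
```

Each group here lies on its own line (y = 2·x0 + 1 and y = 20·x0 + 10), so a single model fitted on all rows would fit neither group well.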

Parameters:
  • base_estimator (RegressorMixin) – Regressor fitted on each group.

  • group_columns (Union[str, int, list[str], list[int]]) – Name(s) of the column(s) used as the key for groupby operation.

  • n_jobs (int) – The maximum number of concurrently running jobs, such as the number of Python worker processes when backend="multiprocessing" or the size of the thread pool when backend="threading". Defaults to 1.

feature_names_

All input features (excluding group_columns).

estimators_

Dictionary that stores the fitted estimator for each group. The keys are the group keys and the values are the regressors fitted on the corresponding grouped data.

fit(x, y, eval_set=None, **kwargs)

Fit the model.

classmethod grouped_compute(df, group_columns, func, n_jobs=1, eval_set=None)

Computes the specified function on each group defined by the grouping columns.

This utility function is used to perform fit and predict on each group. It returns three objects: group_res, a tuple holding one result per group; gb, the grouping object; and df_res, the final DataFrame that aggregates the results of all groups.

Parameters:
  • df (DataFrame) – DataFrame containing the input data necessary for the computation.

  • group_columns (Union[list[str], list[int]]) – List of the columns used for the groupby operation.

  • func (Callable[[tuple, DataFrame], array]) – Function that takes the group key and the corresponding data of this group and performs the computation on this group.

  • n_jobs (int) – The maximum number of concurrently running jobs. Defaults to 1.

Return type:

tuple[tuple[array, ...], DataFrameGroupBy, DataFrame]

Returns:

A tuple containing the per-group results, the groupby object and the combined DataFrame of results.
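
The pattern of applying a function to each group, optionally in parallel, and then reassembling the results can be sketched as below. This is not the grouped_compute implementation: group_mean is a hypothetical func, the threading backend is chosen only to keep the example light, and the results are reassembled into a Series here for simplicity.

```python
import numpy as np
import pandas as pd
from joblib import Parallel, delayed


def group_mean(key, group_df):
    """Hypothetical func: return one value per row of the group."""
    return np.full(len(group_df), group_df["y"].mean())


df = pd.DataFrame({"group": [1, 2, 1, 2], "y": [1.0, 10.0, 3.0, 30.0]})

# The grouping object; iterating over it yields (key, sub-frame) pairs.
gb = df.groupby(["group"])

# One result array per group, computed by up to two worker threads.
group_res = Parallel(n_jobs=2, backend="threading")(
    delayed(group_mean)(key, part) for key, part in gb
)

# Re-attach each result to the row index of its group, then restore row order.
df_res = pd.concat(
    [pd.Series(res, index=part.index) for res, (_, part) in zip(group_res, gb)]
).sort_index()
```

Keeping the original row index on each per-group result is what allows the combined output to be put back in the order of the input rows.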

predict(x, **kwargs)

Make a prediction.

Return type:

ndarray

set_fit_request(*, eval_set: bool | None | str = '$UNCHANGED$', x: bool | None | str = '$UNCHANGED$') -> GroupedRegressor

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.

  • x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, x: bool | None | str = '$UNCHANGED$') -> GroupedRegressor

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> GroupedRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

openstef.model.metamodels.missing_values_handler module

This module defines the missing value handler.

class openstef.model.metamodels.missing_values_handler.MissingValuesHandler(base_estimator, missing_values=nan, imputation_strategy=None, fill_value=None)

Bases: BaseEstimator, RegressorMixin, MetaEstimatorMixin

Meta-model that handles missing values and removes columns filled exclusively with NaN.

It’s a pipeline of:

  • An Imputation transformer for completing missing values.

  • A Regressor fitted on the filled data.

Parameters:
  • base_estimator (RegressorMixin) – Regressor used in the pipeline.

  • missing_values (Union[int, float, str, None]) – The placeholder for the missing values. All occurrences of missing_values will be imputed. For pandas’ dataframes with nullable integer dtypes with missing values, missing_values should be set to np.nan, since pd.NA will be converted to np.nan.

  • imputation_strategy (str) – The imputation strategy:

      - If None, no imputation is performed.
      - If “mean”, replace missing values using the mean along each column. Can only be used with numeric data.
      - If “median”, replace missing values using the median along each column. Can only be used with numeric data.
      - If “most_frequent”, replace missing values using the most frequent value along each column. Can be used with strings or numeric data. If there is more than one such value, only the smallest is returned.
      - If “constant”, replace missing values with fill_value. Can be used with strings or numeric data.

  • fill_value (Union[str, int, float]) – When imputation_strategy is “constant”, fill_value is used to replace all occurrences of missing_values. If left to the default, fill_value will be 0 when imputing numerical data and “missing_value” for strings or object data types.

feature_names

All input features.

non_null_columns_

Valid features used by the regressor.

n_features_in_

Number of input features.

regressor_

RegressorMixin – Regressor fitted on valid columns.

imputer_

SimpleImputer – Imputer for missing values, fitted on valid columns.

pipeline_

Pipeline – Pipeline that chains the imputer and the regressor.

feature_importances_

ndarray of shape (n_features_in_,) – The feature importances from the regressor for valid features and zero otherwise.
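
The combination described above (drop the columns that contain only NaN, impute the rest, then fit a regressor) can be sketched with scikit-learn directly. This is an illustration under assumptions, not the MissingValuesHandler source: the data, the "mean" strategy and LinearRegression are chosen only for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical input: column "b" holds no values at all.
x = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [np.nan, np.nan, np.nan, np.nan],
})
y = pd.Series([2.0, 4.0, 6.0, 8.0])

# Keep only the columns that have at least one non-missing value.
non_null_columns = [c for c in x.columns if x[c].notna().any()]

# Chain the imputer and the regressor, then fit on the valid columns.
pipeline = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
pipeline.fit(x[non_null_columns], y)

pred = pipeline.predict(x[non_null_columns])
```

Dropping the all-NaN column first matters because an imputer has no statistic to compute for a column without any observed value.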

fit(x, y)

Fit model.

predict(x)

Make a prediction.

set_fit_request(*, x: bool | None | str = '$UNCHANGED$') -> MissingValuesHandler

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_predict_request(*, x: bool | None | str = '$UNCHANGED$') -> MissingValuesHandler

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> MissingValuesHandler

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

Module contents