openstef.model.metamodels package¶
Submodules¶
openstef.model.metamodels.feature_clipper module¶
- class openstef.model.metamodels.feature_clipper.FeatureClipper(columns)¶
Bases: BaseEstimator, TransformerMixin
A transformer that clips the values of specified columns to the minimum and maximum values observed during training. This prevents the model from extrapolating beyond these values during prediction.
- fit(X, y=None)¶
Fits the transformer on the training data by calculating the min and max values for the specified columns.
Parameters:¶
- X : pd.DataFrame
The input DataFrame containing training data.
- y : Optional[pd.Series]
Ignored. This parameter exists for compatibility with scikit-learn’s pipeline.
Returns:¶
- self : FeatureClipper
Fitted transformer.
Raises:¶
- ValueError:
If the input is not a pandas DataFrame.
- rtype:
- transform(X)¶
Transforms new data by clipping the specified columns’ values to be within the min and max range observed during fitting.
Parameters:¶
- X : pd.DataFrame
The input DataFrame containing new data to be transformed.
Returns:¶
- X_ : pd.DataFrame
A copy of the input DataFrame with clipped values in the specified columns.
Raises:¶
- ValueError:
If the input is not a pandas DataFrame.
- rtype:
DataFrame
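The fit/transform behaviour described above can be sketched as follows. This is a minimal illustration, not the OpenSTEF source: the class name ClipToTrainingRange, the attribute feature_ranges_ and the column names are hypothetical, and the sketch omits the scikit-learn base classes to stay short.

```python
import pandas as pd


class ClipToTrainingRange:
    """Hypothetical stand-in that mimics the documented clipping behaviour."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        if not isinstance(X, pd.DataFrame):
            raise ValueError("Input must be a pandas DataFrame.")
        # Remember the observed (min, max) of each requested column.
        self.feature_ranges_ = {
            col: (X[col].min(), X[col].max())
            for col in self.columns
            if col in X.columns
        }
        return self

    def transform(self, X):
        if not isinstance(X, pd.DataFrame):
            raise ValueError("Input must be a pandas DataFrame.")
        X_ = X.copy()  # leave the caller's DataFrame untouched
        for col, (lower, upper) in self.feature_ranges_.items():
            if col in X_.columns:
                X_[col] = X_[col].clip(lower=lower, upper=upper)
        return X_


train = pd.DataFrame({"temp": [5.0, 12.0, 20.0], "load": [1.0, 2.0, 3.0]})
new = pd.DataFrame({"temp": [-3.0, 15.0, 31.0], "load": [0.0, 2.5, 9.0]})

clipper = ClipToTrainingRange(columns=["temp"]).fit(train)
clipped = clipper.transform(new)
```

Only the listed column is clipped to the training range; other columns and the input DataFrame itself are left as they were.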
openstef.model.metamodels.grouped_regressor module¶
This module defines the grouped regressor.
- class openstef.model.metamodels.grouped_regressor.GroupedRegressor(base_estimator, group_columns, n_jobs=1)¶
Bases: BaseEstimator, RegressorMixin, MetaEstimatorMixin
Meta-model that trains an instance of the base estimator for each key of a groupby operation applied on the data.
The base estimator is a scikit-learn regressor, and the groupby is performed on the columns given by the group_columns parameter. The fit and predict methods can run in parallel across group keys using joblib.
Example:
data =
| index | group | x0 | x1 | x3 | y |
|   0   |   1   | .. | .. | .. | . |
|   1   |   2   | .. | .. | .. | . |
|   2   |   1   | .. | .. | .. | . |
|   3   |   2   | .. | .. | .. | . |
        [          X          ][ Y ]
- Applied to this data with group_columns='group', the GroupedRegressor fits 2 models:
Model 1 on rows 0 and 2, with columns x0, x1 and x3 as the features and column y as the target.
Model 2 on rows 1 and 3, with columns x0, x1 and x3 as the features and column y as the target.
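The per-group fitting in this example can be sketched with plain pandas and scikit-learn. This is an illustration under assumptions, not the GroupedRegressor implementation: the numeric values are made up, LinearRegression stands in for an arbitrary base estimator, and the loop runs sequentially.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Illustrative values only; the layout matches the example table.
data = pd.DataFrame(
    {
        "group": [1, 2, 1, 2],
        "x0": [0.0, 0.0, 1.0, 1.0],
        "x1": [1.0, 3.0, 2.0, 5.0],
        "x3": [2.0, 1.0, 0.0, 4.0],
        "y": [1.0, 10.0, 2.0, 20.0],
    }
)
feature_names = ["x0", "x1", "x3"]  # the grouping column is not a feature

estimators = {}      # group key -> regressor fitted on that group's rows
rows_per_group = {}  # group key -> row labels used for that regressor
for key, rows in data.groupby("group"):
    estimators[int(key)] = LinearRegression().fit(rows[feature_names], rows["y"])
    rows_per_group[int(key)] = list(rows.index)
```

Here rows_per_group shows that model 1 sees rows 0 and 2 while model 2 sees rows 1 and 3.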
- Parameters:
base_estimator (RegressorMixin) – Regressor.
group_columns (Union[str, int, list[str], list[int]]) – Name(s) of the column(s) used as the key for the groupby operation.
n_jobs (int) – default=1. The maximum number of concurrently running jobs, such as the number of Python worker processes when backend="multiprocessing" or the size of the thread pool when backend="threading".
- feature_names_¶
All input features (excluding group_columns).
- estimators_¶
Dictionary that stores the fitted estimator for each group. The keys are the group keys and the values are the regressors fitted on the corresponding group's data.
- fit(x, y, eval_set=None, **kwargs)¶
Fit the model.
- classmethod grouped_compute(df, group_columns, func, n_jobs=1, eval_set=None)¶
Computes the specified function on each group defined by the grouping columns.
This is a utility function used to perform fit and predict on each group. df_res is the final DataFrame that aggregates the results of all groups. group_res is a tuple in which each field holds the result for one group. gb is the grouping object.
- Parameters:
df (DataFrame) – DataFrame containing the input data necessary for the computation.
group_columns (Union[list[str], list[int]]) – List of the columns used for the groupby operation.
func (Callable[[tuple, DataFrame], array]) – Function that takes the group key and the corresponding data of this group and performs the computation on this group.
n_jobs (int) – The maximum number of concurrently running jobs.
- Return type:
tuple[tuple[array, ...], DataFrameGroupBy, DataFrame]
- Returns:
The tuple of the results of each group, the grouping object and the global DataFrame of results.
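A rough sketch of this pattern, assuming joblib for the parallel step, is shown below. The function grouped_compute_sketch, the helper demean and the "result" column name are hypothetical; the actual method may assemble df_res differently and also accepts an eval_set argument that is not covered here.

```python
import pandas as pd
from joblib import Parallel, delayed


def grouped_compute_sketch(df, group_columns, func, n_jobs=1):
    """Apply func(group_key, group_df) to every group, optionally in parallel."""
    gb = df.groupby(group_columns)
    # One task per group; joblib runs at most n_jobs of them concurrently.
    group_res = Parallel(n_jobs=n_jobs)(
        delayed(func)(key, group_df) for key, group_df in gb
    )
    # Stitch the per-group arrays back together in the original row order.
    df_res = pd.concat(
        [
            pd.Series(res, index=group_df.index)
            for res, (_, group_df) in zip(group_res, gb)
        ]
    ).sort_index()
    return tuple(group_res), gb, df_res.to_frame("result")


def demean(key, group_df):
    """Toy per-group computation: subtract the group mean of y."""
    return (group_df["y"] - group_df["y"].mean()).to_numpy()


df = pd.DataFrame({"group": [1, 2, 1, 2], "y": [1.0, 10.0, 3.0, 20.0]})
group_res, gb, df_res = grouped_compute_sketch(df, ["group"], demean)
```

With n_jobs greater than 1, func generally has to be picklable, which is why demean is defined at module level in this sketch.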
- predict(x, **kwargs)¶
Make a prediction.
- Return type:
ndarray
- set_fit_request(*, eval_set: bool | None | str = '$UNCHANGED$', x: bool | None | str = '$UNCHANGED$') → GroupedRegressor¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
eval_set (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for eval_set parameter in fit.
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, x: bool | None | str = '$UNCHANGED$') → GroupedRegressor¶
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GroupedRegressor¶
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object
openstef.model.metamodels.missing_values_handler module¶
This module defines the missing value handler.
- class openstef.model.metamodels.missing_values_handler.MissingValuesHandler(base_estimator, missing_values=nan, imputation_strategy=None, fill_value=None)¶
Bases: BaseEstimator, RegressorMixin, MetaEstimatorMixin
Meta-model that handles missing values and removes columns filled exclusively with NaN.
It’s a pipeline of:
An Imputation transformer for completing missing values.
A Regressor fitted on the filled data.
- Parameters:
base_estimator (RegressorMixin) – Regressor used in the pipeline.
missing_values (Union[int, float, str, None]) – The placeholder for the missing values. All occurrences of missing_values will be imputed. For pandas’ dataframes with nullable integer dtypes with missing values, missing_values should be set to np.nan, since pd.NA will be converted to np.nan.
imputation_strategy (str) – The imputation strategy.
  - If None, no imputation is performed.
  - If “mean”, then replace missing values using the mean along each column. Can only be used with numeric data.
  - If “median”, then replace missing values using the median along each column. Can only be used with numeric data.
  - If “most_frequent”, then replace missing values using the most frequent value along each column. Can be used with strings or numeric data. If there is more than one such value, only the smallest is returned.
  - If “constant”, then replace missing values with fill_value. Can be used with strings or numeric data.
fill_value (Union[str, int, float]) – When imputation_strategy == “constant”, fill_value is used to replace all occurrences of missing_values. If left to the default, fill_value will be 0 when imputing numerical data and “missing_value” for strings or object data types.
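The imputer-plus-regressor pipeline described for this class can be approximated with stock scikit-learn components. The sketch below is an assumption-laden illustration, not the MissingValuesHandler source: the data, the column names and the choice of LinearRegression are made up, and the all-NaN column is dropped by hand before fitting.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

x = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0, 4.0],
        "b": [np.nan, np.nan, np.nan, np.nan],  # exclusively NaN, so dropped
        "c": [2.0, 4.0, np.nan, 8.0],
    }
)
y = np.array([3.0, 6.0, 9.0, 12.0])

# Keep only the columns that contain at least one observed value.
non_null_columns = [col for col in x.columns if x[col].notna().any()]

pipeline = Pipeline(
    [
        ("imputer", SimpleImputer(missing_values=np.nan, strategy="mean")),
        ("regressor", LinearRegression()),
    ]
)
pipeline.fit(x[non_null_columns], y)
predictions = pipeline.predict(x[non_null_columns])
```

With strategy="mean", the gaps in columns a and c are filled with the column means learned during fit before the regressor sees the data.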
- feature_names¶
All input features.
- non_null_columns_¶
Valid features used by the regressor.
- n_features_in_¶
Number of input features.
- regressor_¶
RegressorMixin – Regressor fitted on valid columns.
- imputer_¶
SimpleImputer – Imputer for missing values, fitted on valid columns.
- pipeline_¶
Pipeline – Pipeline that chains the imputer and the regressor.
- feature_importances_¶
ndarray of shape (n_features_in_,) – The feature importances from the regressor for valid features, and zero otherwise.
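One way to picture the zero-padding of feature importances is the sketch below. The feature names and importance values are invented for illustration, and the mapping shown is an assumption about how such an array could be built, not the library's code.

```python
import numpy as np

feature_names = ["a", "b", "c", "d"]   # all input features
non_null_columns = ["a", "c", "d"]     # "b" was entirely NaN and was dropped
regressor_importances = np.array([0.5, 0.2, 0.3])  # illustrative values

# Map each valid feature to its importance, then expand to the full
# feature list, using zero for columns the regressor never saw.
importance_by_name = dict(zip(non_null_columns, regressor_importances))
feature_importances = np.array(
    [importance_by_name.get(name, 0.0) for name in feature_names]
)
```

The resulting array has one entry per input feature, so it lines up with feature_names even though the regressor was fitted on fewer columns.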
- fit(x, y)¶
Fit model.
- predict(x)¶
Make a prediction.
- set_fit_request(*, x: bool | None | str = '$UNCHANGED$') → MissingValuesHandler¶
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in fit.
- Returns:
self – The updated object.
- Return type:
object
- set_predict_request(*, x: bool | None | str = '$UNCHANGED$') → MissingValuesHandler¶
Request metadata passed to the predict method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
x (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for x parameter in predict.
- Returns:
self – The updated object.
- Return type:
object
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MissingValuesHandler¶
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
object