[scikit-learn] Imblearn: SMOTENC

S Hamidizade hamidizade.s at gmail.com
Thu Jan 24 01:09:55 EST 2019


Dear Mr. Lemaitre

Thanks a lot for sharing your time and knowledge. Unfortunately, the
suggested code throws the following error:

119
41
Traceback (most recent call last):
  File "D:/mifs-master_2/MU/learning-from-imbalanced-classes-master/learning-from-imbalanced-classes-master/continuous/Final Logit/SMOTENC/logit-final - Copy.py", line 419, in <module>
    pipeline_with_resampling = make_pipeline(SMOTENC(categorical_features=cat_indices1), pipeline)
  File "C:\Users\Markazi.co\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 594, in make_pipeline
    return Pipeline(_name_estimators(steps), memory=memory)
  File "C:\Users\Markazi.co\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 119, in __init__
    self._validate_steps()
  File "C:\Users\Markazi.co\Anaconda3\lib\site-packages\sklearn\pipeline.py", line 167, in _validate_steps
    " '%s' (type %s) doesn't" % (t, type(t)))
TypeError: All intermediate steps should be transformers and implement fit
and transform. 'SMOTENC(categorical_features=['x95', 'x97', 'x99', 'x100',
'x121_1', 'x121_2', 'x121_3', 'x121_4', 'x121_5', 'x121_6', 'x121_7',
'x121_8', 'x121_9', 'x121_10', 'x121_11', 'x121_12', 'x121_13', 'x121_14',
'x121_15', 'x121_16', 'x121_17', 'x121_18', 'x121_19', 'x121_20',
'x121_21', 'x121_22', 'x121_23', 'x121_24', 'x121_25', 'x121_26',
'x121_27', 'x121_28', 'x121_29', 'x121_30', 'x121_31', 'x121_32',
'x121_33', 'x121_34', 'x121_35', 'x121_36', 'x121_37'],
    k_neighbors=5, n_jobs=1, random_state=None, sampling_strategy='auto')'
(type <class 'imblearn.over_sampling._smote.SMOTENC'>) doesn't
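
The traceback passes through sklearn\pipeline.py, which suggests that
make_pipeline was imported from sklearn.pipeline and not from
imblearn.pipeline as in the quoted reply below. scikit-learn's pipeline
accepts only transformers as intermediate steps, and a sampler such as
SMOTENC offers a resampling method (fit_resample, as far as I know) and no
transform. A minimal, dependency-free sketch of that kind of check follows.
ToyScaler, ToySampler, validate_intermediate_step and is_accepted are
hypothetical stand-ins, not code from either library.

```python
class ToyScaler:
    """Hypothetical transformer: implements both fit and transform."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class ToySampler:
    """Hypothetical sampler: implements fit_resample but no transform."""

    def fit(self, X, y):
        return self

    def fit_resample(self, X, y):
        return X, y


def validate_intermediate_step(step):
    """Raise TypeError unless `step` looks like a transformer.

    Simplified stand-in for the check named in the traceback.
    """
    can_fit = hasattr(step, "fit") or hasattr(step, "fit_transform")
    if not (can_fit and hasattr(step, "transform")):
        raise TypeError(
            "All intermediate steps should be transformers and implement "
            "fit and transform. %r (type %s) doesn't" % (step, type(step))
        )


def is_accepted(step):
    """Return True if the step passes the transformer check."""
    try:
        validate_intermediate_step(step)
    except TypeError:
        return False
    return True


print(is_accepted(ToyScaler()))   # the transformer passes the check
print(is_accepted(ToySampler()))  # the sampler is rejected
```

With "from imblearn.pipeline import make_pipeline", as in the quoted reply,
samplers are permitted as pipeline steps.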

Thanks in advance.
Best regards,

On Mon, Jan 21, 2019 at 2:26 PM Guillaume Lemaître <g.lemaitre58 at gmail.com>
wrote:

> SMOTENC will internally one hot encode the features, generate new
> features, and finally decode.
> So you need to do something like:
>
>
> from imblearn.pipeline import make_pipeline, Pipeline
>
> num_indices1 = list(X.iloc[:,np.r_[0:94,95,97,100:123]].columns.values)
> cat_indices1 = list(X.iloc[:,np.r_[94,96,98,99,123:160]].columns.values)
> print(len(num_indices1))
> print(len(cat_indices1))
>
> pipeline=Pipeline(steps= [
>     # Categorical features
>     ('feature_processing', FeatureUnion(transformer_list = [
>             ('categorical', MultiColumn(cat_indices1)),
>
>             #numeric
>             ('numeric', Pipeline(steps = [
>                 ('select', MultiColumn(num_indices1)),
>                 ('scale', StandardScaler())
>                         ]))
>         ])),
>     ('clf', rg)
>     ]
> )
>
> pipeline_with_resampling = make_pipeline(SMOTENC(categorical_features=cat_indices1), pipeline)
>
>
>
>
> On Sun, 20 Jan 2019 at 18:05, S Hamidizade <hamidizade.s at gmail.com> wrote:
>
>> Dear Scikit-learners
>> Hi.
>>
>> I would greatly appreciate it if you could let me know how to use
>> SMOTENC. I wrote:
>>
>> num_indices1 = list(X.iloc[:,np.r_[0:94,95,97,100:123]].columns.values)
>> cat_indices1 = list(X.iloc[:,np.r_[94,96,98,99,123:160]].columns.values)
>> print(len(num_indices1))
>> print(len(cat_indices1))
>>
>> pipeline=Pipeline(steps= [
>>     # Categorical features
>>     ('feature_processing', FeatureUnion(transformer_list = [
>>             ('categorical', MultiColumn(cat_indices1)),
>>
>>             #numeric
>>             ('numeric', Pipeline(steps = [
>>                 ('select', MultiColumn(num_indices1)),
>>                 ('scale', StandardScaler())
>>                         ]))
>>         ])),
>>     ('clf', rg)
>>     ]
>> )
>>
>> As indicated, I have 5 categorical features. Indices 123 to 159 actually
>> belong to a single categorical feature with 37 possible values, which
>> was converted into 37 columns using get_dummies.
>> I think SMOTENC should be inserted before the classifier ('clf', rg),
>> but I don't know how to define "categorical_features" in SMOTENC.
>> Besides, could you please let me know where to use imblearn.pipeline?
>>
>> Thanks in advance.
>> Best regards,
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>
>
> --
> Guillaume Lemaitre
> INRIA Saclay - Parietal team
> Center for Data Science Paris-Saclay
> https://glemaitre.github.io/
>
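
The selections quoted above pass column names to categorical_features. As
far as I know, the SMOTENC documentation of that period describes the
parameter as either an array of positional indices or a boolean mask, so the
names may need converting; treat that as an assumption to check against the
installed version. Below is a minimal sketch of both forms in plain Python,
using only the positions given in the thread. It assumes X has exactly the
160 columns indexed there; n_features, cat_positions, cat_mask and
num_positions are illustrative names.

```python
# Assumption: X has exactly the 160 columns indexed in the thread.
n_features = 160

# Positional indices of the categorical columns,
# mirroring np.r_[94, 96, 98, 99, 123:160] from the thread.
cat_positions = [94, 96, 98, 99] + list(range(123, 160))

# Equivalent boolean mask with one entry per column.
cat_set = set(cat_positions)
cat_mask = [i in cat_set for i in range(n_features)]

# Every remaining column is numeric,
# mirroring np.r_[0:94, 95, 97, 100:123] from the thread.
num_positions = [i for i in range(n_features) if i not in cat_set]

# These counts match the 119 and 41 printed in the reported output.
print(len(num_positions), len(cat_positions))
```

Either cat_positions or cat_mask describes the same 41 columns, so only one
of them would be passed as categorical_features.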