[scikit-learn] XGboost Classifier error

Thu Apr 20 00:21:37 EDT 2017

Hi Olivier,

Thanks for the information. I will follow it from now on. The details of the
traceback are given below:

----------Full traceback---------------

Fitting 3 folds for each of 10 candidates, totalling 30 fits

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py:43:
DeprecationWarning: This module was deprecated in version 0.18 in
favor of the model_selection module into which all the refactored
classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)

---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-19-321b410b10ad> in <module>()
     18
     19
---> 20 random_search_sg.fit(scaled_data, labels)
     21
     22 print("RandomizedSearchCV took %.2f seconds for %d candidates"

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py in fit(self, X, y)
   1023                                           self.n_iter,
   1024                                           random_state=self.random_state)
-> 1025         return self._fit(X, y, sampled_params)

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py in _fit(self, X, y, parameter_iterable)
    571                                     self.fit_params, return_parameters=True,
    572                                     error_score=self.error_score)
--> 573                 for parameters in parameter_iterable
    574                 for train, test in cv)
    575

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in __call__(self, iterable)
    756             # was dispatched. In particular this covers the edge
    757             # case of Parallel used with an exhausted iterator.
--> 758             while self.dispatch_one_batch(iterator):
    759                 self._iterating = True
    760             else:

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in dispatch_one_batch(self, iterator)
    601
    602         with self._lock:
--> 603             tasks = BatchedCalls(itertools.islice(iterator, batch_size))
    604             if len(tasks) == 0:
    605                 # No more tasks available in the iterator: tell caller to stop.

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py in __init__(self, iterator_slice)
    125
    126     def __init__(self, iterator_slice):
--> 127         self.items = list(iterator_slice)
    128         self._size = len(self.items)
    129

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py in <genexpr>(.0)
    567             pre_dispatch=pre_dispatch
    568         )(
--> 569             delayed(_fit_and_score)(clone(base_estimator), X, y, self.scorer_,
    570                                     train, test, self.verbose, parameters,
    571                                     self.fit_params, return_parameters=True,

C:\Users\ssampathkumar\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\grid_search.py in __iter__(self)
    250                     + " For exhaustive searches, use GridSearchCV.")
    251             for i in sample_without_replacement(grid_size, self.n_iter,
--> 252                                                 random_state=rnd):
    253                 yield param_grid[i]
    254

sklearn\utils\_random.pyx in sklearn.utils._random.sample_without_replacement (sklearn\utils\_random.c:3975)()

OverflowError: Python int too large to convert to C long

-------------------End of traceback-----------------------------
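
The last frames suggest that the overflow happens when `grid_size` is passed
to `sample_without_replacement`, that is, before any model is fitted. One
plausible reading is that the number of parameter combinations exceeds what a
32-bit C long can hold (C long is 32 bits on Windows). A minimal sketch of that
arithmetic, assuming a hypothetical search space made only of lists (the real
parameter dict is not shown in this message, so every name and range below is
illustrative):

```python
from functools import reduce
from operator import mul

# Hypothetical search space; the actual one is not included in this message.
# Every value is a plain list, so the space is a finite grid of combinations.
param_dist = {
    "n_estimators": list(range(100, 1100)),                 # 1000 values
    "max_depth": list(range(1, 21)),                        # 20 values
    "min_child_weight": list(range(1, 11)),                 # 10 values
    "max_delta_step": list(range(0, 21)),                   # 21 values
    "subsample": [i / 100 for i in range(50, 101)],         # 51 values
    "colsample_bytree": [i / 100 for i in range(50, 101)],  # 51 values
}

# Number of distinct combinations = product of the list lengths.
grid_size = reduce(mul, (len(v) for v in param_dist.values()), 1)

# Largest value a 32-bit signed C long can hold.
C_LONG_MAX_32 = 2**31 - 1

print("grid size:", grid_size)
print("fits in a 32-bit C long:", grid_size <= C_LONG_MAX_32)
```

If the real grid is this large, shrinking the lists, or giving at least one
parameter as a `scipy.stats` distribution instead of a list, may avoid the
grid-indexing path seen in the traceback; this is untested here.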

The shapes of scaled_data and labels are (772330, 15) and (772330,). (I tried
scaled_data both as a CSR matrix and as a NumPy array.)

By the way, when I run the classifier on its own (without
*RandomizedSearchCV*), it works fine with the same dataset:

---------Code below runs fine --------------------

import xgboost as xgb
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Fixed hyperparameters for a single XGBoost classifier (no search).
params_c = {
    'n_estimators': 310, 'learning_rate': 0.1, 'min_child_weight': 5,
    'max_depth': 5,  # was also listed earlier as 10; the last duplicate key wins
    'gamma': 0, 'max_delta_step': 14,
    'subsample': 1, 'colsample_bytree': 1, 'colsample_bylevel': 1,
    'reg_lambda': 1, 'reg_alpha': 0, 'scale_pos_weight': 1,
    'objective': 'binary:logistic', 'silent': False,
}
c = xgb.XGBClassifier(**params_c)

# Hold out a test split, fit, and print the confusion matrix.
X_train, X_test, y_train, y_test = train_test_split(scaled_data, labels)
c.fit(X_train, y_train)
y_pred = c.predict(X_test)
cm3 = confusion_matrix(y_test, y_pred)
print(cm3)

---------End of code that runs fine --------------------

On Wed, Apr 19, 2017 at 4:45 PM, Olivier Grisel <olivier.grisel at ensta.org>
wrote:

> Please provide the full traceback. Without it it's impossible to tell
> whether the problem is in scikit-learn or xgboost.
>
> Also, please provide a minimal reproduction script as explained in:
>
> http://scikit-learn.org/stable/faq.html#what-s-the-best-way-to-get-help-on-scikit-learn-usage
>
> --
> Olivier