Hi Debu,

It seems that you are running out of memory.
Try using fewer parallel jobs.
I don't think that n_jobs=1000 will behave as you expect.
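
For a rough sense of scale, here is a back-of-the-envelope estimate. The data size comes from your message (5 million rows, 134 features) and the float32 dtype from the traceback. The per-job part is my assumption, not something measured: I assume each tree under construction holds at least one float64 bootstrap weight vector of length n_samples, and that up to n_jobs trees can be built at the same time. It ignores the fitted trees and the splitter's working buffers, so treat it as a lower bound. The helper names are made up for illustration.

```python
import numpy as np

# Data size taken from the question: 5 million rows x 134 features,
# passed to fit() as float32 (per the dtype shown in the traceback).
N_ROWS, N_FEATURES = 5_000_000, 134
GIB = 2 ** 30


def input_bytes(n_rows, n_features, dtype=np.float32):
    """Bytes needed to hold the dense feature matrix X."""
    return n_rows * n_features * np.dtype(dtype).itemsize


def bootstrap_weight_bytes(n_rows, n_jobs):
    """Lower bound for concurrent per-tree buffers, ASSUMING each tree being
    built holds one float64 bootstrap weight vector of length n_rows."""
    return n_jobs * n_rows * np.dtype(np.float64).itemsize


x_bytes = input_bytes(N_ROWS, N_FEATURES)
print(f"X: {x_bytes:,} bytes (~{x_bytes / GIB:.2f} GiB)")
# X: 2,680,000,000 bytes (~2.50 GiB)

for n_jobs in (1000, 8):  # 8 is a hypothetical core count
    b = bootstrap_weight_bytes(N_ROWS, n_jobs)
    print(f"n_jobs={n_jobs}: >= {b / GIB:.2f} GiB of bootstrap weights")
# n_jobs=1000: >= 37.25 GiB of bootstrap weights
# n_jobs=8: >= 0.30 GiB of bootstrap weights
```

Even under these assumptions, n_jobs=1000 needs tens of GiB for those vectors alone, on top of X and the 5000 fitted trees.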

Setting n_jobs to -1 runs as many jobs as there are cores on your system.
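
A minimal sketch of that change. It uses small synthetic data from make_classification as a stand-in for p_input_features, and n_estimators=50 is only there to keep the example quick, not a recommendation. joblib.cpu_count() reports the core count joblib sees. Summing tree_.node_count over estimators_ is one rough way to watch the fitted forest grow as you add trees.

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic stand-in for p_input_features (the real data is not available here).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# n_jobs=-1: one job per available core instead of a fixed, much larger number.
clf = RandomForestClassifier(
    n_estimators=50,  # illustrative only; keeps the sketch fast
    n_jobs=-1,
    random_state=0,
)
clf.fit(X, y)

print("cores available to joblib:", joblib.cpu_count())

# Total nodes across all fitted trees: a rough proxy for the fitted forest's
# memory footprint, which grows with n_estimators.
total_nodes = sum(tree.tree_.node_count for tree in clf.estimators_)
print("trees:", len(clf.estimators_), "total nodes:", total_nodes)
```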


On 09.12.2016 08:16, Debabrata Ghosh wrote:
Hi All,
Greetings!

I am getting a JoblibMemoryError while running scikit-learn's RandomForestClassifier. Here is my code in short:

from sklearn.ensemble import RandomForestClassifier
from sklearn.cross_validation import train_test_split
import pandas as pd
import numpy as np
clf = RandomForestClassifier(n_estimators=5000, n_jobs=1000)

The dataframe p_input_features contains 134 columns (features) and 5 million rows (observations). The exact error message is given below:

Executing Random Forest Classifier
Traceback (most recent call last):
  File "/home/user/rf_fold.py", line 43, in <module>
  File "/var/opt/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 290, in fit
    for i, t in enumerate(trees))
  File "/var/opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 810, in __call__
  File "/var/opt/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 757, in retrieve
    raise exception
sklearn.externals.joblib.my_exceptions.JoblibMemoryError: JoblibMemoryError
Multiprocessing exception:
/var/opt/lib/python2.7/site-packages/sklearn/ensemble/forest.py in fit(self=RandomForestClassifier(bootstrap=True, class_wei...te=None, verbose=0,
            warm_start=False), X=array([[ 0.        ,  0.        ,  0.        , ....        0.        ,  0.        ]], dtype=float32), y=array([[ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.],
       [ 0.]]), sample_weight=None)
    285             trees = Parallel(n_jobs=self.n_jobs, verbose=self.verbose,
    286                              backend="threading")(
    287                 delayed(_parallel_build_trees)(
    288                     t, self, X, y, sample_weight, i, len(trees),
    289                     verbose=self.verbose, class_weight=self.class_weight)
--> 290                 for i, t in enumerate(trees))
        i = 4999
    292             # Collect newly grown trees
    293             self.estimators_.extend(trees)

Could you please help me identify a possible resolution?



