[scikit-learn] sklearn.model_selection.GridSearchCV - unable to use n_jobs>1 on MacOS Sierra python 2.7

Sumeet Sandhu sumeet.k.sandhu at gmail.com
Tue Jan 9 00:22:16 EST 2018

There are two cases: n_jobs > 1 works when the data is smaller, i.e. when the
training docs numpy array is 15MB. It does not work when the training matrix
is 100MB. My Mac has 16GB RAM.

In the second case, the jobs die out pretty quickly, within seconds, and the
main python process seems to die too (minimal CPU usage). A popup message
says 'python processes appear to have died'. This is when I run python from
the bash command line.
When I run it in the python GUI IDLE, a message pops up: 'your program is
still running, sure you want to close window'.

What are these jobs anyway? Are they the various parameter combinations in
param_grid, or lower-level jobs coming from the compiler etc.?
Does each job replicate the training data in RAM?
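
To make the question concrete, here is a rough sketch of how I would count
the jobs. It rests on assumptions I have not verified against the source:
that one job is one fit of one parameter combination on one CV split, that
pre_dispatch keeps its documented default of '2*n_jobs', and, pessimistically,
that every dispatched job holds its own copy of the training data. The
param_grid values are made up for illustration; cv=10 and the 100MB figure
are from my setup.

```python
from itertools import product

# Hypothetical grid, for illustration only (my real param_grid is not shown here).
param_grid = {"C": [0.01, 0.1, 1.0, 10.0], "penalty": ["l1", "l2"]}
cv_folds = 10    # cv=10, as in my GridSearchCV call
n_jobs = 4       # number of parallel workers I would like to sustain
data_mb = 100    # size of the training matrix that fails

# Every combination of parameter values is one candidate model.
candidates = [dict(zip(param_grid, values))
              for values in product(*param_grid.values())]

# Assumption: one job = fitting one candidate on one CV split.
n_tasks = len(candidates) * cv_folds

# Assumption: pre_dispatch='2*n_jobs' caps how many jobs are queued at once.
in_flight = min(n_tasks, 2 * n_jobs)

# Pessimistic upper bound: each queued job carries its own copy of the data.
worst_case_mb = in_flight * data_mb

print("tasks=%d in_flight=%d worst_case_mb=%d"
      % (n_tasks, in_flight, worst_case_mb))
```

If that picture is right, lowering pre_dispatch (e.g. to n_jobs) would be the
knob that bounds how many copies exist at any one time, but I would
appreciate confirmation.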


On Sun, Jan 7, 2018 at 11:35 AM, Sumeet Sandhu <sumeet.k.sandhu at gmail.com> wrote:

> Hi,
> I was able to run this with n_jobs=-1, and the activity monitor does show
> all 8 CPUs engaged, but the jobs start to die out one by one. I tried with
> n_jobs=2, same story.
> The only option that works is n_jobs=1.
> I played around with 'pre_dispatch' a bit, but it is unclear what that does.
>
> GRID = GridSearchCV(LogisticRegression(), param_grid, scoring=None,
>     fit_params=None, n_jobs=1, iid=True, refit=True, cv=10, verbose=0,
>     error_score=0, return_train_score=False)
> GRID.fit(trainDocumentV, trainLabelV)
>
> How can I sustain at least 3-4 parallel jobs?
> thanks,
> Sumeet
