[scikit-learn] MLPClassifier/Regressor and Kernel Processes when Multiprocessing

Tue Apr 28 15:06:00 EDT 2020

Hi scikit-learn folks,

I am building a stacked generalization classifier using the multilayer
perceptron classifier as one of its submodels. All data have been
preprocessed appropriately and I am tuning each submodel's hyperparameters
with a customized randomized search protocol (very similar to sklearn's
RandomizedSearchCV). Importantly, I am using Python's
multiprocessing.Pool() to parallelize this search.
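
A minimal sketch of this kind of setup is below. It is not the actual
code: the synthetic data, the parameter ranges, and the helper names
sample_params, score_candidate, and random_search are all placeholders
chosen for illustration.

```python
import random
from multiprocessing import Pool

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data; the real data set is not shown in this post.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)


def sample_params(rng):
    """Draw one random hyperparameter setting (illustrative ranges only)."""
    return {
        "n_estimators": rng.choice([10, 20, 50]),
        "max_depth": rng.choice([2, 4, None]),
    }


def score_candidate(params):
    """Score one candidate with cross_val_score; runs inside a worker."""
    model = RandomForestClassifier(random_state=0, **params)
    return params, cross_val_score(model, X, y, cv=3).mean()


def random_search(n_iter=4, n_workers=2, seed=0):
    """Score n_iter random candidates in parallel and return the best."""
    rng = random.Random(seed)
    candidates = [sample_params(rng) for _ in range(n_iter)]
    with Pool(n_workers) as pool:
        results = pool.map(score_candidate, candidates)
    return max(results, key=lambda r: r[1])


if __name__ == "__main__":
    best_params, best_score = random_search()
    print(best_params, best_score)
```

Swapping RandomForestClassifier for another estimator (and adjusting
sample_params to match) gives the per-submodel search described above.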

When I start the hyperparameter search, jobs/threads do indeed spawn
appropriately. Tuning the other submodels (RandomForestClassifier, SVC,
GradientBoostingClassifier, SGDClassifier) works perfectly, with each job
(a model with a particular set of randomized parameters) being scored with
cross_val_score and returning when the Pool of workers is complete. All is
well until I reach the MLPClassifier model. Jobs spawn as with the other
models; however, System CPU (Linux Kernel) processes surge and overwhelm my
server. Approximately 20% of the CPUs are running User processes, while the
other 80% of CPUs are running System/Kernel processes, causing an immense
slow-down. Again, this only happens with the MLPClassifier - all other
models run appropriately with ~98% User processes and ~2% System/Kernel
processes.

Is there something unique in the MLPClassifier/Regressor models that causes
increased System/Kernel processes compared to other models? In an attempt
to troubleshoot, I used sklearn's RandomizedSearchCV instead of my custom
implementation, and the same problem occurs (with n_jobs specified in the
same way).
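
For anyone who wants to try to reproduce this, a minimal sketch of the
RandomizedSearchCV variant might look like the following. The synthetic
data, the parameter grid, and the n_jobs value are assumptions made for
illustration; the real search uses different data and settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

# Placeholder data; the real data set is not shown in this post.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Illustrative search space only, not the one actually used.
param_distributions = {
    "hidden_layer_sizes": [(10,), (20,), (10, 10)],
    "alpha": [1e-5, 1e-4, 1e-3, 1e-2],
}

search = RandomizedSearchCV(
    MLPClassifier(max_iter=200, random_state=0),
    param_distributions,
    n_iter=4,
    cv=3,
    n_jobs=2,  # assumed value; set to the number of workers in use
    random_state=0,
)

if __name__ == "__main__":
    search.fit(X, y)
    print(search.best_params_, search.best_score_)
```

Watching user versus system CPU time (for example in top) while this
runs, and again with a different estimator in place of MLPClassifier,
is the comparison described above.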

Any help with why the MLP models are behaving this way during
multiprocessing is much appreciated.
Best,
Taylor Keding