[scikit-learn] 2-million-sample dataset caused Python and OS crash

Liu James icefrog1950 at gmail.com
Wed Jan 6 06:00:46 EST 2021


Hi all,

I'm training a model on a medium-sized dataset, KDD99 IDS
(https://www.ll.mit.edu/r-d/datasets/1999-darpa-intrusion-detection-evaluation-dataset),
which has 2 million samples. When I call fit_transform(), the OS crashes
and the log shows:

    Process 13851(python) of user xxx dumped core. Stack trace
    .../numpy/core/_multiarray_umath_cpython_36m_x86_64...

System: CentOS 8, Intel i9, 128 GB RAM, stack size set to unlimited.
The crash is reproducible.
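The post doesn't say which transformer was used, so here is a minimal, hypothetical reproduction sketch. It assumes StandardScaler and a dense numeric matrix with 41 columns; both are stand-ins, not the setup from the failing run. The helper name reproduce() is likewise made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def reproduce(n_samples=2_000_000, n_features=41, seed=0):
    """Hypothetical repro: fit_transform on a synthetic dense matrix.

    StandardScaler and n_features=41 are assumptions; substitute the
    transformer and feature matrix actually used in the failing run.
    """
    rng = np.random.default_rng(seed)
    # Synthetic stand-in for the real features; float32 uses half the
    # memory of NumPy's default float64.
    X = rng.standard_normal((n_samples, n_features), dtype=np.float32)
    print(f"input: {X.shape}, {X.nbytes / 1e6:.1f} MB")
    # Same method that crashed in the report; the estimator is assumed.
    return StandardScaler().fit_transform(X)
```

With the defaults, the input alone takes about 328 MB (2,000,000 x 41 x 4 bytes). Running the script as `python -X faulthandler repro.py` makes Python print its own traceback on a fatal signal such as SIGSEGV. That would show which Python call reached the NumPy extension module named in the core dump.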

Thanks.