[issue35195] Pandas read_csv() is 3.5X Slower on Python 3.7.1 vs Python 3.6.7 & 3.5.2 On Windows 10

Dragoljub report at bugs.python.org
Mon Nov 12 20:39:54 EST 2018


Dragoljub <dragoljub at gmail.com> added the comment:

Here is a simple pure-Python example (IPython, using the %timeit magic):

digits = ''.join([str(i) for i in range(10)]*10000000)
%timeit digits.isdigit() # --> 2X+ slower on python 3.7.1
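To compare interpreters outside IPython, the same measurement can be sketched with the standard-library timeit module. The function name time_isdigit and the default values of n and number are illustrative choices, not from the report. The string is built as '0123456789' * n, which has the same content as the snippet above but avoids materializing a list of 100 million small strings first. Run the same script under Python 3.6.7 and 3.7.1 on the same machine and compare the printed times.

```python
import sys
import timeit


def time_isdigit(n=1_000_000, number=10):
    """Best per-call time, in seconds, of str.isdigit() on a string of 10 * n digits."""
    # Same content as the snippet above, built directly instead of via a large list.
    digits = '0123456789' * n
    # timeit.repeat accepts a callable; take the fastest of 3 runs to reduce noise.
    timings = timeit.repeat(digits.isdigit, number=number, repeat=3)
    return min(timings) / number


if __name__ == '__main__':
    print(sys.version)
    # The report used 10_000_000 repetitions (100 million characters); raise n to match.
    print(f"str.isdigit(): {time_isdigit():.6f} s per call")
```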

Basically, in the pandas C parser we call the C isdigit() function on the characters of each number being parsed, so 12345.6789 calls isdigit() 9 times to determine whether each character is a digit that can be converted to a float. The problem is that in the latest version of Python, with the locale updates, isdigit() takes a locale argument that seems to be passed over and over, slowing down this check. Is it possible to disable passing the locale from Python down to the lower-level C code, or simply to set the default locale to 'C' to keep it from thrashing?
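The two ideas in the paragraph above can be sketched in Python, with the caveat that the real pandas tokenizer is C and this is only an illustration. The helpers is_ascii_digit, digit_mask and c_ctype_locale are hypothetical names. A plain range comparison against '0'..'9' never consults the locale, which is one way a per-character check can avoid locale-aware classification; note it is stricter than Python's str.isdigit(), which also accepts non-ASCII digits. The context manager shows the "set the locale to 'C'" workaround using the real locale.setlocale API; whether that restores pandas' Python 3.6 parsing speed is not established here.

```python
import locale
from contextlib import contextmanager


def is_ascii_digit(ch):
    # Plain range comparison: never consults the C locale or the Unicode database.
    return len(ch) == 1 and '0' <= ch <= '9'


def digit_mask(token):
    # One check per character, mirroring a per-character tokenizer loop.
    return [is_ascii_digit(ch) for ch in token]


@contextmanager
def c_ctype_locale():
    # Temporarily force the process-wide LC_CTYPE category to 'C'.
    # This affects the C locale seen by C extensions; it does not change
    # Python's own str.isdigit(), which is Unicode-based.
    previous = locale.setlocale(locale.LC_CTYPE)  # query without changing
    locale.setlocale(locale.LC_CTYPE, 'C')
    try:
        yield
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)


if __name__ == '__main__':
    print(sum(digit_mask('12345.6789')))  # 9
    # ARABIC-INDIC DIGIT THREE: a digit to str.isdigit(), not to the ASCII check.
    print('\u0663'.isdigit(), is_ascii_digit('\u0663'))  # True False
    with c_ctype_locale():
        print(locale.setlocale(locale.LC_CTYPE))
```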

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35195>
_______________________________________

