[issue18003] New lzma crazy slow with line-oriented reading.
Antoine Pitrou
report at bugs.python.org
Sun May 19 16:07:00 CEST 2013
Antoine Pitrou added the comment:
I second Serhiy here. Wrapping the LZMAFile in a BufferedReader is the simple solution to the performance problem:
$ ./python -m timeit -s "import lzma, io" "f=lzma.LZMAFile('words.xz', 'r')" "for line in f: pass"
10 loops, best of 3: 148 msec per loop
$ ./python -m timeit -s "import lzma, io" "f=io.BufferedReader(lzma.LZMAFile('words.xz', 'r'))" "for line in f: pass"
10 loops, best of 3: 44.3 msec per loop
$ time xzcat words.xz | wc -l
99156
real 0m0.021s
user 0m0.016s
sys 0m0.004s
Perhaps the top-level lzma.open() should do the wrapping for you, though.
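
For reference, the wrapping measured above can be sketched as a standalone script. This is only an illustration, not code from the tracker: the helper name count_lines_buffered and the file sample.xz are made up for the example, and the script writes its own small compressed file rather than assuming words.xz exists.

```python
import io
import lzma
import os
import tempfile


def count_lines_buffered(path):
    """Count lines by iterating a BufferedReader wrapped around an LZMAFile.

    Hypothetical helper for illustration; closing the BufferedReader
    also closes the underlying LZMAFile.
    """
    with io.BufferedReader(lzma.LZMAFile(path, "r")) as f:
        return sum(1 for _ in f)


# Build a throwaway .xz file so the sketch does not depend on words.xz.
with tempfile.TemporaryDirectory() as tmpdir:
    sample_path = os.path.join(tmpdir, "sample.xz")  # assumed file name
    with lzma.open(sample_path, "wb") as out:
        for i in range(1000):
            out.write(("word%d\n" % i).encode("ascii"))

    line_count = count_lines_buffered(sample_path)

print(line_count)  # prints 1000
```

Whether the extra buffering layer pays off on a given Python build is worth checking with timeit, as in the measurements above.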
Interestingly, opening in text (i.e. Unicode) mode is almost as fast as with a BufferedReader:
$ ./python -m timeit -s "import lzma, io" "f=lzma.open('words.xz', 'rt')" "for line in f: pass"
10 loops, best of 3: 51.1 msec per loop
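
The text-mode variant can be sketched the same way. Again this is only an illustration under assumptions: count_lines_text and sample.xz are hypothetical names, and the encoding is assumed to be UTF-8. In 'rt' mode lzma.open() returns an io.TextIOWrapper, so the loop yields str lines rather than bytes.

```python
import io
import lzma
import os
import tempfile


def count_lines_text(path, encoding="utf-8"):
    """Count lines of an .xz file opened in text mode.

    Hypothetical helper for illustration; the encoding default is an
    assumption, not something stated in the report.
    """
    with lzma.open(path, "rt", encoding=encoding) as f:
        # In text mode, lzma.open() wraps the LZMAFile in a TextIOWrapper.
        assert isinstance(f, io.TextIOWrapper)
        return sum(1 for _ in f)


# Build a throwaway .xz file so the sketch does not depend on words.xz.
with tempfile.TemporaryDirectory() as tmpdir:
    sample_path = os.path.join(tmpdir, "sample.xz")  # assumed file name
    with lzma.open(sample_path, "wt", encoding="utf-8") as out:
        for i in range(500):
            out.write("word%d\n" % i)

    text_line_count = count_lines_text(sample_path)

print(text_line_count)  # prints 500
```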
----------
nosy: +pitrou
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18003>
_______________________________________