[issue18003] New lzma crazy slow with line-oriented reading.

Michael Fox report at bugs.python.org
Sat May 18 00:27:24 CEST 2013


New submission from Michael Fox:

import lzma
count = 0
f = lzma.LZMAFile('bigfile.xz', 'r')
for line in f:
    count += 1
print(count)
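The report does not include bigfile.xz. To reproduce the comparison without it, a synthetic input can be generated. The helper below is hypothetical: make_test_file and the line content are assumptions, not part of the report. Only the default line count (102368) comes from the output shown below.

```python
import lzma

# Hypothetical helper: writes a synthetic .xz text file so the script
# above has something to read. The content is made up; only the default
# line count matches the count printed in the report.
def make_test_file(path, n_lines=102368):
    """Write n_lines short text lines to an .xz file at path."""
    # lzma.open() in "wt" mode wraps an LZMAFile in a text wrapper,
    # so str lines are encoded and compressed on write.
    with lzma.open(path, "wt") as f:
        for i in range(n_lines):
            f.write("line %d of a synthetic datalog\n" % i)
```

For example, make_test_file('bigfile.xz') creates an input file for the script above.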

Timing the script above (lzmaperf.py) under Python 2 with pyliblzma and under Python 3.3.1 with the new lzma module:

m@air:~/q/topaz/parse_datalog$ time python  lzmaperf.py
102368

real    0m0.062s
user    0m0.056s
sys     0m0.004s
m@air:~/q/topaz/parse_datalog$ time python3  lzmaperf.py
102368

real    0m7.506s
user    0m7.484s
sys     0m0.012s

Profiling shows most of the time is spent here:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   102371    6.881    0.000    6.972    0.000 lzma.py:247(_read_block)
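A profile like the one above can be collected with cProfile and pstats from the standard library. This is a minimal sketch: count_lines and profile_count_lines are hypothetical names, and sorting by tottime (the second column above) is an assumption about how the report's profile was read.

```python
import cProfile
import lzma
import pstats

# Hypothetical helper: the same loop as the report, wrapped in a function.
def count_lines(path):
    count = 0
    with lzma.LZMAFile(path, "r") as f:
        for line in f:
            count += 1
    return count

# Hypothetical helper: profile count_lines() and print the top entries.
def profile_count_lines(path, top=5):
    """Run count_lines(path) under cProfile and print the costliest functions."""
    profiler = cProfile.Profile()
    result = profiler.runcall(count_lines, path)
    # tottime is time spent inside each function itself, excluding callees.
    pstats.Stats(profiler).sort_stats("tottime").print_stats(top)
    return result
```

On an affected interpreter, _read_block would be expected near the top of the listing with an ncalls value close to the line count.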

I also noticed that reading the entire file into memory with a single f.read() call is fast.
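Because a bulk read() is fast, one way to count lines without per-line iteration is to decompress in large chunks and count newline bytes. This sketch assumes that large read(size) calls behave like the fast f.read() noted above. count_lines_chunked and the 1 MiB chunk size are hypothetical choices.

```python
import lzma

# Hypothetical helper: count lines by reading large decompressed chunks
# instead of iterating over the LZMAFile line by line.
def count_lines_chunked(path, chunk_size=1 << 20):
    """Count lines by decompressing in large chunks and counting b'\\n'."""
    count = 0
    last = b""
    with lzma.LZMAFile(path, "r") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
            last = chunk
    # A final line without a trailing newline still counts as a line,
    # matching what iterating over the file would report.
    if last and not last.endswith(b"\n"):
        count += 1
    return count
```

This trades line objects for a byte count, so it only fits cases like the report's, where the lines themselves are not needed.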

I suspect a lack of buffering: _read_block is called 102371 times, almost exactly once per line (102368 lines).
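If the buffering hypothesis is right, placing an io.BufferedReader between the loop and the LZMAFile should let readline() be served from a large in-memory buffer. The following sketch works under that assumption. count_lines_buffered and the 64 KiB buffer size are hypothetical, and the speedup has not been measured here.

```python
import io
import lzma

# Hypothetical helper: iterate line by line through an explicit buffer.
def count_lines_buffered(path, buffer_size=1 << 16):
    """Count lines by iterating over an io.BufferedReader wrapping an LZMAFile."""
    count = 0
    # BufferedReader fills its buffer in large blocks via the LZMAFile's
    # readinto() and answers readline() from that buffer. Closing the
    # BufferedReader also closes the underlying LZMAFile.
    with io.BufferedReader(lzma.LZMAFile(path, "r"), buffer_size) as f:
        for line in f:
            count += 1
    return count
```

The loop body is unchanged from the report, so the same approach applies to code that actually processes each line.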

----------
components: Library (Lib)
messages: 189488
nosy: Michael.Fox, nadeem.vawda
priority: normal
severity: normal
status: open
title: New lzma crazy slow with line-oriented reading.
type: performance
versions: Python 3.3

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18003>
_______________________________________


More information about the Python-bugs-list mailing list