[issue18003] New lzma crazy slow with line-oriented reading.

Michael Fox report at bugs.python.org
Sat May 18 23:48:22 CEST 2013


Michael Fox added the comment:

I looked into it a little and it looks like pyliblzma is a pure C
extension, whereas the new lzma library wraps liblzma but the rest is
Python. In particular, this happens for every line:

        if size < 0:
            end = self._buffer.find(b"\n", self._buffer_offset) + 1
            if end > 0:
                line = self._buffer[self._buffer_offset : end]
                self._buffer_offset = end
                self._pos += len(line)
                return line

And while that doesn't look like a lot of overhead, it runs in Python
once per line, so it adds up. So, unless someone thinks that a pure C
extension is the right technical direction, lzma in 3.4 is probably as
fast as it's ever going to be. I will just use the workaround of
piping the data in from unxz regardless.
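
For reference, the unxz workaround could look roughly like the sketch
below. This is not code from this thread: the helper name
iter_xz_lines and its command parameter are made up for illustration,
and it assumes an xz binary is on PATH (xz -dc does the same job as
unxz -c). Decompression happens in the external process and the line
splitting is done by the buffered pipe object, so the per-line code in
lzma.py quoted above is not involved.

```python
import subprocess


def iter_xz_lines(path, command=("xz", "-dc")):
    """Yield lines (as bytes) from an .xz file via an external decompressor.

    Hypothetical helper for illustration. `command` is the decompressor
    invocation; the default assumes an `xz` binary is available on PATH.
    """
    # Start the decompressor and capture its standard output as a pipe.
    with subprocess.Popen([*command, path], stdout=subprocess.PIPE) as proc:
        # proc.stdout is a buffered binary stream; iterating it yields
        # one line at a time, including the trailing newline.
        for line in proc.stdout:
            yield line
    # Leaving the `with` block waits for the child, so returncode is set.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, proc.args)
```

Counting lines, as in the timings below, would then be something like
sum(1 for _ in iter_xz_lines("bigfile.xz")).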

On Sat, May 18, 2013 at 2:12 PM, Michael Fox <415fox at gmail.com> wrote:
> 3.4 is much better but still 4x slower than 2.7
>
> m at air:~/q/topaz/parse_datalog$ time python2.7 lzmaperf.py
> 102368
>
> real    0m0.053s
> user    0m0.052s
> sys     0m0.000s
> m at air:~/q/topaz/parse_datalog$ time ~/tmp/cpython-23836f17e4a2/bin/python3.4 lzmaperf.py
> 102368
>
> real    0m0.229s
> user    0m0.212s
> sys     0m0.012s
>
> The bottleneck has moved here:
>  102369    0.151    0.000    0.226    0.000 lzma.py:333(readline)
>
> I don't know if this is a strictly fair comparison. The lzma module
> and pyliblzma may not be of the same quality. I've just come across a
> real bug in pyliblzma. It doesn't apply to this test, but who knows
> what shortcuts it's taking.
>
> Finally, here's a baseline:
>
> m at air:~/q/topaz/parse_datalog$ time xzcat bigfile.xz | wc -l
> 102368
>
> real    0m0.034s
> user    0m0.024s
> sys     0m0.016s
>
> On Sat, May 18, 2013 at 12:46 PM, Nadeem Vawda <report at bugs.python.org> wrote:
>>
>> Nadeem Vawda added the comment:
>>
>> Have you tried running the benchmark against the default (3.4) branch?
>> There was some significant optimization work done in issue 16034, but
>> the changes were not backported to 3.3.
>>
>> ----------
>>
>> _______________________________________
>> Python tracker <report at bugs.python.org>
>> <http://bugs.python.org/issue18003>
>> _______________________________________

-- 

-
Michael


