[issue18003] New lzma crazy slow with line-oriented reading.

Sun May 19 19:52:20 CEST 2013

Nadeem Vawda added the comment:

I agree that making lzma.open() wrap its return value in a BufferedReader
(or BufferedWriter, as appropriate) is the way to go. I'm currently
travelling and don't have my SSH key with me. Serhiy, can you make the
change?

I'll put together a documentation patch that recommends using lzma.open()
rather than LZMAFile directly, and mentions the performance implications.

> Interestingly, opening in text (i.e. unicode) mode is almost as fast as with a BufferedReader:

This is because opening in text mode returns a TextIOWrapper, which is
written in C, and presumably does its own buffering on top of
LZMAFile.read1() instead of calling LZMAFile.readline().
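
For anyone following along, here is a minimal sketch of the three
line-oriented read paths discussed in this issue. The sample payload and
the in-memory io.BytesIO source are assumptions made only to keep the
example self-contained. No timings are shown, since they depend on the
Python version and the data.

```python
import io
import lzma

# Sample data (an assumption for illustration): 1000 short lines.
payload = b"".join(b"line %d\n" % i for i in range(1000))
compressed = lzma.compress(payload)

# 1. Iterate over LZMAFile directly (the slow path reported in this issue).
with lzma.LZMAFile(io.BytesIO(compressed)) as raw:
    direct_lines = list(raw)

# 2. Wrap LZMAFile in a BufferedReader, so line splitting happens in the
#    buffered layer rather than in LZMAFile.readline().
with io.BufferedReader(lzma.LZMAFile(io.BytesIO(compressed))) as buffered:
    buffered_lines = list(buffered)

# 3. Open in text mode; lzma.open() returns a TextIOWrapper here.
with lzma.open(io.BytesIO(compressed), "rt", encoding="ascii") as text:
    text_lines = list(text)
```

All three paths yield the same lines; only the layer that does the line
splitting differs.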

> From my perspective default wrapping with io.BufferedReader is a great
> idea. I can't think of who would suffer. Maybe someone who wants to
> open thousands of simultaneous streams wouldn't appreciate the memory
> overhead. If that person exists then he would want an option to turn
> it off.

If someone doesn't want the BufferedReader/BufferedWriter, they can
create an LZMAFile directly; we don't plan to remove that possibility. So
I don't think that should be a problem.
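
To make the two options concrete, here is a rough sketch of what the
wrapping could look like in user code, next to the direct LZMAFile route.
The helper name open_buffered is hypothetical and is not part of the lzma
module; it only illustrates the idea and handles binary modes only.

```python
import io
import lzma

def open_buffered(filename, mode="rb", **kwargs):
    """Hypothetical helper: return an LZMAFile wrapped in a buffered object.

    Binary modes only. Closing the wrapper also closes the LZMAFile.
    """
    raw = lzma.LZMAFile(filename, mode, **kwargs)
    if "r" in mode:
        return io.BufferedReader(raw)
    return io.BufferedWriter(raw)

# Write through the buffered wrapper into an in-memory target.
target = io.BytesIO()
with open_buffered(target, "wb") as f:
    f.write(b"first\nsecond\n")
data = target.getvalue()

# Read back line by line through the buffered wrapper.
with open_buffered(io.BytesIO(data)) as f:
    lines = f.readlines()

# Anyone who wants to avoid the extra buffer can still construct
# LZMAFile directly.
with lzma.LZMAFile(io.BytesIO(data)) as unbuffered:
    content = unbuffered.read()
```

The per-stream cost of the wrapper is one extra buffer
(io.DEFAULT_BUFFER_SIZE bytes by default), which is the overhead the quoted
comment is concerned about.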

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18003>
_______________________________________