[issue19395] unpickled LZMACompressor is crashy

Fri Oct 25 22:49:19 CEST 2013

Nadeem Vawda added the comment:

As far as I can tell, liblzma provides no way to serialize a compressor's
state, so the best we can do is raise a TypeError when attempting to
pickle the LZMACompressor (and likewise for LZMADecompressor).

Also, it's worth pointing out that the provided code wouldn't work even
if you could serialize LZMACompressor objects - each call to compress()
updates the compressor's internal state with information needed by the
final call to flush(), but each compress() call would be made on a
*copy* of the compressor rather than the original object. So flush()
would end up producing bogus data (and mostly likely all compress()
calls after the first would too).

If you are trying to do this because LZMA compression is too slow, I'd
suggest you try using zlib or bz2 instead - both of these algorithms
can compress faster than LZMA (at the expense of your compression ratio).
zlib is faster on both compression and decompression, while bz2 is slower
than lzma at decompression.

Alternatively, you can do parallel compression by calling lzma.compress()
on each block (instead of creating an LZMACompressor), and then joining
the results. But note that (a) this will give you a worse compression
ratio than serial compression (because it can't exploit redundancy shared
between blocks), and (b) using multiprocessing has a performance overhead
of its own, because you will need to copy the input when sending it to
the worker subprocess, and then copy the result when sending it back to
the main process.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19395>
_______________________________________