[issue18744] pathological performance using tarfile

Serhiy Storchaka report at bugs.python.org
Fri Aug 16 22:59:35 CEST 2013


Serhiy Storchaka added the comment:

Thank you for the script, Richard.

If you mean the performance degradation when extracting a tarfile's members in a shuffled order, this behavior is expected. Reading a gzip file in random order requires seeking in it, and a gzip stream is a one-way road: to seek you have to decompress all data between your current position (or the start of the file) and the target position. With a random extraction order you end up decompressing, on average, about a third of the tarfile for every extracted member.
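
As a rough illustration, here is a minimal sketch (the filename "data.gz" and the offsets are just assumptions for the example) of why this is slow: GzipFile.seek() to an earlier offset has to rewind to the start of the stream and decompress everything up to the target again, while a forward seek only decompresses the remaining gap.

    import gzip
    import time

    with gzip.open("data.gz", "rb") as f:
        f.read(64 * 1024 * 1024)       # move well into the stream

        start = time.perf_counter()
        f.seek(32 * 1024 * 1024)       # backwards: rewind + re-decompress 32 MiB
        print("backwards seek:", time.perf_counter() - start, "s")

        start = time.perf_counter()
        f.seek(48 * 1024 * 1024)       # forwards: decompress only the 16 MiB gap
        print("forwards seek: ", time.perf_counter() - start, "s")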

The tarfile module can't do anything about this. It can't first extract all files into memory, because the uncompressed data can be too big. It can't re-sort the list of members into their natural order either, because that could change the semantics (a tarfile can contain duplicates and symlinks). Just don't do this: don't extract a large number of files from a compressed tarfile in a shuffled order.
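
If you control the extraction code, a simple pattern (just a sketch; the archive name and member names below are placeholders) is to iterate over the archive in its natural order and extract only the members you need, so the gzip stream is decompressed exactly once:

    import tarfile

    wanted = {"pkg/a.txt", "pkg/sub/c.txt"}

    with tarfile.open("archive.tar.gz", "r:gz") as tf:
        for member in tf:              # members come in archive order
            if member.name in wanted:
                tf.extract(member, path="out")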

----------
nosy: +nadeem.vawda, r.david.murray
status: open -> pending

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18744>
_______________________________________

