[issue30693] tarfile add uses random order

Serhiy Storchaka report at bugs.python.org
Sun Jun 18 13:21:59 EDT 2017


Serhiy Storchaka added the comment:

A patch for a similar issue with the glob module was rejected recently, since it is easy to sort the result of glob.glob() (see issue30461). This issue looks similar, but there are differences. On the one hand, the command-line tar utility doesn't have an option for sorting file names and doesn't seem to sort them by default (I didn't check). It is possible to use external sorting with the tarfile module, as with the tar utility: generate the list of all files and directories, sort it, and pass every item to TarFile.add() with recursive=False. On the other hand, this is not as easy as it is for glob.glob(), and the overhead of sorting is expected to be smaller than for glob.glob(). These may be considered additional arguments for approving the patch.
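
For illustration, the external-sorting workaround described above might look roughly like the sketch below. The helper name add_sorted is hypothetical (it is not part of tarfile), and plain lexicographic ordering of the full paths is an assumption; any other stable key would work the same way.

```python
import os
import tarfile


def add_sorted(tar, root):
    """Add root and everything below it to tar in sorted order.

    Hypothetical helper, not part of the tarfile module.
    """
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    # A directory path is a prefix of the paths of its contents, so it
    # always sorts before them and is archived first.
    for path in sorted(paths):
        # recursive=False: tarfile must not walk directories itself,
        # otherwise entries would be added twice and in os.listdir() order.
        tar.add(path, recursive=False)
```

Usage would be along the lines of: with tarfile.open("out.tar", "w") as tar: add_sorted(tar, "some_directory").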

If this approach is approved, it should also be applied to ZIP archives.
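
For comparison, a rough sketch of the same external sorting done by hand with the zipfile module. The helper name write_sorted is hypothetical; zipfile has no recursive add, so the tree is walked explicitly and each path is passed to ZipFile.write().

```python
import os
import zipfile


def write_sorted(zf, root):
    """Write root and everything below it to zf in sorted path order.

    Hypothetical helper, not part of the zipfile module.
    """
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    for path in sorted(paths):
        # ZipFile.write() stores a directory as an entry whose name
        # ends with "/"; regular files are stored with their contents.
        zf.write(path)
```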

FYI, the order of archived files can affect the compression ratio of a compressed tar archive. For example, the 7-Zip archiver sorts files by extension, which increases the chance that files of the same type (text, multimedia, spreadsheets, executables, etc.) are grouped together and share a common dictionary for global compression. This isn't directly related to this issue, just material for a possible future enhancement.
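
As a sketch of what such an enhancement could look like from user code: the key function below groups files by extension before adding them. Both helper names are hypothetical, the exact key is an assumption (it is not claimed to match what 7-Zip does), and whether it improves the compression ratio depends on the data.

```python
import os
import tarfile


def extension_key(path):
    """Sort key: lower-cased extension first, then the full path."""
    ext = os.path.splitext(path)[1].lower()
    return (ext, path)


def add_grouped_by_extension(tar, root):
    """Add directories first, then files grouped by extension.

    Hypothetical helper, not part of the tarfile module.
    """
    dirs, files = [root], []
    for dirpath, dirnames, filenames in os.walk(root):
        dirs.extend(os.path.join(dirpath, n) for n in dirnames)
        files.extend(os.path.join(dirpath, n) for n in filenames)
    # Directory entries go first so each one precedes its contents.
    for path in sorted(dirs):
        tar.add(path, recursive=False)
    # Files of the same type end up next to each other in the stream.
    for path in sorted(files, key=extension_key):
        tar.add(path, recursive=False)
```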

----------
nosy: +lars.gustaebel, rhettinger, serhiy.storchaka
stage:  -> patch review
versions:  -Python 3.3, Python 3.4

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30693>
_______________________________________

