[Python-ideas] Adding ziplib

Andrew Barnert abarnert at yahoo.com
Tue Feb 17 23:22:54 CET 2015


On Feb 17, 2015, at 13:17, "Gregory P. Smith" <greg at krypto.org> wrote:

> If bikeshedding about the name is all we're doing at this point I wouldn't worry about the ancient BBS-era .arc file format that nobody will write a Python library to read. :)  I'd propose "archivers" as a module name, though arclib is also nice.  If you want to really pay homage to a good, little-used archive format, call it arjlib. ;)
> 
> "The" problem with zipfile, tarfile, and similar modules today is that they are all undermaintained, and the underlying archive formats each have inherent features that are not common to all of them.  For example, zip files have an end-of-archive central directory index; tar files do not.  rar files can be seen as a seemingly (sadly?) common successor to zip files.  No doubt there are others (7z? cpio?).  zip files compress individual files; tar files don't support compression themselves, as it is applied to the whole archive after the fact.  The amount and type of directory information available within each of these varies. And that doesn't even touch the multi-file, multi-part archive support that some formats have for very horrible hacky reasons.
> 
> Coming up with a common API for these with the key features needed by all is interesting, doubly so for someone pedantic enough to get an implementation of each correct, but it should be done as a third-party library and should ideally not use the stdlib zip or tar support at all. Do such libraries exist for other languages? Any C++ that could be reused? Is there such a thing as high-quality code for this kind of task?

I already mentioned BSD libarchive. (It's in C, not C++, but why would you want C++ for this?) It does pretty much everything you're asking for, and more (more formats, like ISO disk images; format auto-detection by name or magic; etc.).

And python-libarchive seems like a pretty up-to-date and well-maintained wrapper. Plus, as I mentioned, it has compatibility wrappers to offer a subset of its functionality with the zipfile and tarfile APIs, making it easy to modify legacy code--e.g., the original problem that started this thread, modifying an app to use zipfile instead of tarfile, would probably only require a couple lines of code (import the tarfile compat library, change the filenames).
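
To make the porting cost concrete, here is a rough sketch using only the stdlib zipfile and tarfile modules (not python-libarchive, whose API I'm not reproducing here). The member name hello.txt and the payload are made up for illustration. The point is just that the same two operations, writing a member and reading it back, are spelled differently in each module:

```python
import io
import tarfile
import zipfile

payload = b"hello archive\n"

# zip: one call takes the member name and the bytes; compression is
# applied per member.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("hello.txt", payload)

# tar: the member needs an explicit TarInfo carrying its size, plus a
# file object to read from; gzip wraps the whole archive ("w:gz").
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w:gz") as tf:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

# Reading back: different method names for the same two operations.
zip_buf.seek(0)
with zipfile.ZipFile(zip_buf) as zf:
    zip_names = zf.namelist()
    zip_data = zf.read("hello.txt")

tar_buf.seek(0)
with tarfile.open(fileobj=tar_buf, mode="r:gz") as tf:
    tar_names = tf.getnames()
    tar_data = tf.extractfile("hello.txt").read()
```

Both halves end up with the same member list and the same bytes, but none of the calls in one half can be reused in the other, which is why switching formats touches more code than it should.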

But the question isn't whether someone can build an ultimate archive library. Something that's self-contained and easy to build and distribute, but that handles only the most important formats at the basic level of functionality the current stdlib provides (just in a more consistent way), would still be very useful.
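
As a very rough illustration of what "more consistent" could mean, here is a minimal sketch built on the existing stdlib modules. The function name read_members is hypothetical, not a proposed API, and the format detection is deliberately naive (it just tries zip first, then lets tarfile sniff the compression):

```python
import io
import tarfile
import zipfile


def read_members(data):
    """Return {member name: bytes} for a zip or tar archive held in memory.

    Hypothetical helper for illustration only; regular files only.
    """
    buf = io.BytesIO(data)
    # Detect zip by its magic/end-of-central-directory record.
    if zipfile.is_zipfile(buf):
        buf.seek(0)
        with zipfile.ZipFile(buf) as zf:
            return {
                info.filename: zf.read(info)
                for info in zf.infolist()
                if not info.is_dir()
            }
    buf.seek(0)
    # Mode "r:*" lets tarfile detect gzip/bz2/xz compression itself;
    # it raises tarfile.ReadError if the data is not a tar archive.
    with tarfile.open(fileobj=buf, mode="r:*") as tf:
        return {
            member.name: tf.extractfile(member).read()
            for member in tf.getmembers()
            if member.isfile()
        }
```

A caller gets the same dict back whichever format the bytes happen to be in. A real design would also need writing, streaming, and metadata, which is where the per-format differences start to bite.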

