Python 3000: Standard API for archives?

Mon Jun 4 08:02:03 EDT 2007

I'm a relative newbie to Python, so please bear with me.  There are 
currently two standard modules used to access archived data:  zipfile 
and tarfile.  The interfaces are completely different.  In particular, 
script wanting to analyze different types of archives must duplicate 
substantial pieces of logic.  The problem is not limited to method 
names; it includes how stat-like information is accessed.

I think it would be a good thing if a standardized interface existed, 
similar to PEP 247.  This would make it easier for one script to access 
multiple types of archives, such as RAR, 7-Zip, ISO, etc.  In 
particular, a single factory class could produce PEP 302 import hooks 
for future as well as current archive formats.

I think that an archive module adhering to the standard should adopt a 
least-common-denominator approach, initially supporting read-only access 
without seek, i.e. tar files on actual tape.  For applications that 
require a seek method (such as importers) a standard wrapper class could 
transparently cache archive members in temp files; this would fit in 
well with Python 3000's rewrite of the I/O interface to support 
stackable interfaces.  To this end, we'd need is_seekable and 
is_writable attributes for both the module and instances (moduel level 
would declare if something is possible, not if it is always true).

Most importantly, all archive modules should provide a standard API for 
accessing their individual files via a single archive_content class that 
provides a standard 'read' method.  Less importantly but nice to have 
would be a way for archives to be auto-magically scanned during walks of 
  directories.

Feedback?