tarfile woes

Hans-Joachim Widmaier hjwidmaier at web.de
Thu Aug 21 13:29:50 EDT 2003


Although I've done a bit of ranting on this before, no one seems to have
noticed. I'll try again, hopefully more to the point.

One of the additions to the standard library I like most is the tarfile
module. It came in very handy for one of my programs.
Alas, I had to discover that:

 - bzip2-compressed archives cannot be read from a "fake" (StringIO)
   file object, only from real files. This is (imho) unbelievably ugly,
   as I already have the file in a string. I really do not want to read
   it a second time. Or a third time, when the user finally decides
   that she wants the archive actually unpacked (the second read was
   for the TOC listing).

 - It does not handle archives compressed with compress (.Z). Of course
   there's no one to blame. The gzip utility (which is used by GNU tar)
   handles this ancient algorithm, but apparently zlib does not. :-(

 - TarInfo and ZipInfo (zipfile) objects differ without need.

       TarInfo attribute     ZipInfo attribute

       name                  filename
       size                  file_size
       mtime (int)           date_time (6-tuple)

I can see two reasons for the TarInfo/ZipInfo differences: 1. The
library was written by a bunch of different people at different times.
Everyone's got her own style, and it shows. 2. The underlying internals
get exposed to some degree. I'm not sure this is good. Yes, I didn't
care much about this only months ago. But when I tried to write a class
that reads just about anything, I suddenly found myself writing the same
code with a few attribute names and format strings changed. (btw, I even
resent names like 'file_size'. Why then not 'file_name'?)
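
To show the kind of duplication I mean, here is a minimal sketch of a
TOC listing that works for both archive types. The function name
list_members is made up for illustration; the attributes are the ones
from the table above.

```python
import io
import tarfile
import zipfile


def list_members(archive):
    """Return (name, size) pairs for an open TarFile or ZipFile."""
    if isinstance(archive, tarfile.TarFile):
        # tarfile spells it name / size ...
        return [(m.name, m.size) for m in archive.getmembers()]
    if isinstance(archive, zipfile.ZipFile):
        # ... while zipfile spells it filename / file_size.
        return [(m.filename, m.file_size) for m in archive.infolist()]
    raise TypeError("unsupported archive type: %r" % type(archive))
```

Both branches do exactly the same thing; only the attribute and method
names differ.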

Somewhere in the not-so-near future lies the ominous Python 3.0, said to
be incompatible with the current language (to some degree). Does that
hold for the library, too? If yes, wouldn't that be a good time to unify
classes like those *Info? With changes that big, the timestamps could
also be made datetime objects ...
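
A rough sketch of what a common timestamp could look like, converting
both forms to datetime objects. The helper name member_datetime is
hypothetical, and I simply assume local time for the tar timestamp.

```python
import datetime
import tarfile
import zipfile


def member_datetime(info):
    """Return the modification time of a TarInfo or ZipInfo as a datetime."""
    if isinstance(info, tarfile.TarInfo):
        # TarInfo.mtime is seconds since the epoch; assumed local time here.
        return datetime.datetime.fromtimestamp(info.mtime)
    if isinstance(info, zipfile.ZipInfo):
        # ZipInfo.date_time is (year, month, day, hour, minute, second).
        return datetime.datetime(*info.date_time)
    raise TypeError("unsupported info type: %r" % type(info))
```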

OK, enough ranting for one day. The remaining big question, of
course, is: "Who's going to do all that?" I'd offer some help if I felt
up to the task (somehow I have great difficulty understanding newer
modules with all those clever tricks. Guess I'm not clever enough. :-().

hjw



