[Tutor] Zip-ing files/folders - collecting opinions

Branimir Petrovic BranimirP@cpas.com
Tue Jan 28 23:17:01 2003


Hi All,

I am trying to put together something little in Python that
would archive potentially large files and/or large folder
structures on the Windows platform.

WinZip and its command line add-on (wzzip.exe) work fine
up to the 4 GB size limit, so they can't do what I'd like
them to do...  My archiving script must be able to compress
large 10 GB+ Oracle dump files (among other things).

The other option I considered, and which is still in the
'game', is gzip.exe - GNU's port for Win32. It works like a
charm on large files and it can be 'driven' via popen. But...
gzip compresses files one by one in place, leaving behind
N *.gz files if the folder had N files to compress. Moreover,
gzip leaves directories intact - it does not 'know' how to
put everything (folders AND files) in one big archive file
(like wzzip can). Oh, by the way - I've discovered (the
preferred, hard way) that GNU's tar port for Win32 breaks
down upon reaching the 2 GB boundary!? Therefore tar-ing
everything together prior to gzip-ing on the Windows platform
does not work for my purpose either.
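
For illustration, the one-file-at-a-time behaviour described above
can also be reproduced from Python itself, without popen. This is
only a rough sketch under assumptions: it uses the standard gzip
and shutil modules, the names gzip_tree and chunk_size are made up
for the example, and unlike gzip.exe it leaves the source files in
place.

```python
import gzip
import os
import shutil


def gzip_tree(root, chunk_size=1024 * 1024):
    """Compress every file under root to <name>.gz, one file at a time.

    gzip_tree and chunk_size are hypothetical names for this sketch.
    """
    written = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".gz"):
                continue  # skip files that are already compressed
            src = os.path.join(dirpath, name)
            dst = src + ".gz"
            # Copy in fixed-size chunks so a very large file never
            # has to fit in memory all at once.
            with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
                shutil.copyfileobj(fin, fout, chunk_size)
            written.append(dst)
    return written
```

The folder structure stays as it is, and each folder ends up with
one *.gz next to every original file.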

Python's zipfile standard library module seems to be fine -
its performance is close to gzip.exe's, plus it can do an
additional trick - it can add multiple files to one archive,
which is a mixed blessing since gunzip.exe does not know how
to deal with these (should someone unsuspectingly try to
unzip it using gunzip.exe). But like gzip, Python's zipfile
library seems unable to deal with folders. It cannot
compress a whole folder structure into one large archive.
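
A minimal sketch of one possible way around this: walk the tree
with os.walk and add each file to a single zip under its path
relative to the top folder. The names zip_tree, root and
archive_path are made up for the example. It also assumes a Python
release whose zipfile accepts the allowZip64 parameter, which is
what members beyond the 4 GB limit would need; whether a given
unzip tool can read such archives is a separate question. Empty
folders are not recorded by this sketch.

```python
import os
import zipfile


def zip_tree(root, archive_path):
    """Store every file under root in one zip, keeping relative paths.

    zip_tree is a hypothetical helper name for this sketch.
    """
    root = os.path.abspath(root)
    archive_abs = os.path.abspath(archive_path)
    # allowZip64=True turns on the ZIP64 extensions (assumed to be
    # available in the Python version in use).
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED,
                         allowZip64=True) as zf:
        for dirpath, dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                if os.path.abspath(full) == archive_abs:
                    continue  # do not add the archive to itself
                # The second argument is the name stored in the zip.
                zf.write(full, os.path.relpath(full, root))
    return archive_path
```

Extracting with ZipFile.extractall() would then recreate the
sub-folders from the stored relative paths.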

My questions are:

a) Am I missing something with the zipfile library? Maybe it
   can compress/uncompress a folder structure in one large
   zip file?

b) Should I rather 'hold my nose' and do it gzip.exe's
   way - M folders with up to N *.gz files in each after
   popen returns (and rest assured that gunzip can always
   recursively un-zip them)?

c) Add all files found in each folder to an archive and end
   up with just one large zip file in each folder (keeping
   the folder structure intact, and deleting source files),
   and use WinZip for de-archiving (or Python in the rare
   cases when an individual archived file expands beyond the
   4 GB limit)?
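
To make option c) concrete, here is a rough sketch of the
one-zip-per-folder idea. The names zip_each_folder and
archive_name are made up for the example, allowZip64 is again
assumed to be supported by the Python in use, and the sketch
deliberately does NOT delete the source files.

```python
import os
import zipfile


def zip_each_folder(root, archive_name="archive.zip"):
    """Create one zip per folder holding only that folder's own files.

    zip_each_folder and archive_name are hypothetical names.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Leave out an archive from an earlier run of this function.
        members = [n for n in filenames if n != archive_name]
        if not members:
            continue  # nothing to archive in this folder
        archive_path = os.path.join(dirpath, archive_name)
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED,
                             allowZip64=True) as zf:
            for name in members:
                # Store under the bare file name; the folder on disk
                # already provides the structure.
                zf.write(os.path.join(dirpath, name), name)
        created.append(archive_path)
    return created
```

Removing the originals could be a separate step, for instance
only after ZipFile.testzip() reports no bad members.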

If I missed something obvious, or if you have a better idea
or approach, please let me know.

Branimir