Zipfile content reading via an iterator?

Tim Chase python.list at tim.thechases.com
Tue Dec 11 15:14:14 EST 2007


I'm dealing with several large items that have been zipped up to 
get quite impressive compression.  However, uncompressed, they're 
large enough to thrash my memory to swap and in general do bad 
performance-related things.  I'm trying to figure out how to 
produce a file-like iterator out of the contents of such an item.

 >>> import zipfile
 >>> z = zipfile.ZipFile("test.zip")
 >>> info = z.getinfo("data.txt")
 >>> info.compress_size
132987864
 >>> info.file_size
1344250972
 >>> len(z.namelist())
20

I need to be able to access multiple files within it, but my 
processing can iterate over each one, only needing to see a small 
slice of the file at a time.  Using the read() method triggers the 
voluminous read of the entire uncompressed contents.  Thus what I 
have to do currently:

 >>> content = z.read("data.txt") # ouch!
 >>> len(content)
1344250972
 >>> for row in content.splitlines(): process(row) # pain!

What I'd like to be able to do is something like the 
mythical:

 >>> for row in z.file_iter("data.txt"): process(row) # aah

to more efficiently handle the huge stream of data.
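
For illustration, here is a minimal sketch of what such an iterator might look like, assuming a Python version whose zipfile module provides ZipFile.open() (added in Python 2.6; the sketch uses Python 3 syntax).  The helper name file_iter() is hypothetical, borrowed from the mythical call above, and is not part of zipfile; the UTF-8 default encoding is likewise an assumption about the data.

```python
import io
import zipfile


def file_iter(archive, member, encoding="utf-8"):
    """Yield the lines of one archive member, one at a time.

    Hypothetical helper, not part of the zipfile module.
    `archive` is an open zipfile.ZipFile; `member` is a name in it.
    The encoding default is an assumption about the data.
    """
    # ZipFile.open() returns a file-like object that decompresses
    # on demand, so the whole member is never held in memory at once.
    with archive.open(member) as raw:
        # Wrap the binary stream to decode it and split it into lines.
        for line in io.TextIOWrapper(raw, encoding=encoding):
            yield line.rstrip("\r\n")
```

With that in place, the loop would read something like "for row in file_iter(z, 'data.txt'): process(row)", with memory use bounded by the decompression buffer rather than by the size of the uncompressed file.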

Am I missing something obvious?  It seems like iterating over zip 
contents would be a common thing to do (especially when compared 
to reading the whole contents...I mean, they're zipped because 
they're big! :)

Thanks for any pointers,

-tkc





