Zipfile content reading via an iterator?
Tim Chase
python.list at tim.thechases.com
Tue Dec 11 15:14:14 EST 2007
I'm dealing with several large items that have been zipped up to
get quite impressive compression. However, uncompressed, they're
large enough to thrash my memory to swap and in general do bad
performance-related things. I'm trying to figure out how to
produce a file-like iterator out of the contents of such an item.
>>> import zipfile
>>> z = zipfile.ZipFile("test.zip")
>>> info = z.getinfo("data.txt")
>>> info.compress_size
132987864
>>> info.file_size
1344250972
>>> len(z.namelist())
20
I need to be able to access multiple files within it, but I can
iterate over each one, only seeing small slices of the file.
Using the read() method triggers the voluminous read, so this is
what I currently have to do:
>>> content = z.read("data.txt") # ouch!
>>> len(content)
1344250972
>>> for row in content.splitlines(): process(row) # pain!
What I'm trying to figure out how to do is something like the
mythical:
>>> for row in z.file_iter("data.txt"): process(row) # aah
to more efficiently handle the huge stream of data.
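A minimal sketch of that mythical iterator, assuming a Python version
(2.6 or later) whose ZipFile has an open() method returning a file-like
object that decompresses as it is read. The helper name file_iter, the
utf-8 encoding, and the tiny in-memory test archive are assumptions for
illustration only, and the syntax is Python 3:

```python
import io
import zipfile


def file_iter(archive, member, encoding="utf-8"):
    """Yield decoded lines of one archive member without loading it whole.

    Hypothetical helper (named after the wished-for method above); it relies
    on ZipFile.open(), which only exists in Python 2.6 and later.
    """
    # ZipFile.open() returns a file-like object that decompresses lazily.
    with archive.open(member) as raw:
        # TextIOWrapper decodes bytes to str and splits on line endings.
        for line in io.TextIOWrapper(raw, encoding=encoding):
            yield line.rstrip("\r\n")


if __name__ == "__main__":
    # Build a small in-memory archive so the sketch runs on its own.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        z.writestr("data.txt", "row 1\nrow 2\nrow 3\n")

    with zipfile.ZipFile(buf) as z:
        for row in file_iter(z, "data.txt"):
            print(row)
```

Only one buffered chunk of the member should be held in memory at a time,
so the 1.3 GB of uncompressed data never has to exist as a single string.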
Am I missing something obvious? It seems like iterating over zip
contents would be a common thing to do, especially when compared
to reading the whole contents. I mean, they're zipped because
they're big! :)
Thanks for any pointers,
-tkc