really slow gzip decompress, why?

Jeff McNeil jeff at jmcneil.net
Mon Jan 26 10:51:42 EST 2009


On Jan 26, 10:22 am, redbaron <ivanov.ma... at gmail.com> wrote:
> I have one big (6.9 GB) .gz file with text inside it.
> zcat bigfile.gz > /dev/null does the job in 4 minutes 50 seconds
>
> The Python code has been doing the same job for 25 minutes and still
> hasn't finished =( The code is the simplest I could imagine:
>
> import gzip, sys
>
> def main():
>   fh = gzip.open(sys.argv[1])
>   all(fh)
>
> As far as I understand, most of the time it executes C code, so Python
> overhead should not be noticeable. Why is it so slow?

Look what's happening in both operations. The zcat operation is simply
uncompressing your data and dumping directly to /dev/null. Nothing is
done with the data as it's uncompressed.

On the other hand, when you call 'all(fh)', you're iterating through
every line in bigfile.gz. In other words, you're reading the file and
scanning it for newlines rather than simply running the decompression
operation.
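
For a rough apples-to-apples comparison with zcat, a sketch like the
following reads the decompressed stream in large binary chunks and
throws the data away, so no time is spent splitting on newlines. (The
16 MB chunk size is an arbitrary choice of mine, not something from the
original post.)

import gzip
import sys

def main():
    # Pull the decompressed stream in large binary chunks; nothing is
    # done with the data, and no line scanning happens.
    fh = gzip.open(sys.argv[1], 'rb')
    while True:
        chunk = fh.read(16 * 1024 * 1024)  # 16 MB per read, arbitrary
        if not chunk:
            break
    fh.close()

if __name__ == '__main__':
    main()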
