Python vs. Java gzip performance

Bill foobarbazqux at ftml.net
Fri Mar 17 10:54:49 EST 2006


I've written a small program that, in part, reads in a file and parses
it.  Sometimes, the file is gzipped.  The code that I use to get the
file object is like so:

from gzip import GzipFile

if filename.endswith(".gz"):
    file = GzipFile(filename)
else:
    file = open(filename)

Then I parse the contents of the file in the usual way (for line in
file:...)
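
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the Python side. It assumes Python 3, where gzip.open accepts a text mode. The helper names open_text and count_lines are hypothetical, and counting lines only stands in for the real parsing step.

```python
import gzip


def open_text(filename):
    """Return a text-mode file object, decompressing if the name ends in .gz."""
    if filename.endswith(".gz"):
        # Mode "rt" makes gzip.open yield decoded text lines instead of bytes.
        return gzip.open(filename, "rt")
    return open(filename, "r")


def count_lines(filename):
    """Iterate line by line, as the parser would, and count the lines seen."""
    with open_text(filename) as handle:
        return sum(1 for _ in handle)
```

Calling count_lines on the same data stored both plain and gzipped should return the same number, which makes it a convenient loop to time.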

The equivalent Java code goes like this:

if (isZipped(aFile)) {
    input = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(new FileInputStream(aFile))));
} else {
    input = new BufferedReader(new FileReader(aFile));
}

Then I parse the contents similarly to the Python version (while
nextLine = input.readLine...)

The Java version of this code is roughly 2x-3x faster than the Python
version.  I can get around this problem by replacing the Python
GzipFile object with an os.popen call to gzcat, but then I sacrifice
portability.  Is there something that can be improved in the Python
version?
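
The external-process workaround looks roughly like the sketch below. It is written for Python 3 with subprocess rather than os.popen, and it assumes a gzip binary is on the PATH. The function name iter_lines_via_gzip and the default command "gzip -dc" (which does the same job as gzcat where gzip is installed) are choices made for this illustration.

```python
import subprocess


def iter_lines_via_gzip(filename, command=("gzip", "-dc")):
    """Yield decoded lines from an external decompressor's standard output.

    The default command is an assumption; pass ("gzcat",) or similar if
    that is what the platform provides.
    """
    process = subprocess.Popen(
        list(command) + [filename],
        stdout=subprocess.PIPE,
        text=True,  # decode stdout to str (Python 3.7+)
    )
    try:
        for line in process.stdout:
            yield line
    finally:
        # Close the pipe and reap the child, even if the caller stops early.
        process.stdout.close()
        process.wait()
```

Usage would be "for line in iter_lines_via_gzip(filename): ...". The portability cost is visible in the signature: the command has to exist on the machine, which is not guaranteed outside Unix-like systems.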

Thanks -- Bill.



