popen2() gives broken pipe for large files - is it fixed yet?

Donn Cave donn at drizzle.com
Wed Sep 25 00:31:40 EDT 2002


Quoth Frank Gibbons <fgibbons at hms.harvard.edu>:
...
| It worked fine for a small test file, but I get a broken pipe for realistic 
| sizes (1MB). I've written a little test script that shows the same behavior 
| (see below). I've searched the newsgroups, and find that popen2() has a 
| history of breaking the pipe on large files. Does anyone know if this has 
| been fixed? Is popen3 any better? I've also read that the os module now has 
| a popen function (as of Python 2.x, forget which x). Is this any better?
...
| data = f.readlines()
...
| zip_output, zip_input = popen2.popen2("%s -c  "% GZIP)
| data = string.join(data, "")
...
| zip_input.write(data)
...
| zipfile = open("popen.gz", "w+")
| zipfile.write(zip_output.read())

It isn't going to be fixed, either, because there are just some
axiomatic truths in operation here, like "I/O buffers aren't
infinitely elastic." Since your example doesn't do anything with
the output, you don't have to shoot yourself in the foot this way.
Try something like this -

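  import os

  # f is the input file you opened earlier in your script.
  # The shell redirection sends gzip's output straight to popen.gz.
  zip_input = os.popen('gzip -c > popen.gz', 'w')
  while 1:
      data = f.read(16000)    # feed gzip in modest chunks
      if not data:            # an empty read means end of file
          break
      zip_input.write(data)
  zip_input.close()           # gzip sees EOF and finishes writing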

The main point is that this allows gzip to write directly to disk,
so there is only one pipe and nothing for you to read back. If your
actual application needs to process the data further at this point,
you may read it back from disk and write to a new file if you like.
Otherwise, if you really want a second pipe, you have to figure out
how to read some, write some, etc., to keep gzip's output pipe from
filling up while you're still trying to write all its input.
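
One way to "read some, write some" over two pipes is to let a helper
thread do the writing while the main thread does the reading. This is
only a minimal sketch, and it assumes Python 3 and its subprocess module
rather than popen2; the name filter_through and its chunk_size parameter
are made up for illustration.

```python
import subprocess
import threading


def filter_through(cmd, data, chunk_size=16000):
    """Send data (bytes) to cmd's stdin and return all of its stdout."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def feed():
        # Write the input in modest chunks, then close stdin so the
        # child sees end of file.
        try:
            for start in range(0, len(data), chunk_size):
                proc.stdin.write(data[start:start + chunk_size])
            proc.stdin.close()
        except BrokenPipeError:
            # The child exited before consuming all of its input.
            pass

    writer = threading.Thread(target=feed)
    writer.start()
    # While the helper thread writes, the main thread keeps draining
    # the child's output, so the output pipe cannot stay full.
    output = proc.stdout.read()
    writer.join()
    proc.stdout.close()
    proc.wait()
    return output
```

With a gzip program on the PATH, something like
filter_through(["gzip", "-c"], open("big.txt", "rb").read()) would
return the compressed bytes. The subprocess module's communicate()
method handles the same juggling internally and is usually the simpler
choice when the whole input already fits in memory.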

	Donn Cave, donn at drizzle.com


