[Python-Dev] Universal newlines, and the gzip module.

Brett Cannon brett at python.org
Thu Jan 29 23:45:55 CET 2009


On Thu, Jan 29, 2009 at 12:39, Christopher Barker <Chris.Barker at noaa.gov> wrote:
> Hi all,
>
> Over on the matplotlib mailing list, we ran into a problem with trying to
> use Universal newlines with gzip. In virtually all of my code that reads
> text files, I use the 'U' flag to open files; it really helps not having to
> deal with newline issues. Yes, they are fewer now that the Macintosh uses
> \n, but they can still be a pain.
>
> Anyway, we added such support to some matplotlib methods, and found that
> gzip file reading was broken. We were passing the flags through into either
> file() or gzip.open(), and passing 'U' into gzip.open() turns out to be fatal.
>
> 1) It would be nice if the gzip module (and the zip lib module) supported
> Universal newlines -- you could read a compressed text file with "wrong"
> newlines, and have them handled properly. However, that may be hard to do,
> so at least:
>
> 2) Passing a 'U' flag in to gzip.open shouldn't break it.
>
> I took a look at the Python SVN (2.5.4 and 2.6.1) for the gzip lib. I see
> this:
>
>
>        # guarantee the file is opened in binary mode on platforms
>        # that care about that sort of thing
>        if mode and 'b' not in mode:
>            mode += 'b'
>        if fileobj is None:
>            fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
>
> This is going to break for 'U' -- you'll get 'rUb'. I tested file(filename,
> 'rUb'), and it looks like it does universal newline translation.
>
> So:
>
> * Either gzip should be a bit smarter, and remove the 'U' flag (that's what
> we did in the MPL code), or force 'rb' or 'wb'.
>
> * Or: file opening should be a bit smarter -- what does 'rUb' mean? A file
> can't be both Binary and Universal Text. Should it raise an exception?
> Somehow I think it would be better to ignore the 'U', but maybe that's only
> because of the issue I happen to be looking at now.
>
>
> The latter seems a better idea -- this issue could certainly come up in
> other places than the gzip module, but maybe it would break a bunch of code
> -- who knows?

I think it should be raising an exception as 'rUb' is an invalid value
for the argument.
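
For illustration only, here is a rough sketch of the two options on the
table: stripping the 'U' before opening (what the MPL code reportedly did),
versus rejecting the combination outright. Both function names are made up
for this example; neither exists in gzip or the builtins, and this is not
the actual stdlib code.

```python
def strip_universal_flag(mode):
    """Option 1 (hypothetical helper): drop 'U' and force binary mode."""
    mode = mode.replace("U", "")
    if "b" not in mode:
        mode += "b"
    return mode


def check_mode(mode):
    """Option 2 (hypothetical helper): refuse a mode that is both 'U' and 'b'."""
    if "U" in mode and "b" in mode:
        raise ValueError(
            "invalid mode %r: 'U' and 'b' cannot be combined" % mode
        )
    return mode
```

With these, strip_universal_flag('rU') gives 'rb', while check_mode('rUb')
raises ValueError instead of silently picking one meaning.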

-Brett

