file read, binary or text mode

Sun Sep 26 02:16:17 EDT 2004

"Alan G Isaac" <aisaac at american.edu> wrote:
>
>I think the following is the same question from another angle.
>I have a .zip archive of compressed files that
>I want to decompress.  Using the zipfile module,
>I tried
>z=zipfile.ZipFile('local.zip')
>for zname in z.namelist():
>        localtxtfile='c:/puthere/'+zname
>        f=open(localtxtfile,'w')
>        f.write(z.read(zname))
>        f.close()
>
>The original files were all plain text,
>created on an unspecified platform.

Not true.  They were in plain text, created on a DOS/Windows platform.

>The files I decompressed this way contained
>*two successive* carriage returns
>(ASCII 13) at the end of each line.
>If I change 'w' to 'wb' I get only one
>carriage return at the end of each line.
>
>Why is this extra carriage return added?

Because the original file inside the zip file contained \r\n.  z.read
returns you those exact bytes.  When you write "\r\n" to a text file in
Windows, the \r is written as \r, and the \n is written as \r\n.  Thus, you
end up with \r\r\n.
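
To make the doubling concrete, here is a minimal sketch, assuming Python 3
(where text and bytes are separate types).  The helper name write_text_mode
is hypothetical, and newline="\r\n" is passed explicitly so the Windows
text-mode translation can be reproduced on any platform.

```python
import io

def write_text_mode(data: bytes) -> bytes:
    """Return the bytes that land on disk when `data` is written through
    a text-mode stream that translates "\n" to "\r\n" (as Windows does)."""
    raw = io.BytesIO()
    # latin-1 maps every byte to one character, so no content is lost
    # in the decode/encode round trip; only newlines are touched.
    text = io.TextIOWrapper(raw, encoding="latin-1", newline="\r\n")
    text.write(data.decode("latin-1"))
    text.flush()
    return raw.getvalue()

# The "\r" passes through untouched; the "\n" is expanded to "\r\n".
print(write_text_mode(b"line one\r\n"))  # b'line one\r\r\n'
```

Data that already ends its lines with \r\n therefore picks up a second \r,
which is exactly what the decompressed files showed.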

>My original guess was that using 'w' instead
>of 'wb' would be the right action, since the
>platform for the original files is unspecified
>and the original files are known to be plain text.

No.  If you do not know what your buffer contains, you should always use
'wb' so that those contents are not altered.

That's the real lesson: when you write using 'w' or 'wt', newlines in the
buffer may be translated on the way out.  You only want that if you know
exactly what you are writing.
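
Applied to the extraction loop from the question, that advice might look
like the sketch below.  It assumes Python 3; the function name
extract_binary and its parameters are hypothetical, and it trusts the
member names in the archive, which you should not do with a zip file from
an unknown source.

```python
import os
import zipfile

def extract_binary(zip_path, dest_dir):
    """Write every file in the archive to dest_dir byte-for-byte."""
    written = []
    with zipfile.ZipFile(zip_path) as z:
        for zname in z.namelist():
            if zname.endswith("/"):
                continue  # directory entry, nothing to write
            target = os.path.join(dest_dir, zname)
            # Create any subdirectory the member name implies.
            os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
            # 'wb' writes the bytes from z.read() without translation.
            with open(target, "wb") as f:
                f.write(z.read(zname))
            written.append(target)
    return written
```

With 'wb', a member that contains \r\n comes out on disk with \r\n,
whatever platform the script runs on.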
-- 
- Tim Roberts, timr at probo.com
  Providenza & Boekelheide, Inc.