file read, binary or text mode

Michael Hoffman m.h.3.9.1.without.dots.at.cam.ac.uk at example.com
Sat Sep 25 21:52:15 EDT 2004


Alan G Isaac wrote:

> "Roel Schroeven" <rschroev_nospam_ml at fastmail.fm> wrote in message
> news:OjW4d.255917$OR1.13371520 at phobos.telenet-ops.be...
> 
>>It's safe in the sense that everything goes out exactly as it came in.
>>For example, gzip uses binary mode even when compressing text files. The
>>files may be text, but gzip doesn't care about that. It doesn't care
>>about words, sentences and line endings, but it does care about
>>representing exactly the bytes that are in the file.
> 
> I think the following is the same question from another angle.

I think you should consider the same answer from this angle. ;)

> I have a .zip archive of compressed files that
> I want to decompress.  Using the zipfile module,
> I tried
> import zipfile
> z=zipfile.ZipFile('local.zip')
> for zname in z.namelist():
>         localtxtfile='c:/puthere/'+zname
>         f=open(localtxtfile,'w')
>         f.write(z.read(zname))
>         f.close()
> 
> The original files were all plain text,
> created on an unspecified platform.

Are you sure the platform is unspecified? You can find out the platform 
by doing z.getinfo(zname).create_system (getinfo is a method of the 
ZipFile instance, not of the zipfile module) and then *yuck* looking up 
the ID number you get against the list in 
<http://www.pkware.com/company/standards/appnote/>.
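
A minimal sketch of that lookup, written for a current Python 3. The 
creator_systems helper and the SYSTEM_NAMES table are made up for 
illustration, and the table only covers the two IDs I am sure of (0 and 
3); anything else still needs the appnote.

```python
import io
import zipfile

# Hypothetical, deliberately incomplete table: only the two most common
# "version made by" host IDs from the ZIP appnote are listed here.
SYSTEM_NAMES = {0: "MS-DOS/FAT (incl. Windows)", 3: "Unix"}


def creator_systems(archive):
    """Return {member name: label of the system that created the entry}."""
    with zipfile.ZipFile(archive) as z:
        return {
            info.filename: SYSTEM_NAMES.get(
                info.create_system, "other (%d)" % info.create_system
            )
            for info in z.infolist()
        }
```

Calling creator_systems('local.zip') on the archive from the question 
would show whether the members were stored by a DOS-type system.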

> The files I decompressed this way contained
> *two successive* carriage returns
> (ASCII 13) at the end of each line.
> If I change 'w' to 'wb' I get only one
> carriage return at the end of each line.
> 
> Why is this extra carriage return added?

I imagine the file in the archive was created on a DOS-type system, 
where the line ending is \r\n. That's what you read in. When you write 
it out in "w" mode on Windows, each \n is expanded to \r\n without 
checking whether a \r already precedes it. So you get \r\r\n.
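
The doubling can be reproduced on any platform. This is only a sketch 
for a current Python 3: text_mode_bytes is a made-up helper, and 
newline="\r\n" is used to imitate what Windows text mode does by 
default.

```python
import io


def text_mode_bytes(text, newline="\r\n"):
    """Return the bytes a text-mode writer emits for `text`.

    newline="\r\n" imitates Windows text mode: every "\n" written is
    expanded to "\r\n", whether or not a "\r" already precedes it.
    """
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    wrapper.write(text)
    wrapper.flush()
    data = raw.getvalue()
    wrapper.detach()  # keep the wrapper from closing `raw` behind our back
    return data
```

Feeding it DOS-style text such as "one\r\ntwo\r\n" gives back 
b"one\r\r\ntwo\r\r\n", which is the two carriage returns per line seen 
in the question.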

Essentially you should consider the archive file to be read in "rb" 
mode: z.read() hands you the bytes exactly as they were stored. Writing 
them in "w" mode instead of "wb" mode will give you extra carriage 
returns.
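
For reference, a binary-mode version of the extraction loop might look 
like this under a current Python 3. extract_raw and its arguments are 
names invented here, and the sketch trusts the member names in the 
archive, so only use it on archives you know.

```python
import os
import zipfile


def extract_raw(archive, dest_dir):
    """Write every archive member to dest_dir byte for byte."""
    with zipfile.ZipFile(archive) as z:
        for zname in z.namelist():
            if zname.endswith("/"):
                continue  # directory entry, nothing to write
            target = os.path.join(dest_dir, zname)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # "wb" disables newline translation, so \r\n stays \r\n
            with open(target, "wb") as f:
                f.write(z.read(zname))
```

Using a with block also closes each output file, so there is no close 
call to forget.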

If you want to be able to get "universal newline" input from your 
zipfile, consider piping input through this generator and using "w" mode:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/286165

Then you should get the correct line ending for a text file without 
regard to the current platform or the one where the archive was created.
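
In case the link goes stale, here is a rough stand-in for that idea. It 
is not the recipe itself, just a sketch for a current Python 3; 
universal_lines is a made-up name, and the latin-1 default is an 
assumption chosen because it can decode any byte.

```python
def universal_lines(data, encoding="latin-1"):
    """Yield the lines of `data` (bytes) with \\r\\n and bare \\r as \\n."""
    text = data.decode(encoding).replace("\r\n", "\n").replace("\r", "\n")
    start = 0
    while start < len(text):
        end = text.find("\n", start)
        if end == -1:
            yield text[start:]  # last line has no terminator
            return
        yield text[start:end + 1]
        start = end + 1
```

With this, f.writelines(universal_lines(z.read(zname))) on a file 
opened in "w" mode lets the platform supply its own line ending.
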
-- 
Michael Hoffman


