file read, binary or text mode
Michael Hoffman
m.h.3.9.1.without.dots.at.cam.ac.uk at example.com
Sat Sep 25 21:52:15 EDT 2004
Alan G Isaac wrote:
> "Roel Schroeven" <rschroev_nospam_ml at fastmail.fm> wrote in message
> news:OjW4d.255917$OR1.13371520 at phobos.telenet-ops.be...
>
>>It's safe in the sense that everything goes out exactly as it came in.
>>For example, gzip uses binary mode even when compressing text files. The
>>files may be text, but gzip doesn't care about that. It doesn't care
>>about words, sentences and line endings, but it does care about
>>representing exactly the bytes that are in the file.
>
> I think the following is the same question from another angle.
I think you should consider the same answer from this angle. ;)
> I have an .zip archive of compressed files that
> I want to decompress. Using the zipfile module,
> I tried
> z=zipfile.ZipFile(local.zip)
> for zname in z.namelist():
>     localtxtfile='c:/puthere/'+zname
>     f=open(localtxtfile,'w')
>     f.write(z.read(zname))
>     f.close
>
> The original files were all plain text,
> created on an unspecified platform.
Are you sure the platform is unspecified? You can find out the platform
by doing z.getinfo(zname).create_system (getinfo is a method of your
ZipFile object, not of the zipfile module) and then *yuck* looking up
the ID number you get against the list in
<http://www.pkware.com/company/standards/appnote/>.
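A minimal sketch of that lookup, assuming Python 3. The HOST_SYSTEMS table and the creating_systems() helper are made up for illustration, and the table lists only a few of the host codes from the appnote:

```python
import zipfile

# Partial, hand-copied table of "version made by" host codes from the
# PKWARE appnote. Only a few common values are listed here.
HOST_SYSTEMS = {
    0: "MS-DOS / FAT",
    3: "UNIX",
    10: "Windows NTFS",
    19: "OS X (Darwin)",
}


def creating_systems(zip_source):
    """Map each member name to a readable name for its create_system code."""
    result = {}
    with zipfile.ZipFile(zip_source) as z:
        for info in z.infolist():
            code = info.create_system
            result[info.filename] = HOST_SYSTEMS.get(code, "unknown (%d)" % code)
    return result
```

Calling creating_systems() with the path to the archive returns a dict keyed by member name, so you can see at a glance whether the files came from a DOS-type system.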
> The files I decompressed this way contained
> *two successive* carriage returns
> (ASCII 13) at the end of each line.
> If I change 'w' to 'wb' I get only one
> carriage return at the end of each line.
>
> Why is this extra carriage return added?
I imagine the file in the archive was created on a DOS-type system,
where the line ending is \r\n. That's what you read in. When you write
it out in "w" mode on Windows, each \n is expanded to \r\n without
checking to see if there is already a \r beforehand. So you get \r\r\n.
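The doubling can be reproduced on any platform. This is only a sketch, assuming Python 3, where newline="\r\n" stands in for the translation that "w" mode performs on Windows; CRLF_TEXT and write_and_reread() are illustrative names:

```python
import os
import tempfile

# Bytes as z.read() would return them for a file made on a DOS-type system.
CRLF_TEXT = b"one\r\ntwo\r\n"


def write_and_reread(data, text_mode):
    """Write data to a temp file, then return the raw bytes that landed on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        if text_mode:
            # newline="\r\n" forces the "\n" -> "\r\n" translation that
            # "w" mode applies on Windows; existing "\r" is left alone.
            with open(path, "w", encoding="latin-1", newline="\r\n") as f:
                f.write(data.decode("latin-1"))
        else:
            # Binary mode writes the bytes untouched.
            with open(path, "wb") as f:
                f.write(data)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

With text_mode=True the file ends up containing b"one\r\r\ntwo\r\r\n"; with text_mode=False it contains exactly CRLF_TEXT.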
Essentially you should consider the archive file to be read in "rb"
mode: z.read() gives you the bytes exactly as they were stored. Writing
them in "w" mode instead of "wb" mode will give you extra carriage
returns.
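A sketch of the extraction loop with that one change, assuming Python 3 and an archive you trust (member names are joined to the destination without any sanitising). extract_members() is a hypothetical helper name:

```python
import os
import zipfile


def extract_members(zip_source, dest_dir):
    """Write every file in the archive under dest_dir, byte for byte."""
    with zipfile.ZipFile(zip_source) as z:
        for zname in z.namelist():
            if zname.endswith("/"):
                continue  # directory entry, nothing to write
            target = os.path.join(dest_dir, zname)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # "wb" means no newline translation: out exactly as it came in.
            with open(target, "wb") as f:
                f.write(z.read(zname))
```

The with statement also closes each file, which the bare f.close (without parentheses) in the quoted code never does.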
If you want to be able to get "universal newline" input from your
zipfile, consider piping input through this generator and using "w" mode:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/286165
Then you should get the correct line ending for a text file without
regard to the current platform or the one where the archive was created.
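I have not reproduced the recipe here. The following is only a rough sketch of the same idea, assuming Python 3 and a single-byte encoding such as latin-1; universal_lines() and write_text() are illustrative names, not the recipe's:

```python
def universal_lines(data):
    """Yield lines from bytes, each ending in b"\n" whatever the source used."""
    # Collapse DOS (\r\n) first, then old Mac (\r), into plain \n.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    for line in normalized.splitlines(keepends=True):
        yield line


def write_text(data, path, encoding="latin-1"):
    """Write data in text mode so the platform supplies its own line ending."""
    # Default newline handling turns each "\n" into the native ending.
    with open(path, "w", encoding=encoding) as f:
        for line in universal_lines(data):
            f.write(line.decode(encoding))
```

Because every line reaches the file with a bare \n, "w" mode has nothing to double up.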
--
Michael Hoffman