Binary vs. Text mode

Dave Brueck dbrueck at edgix.com
Thu Nov 30 09:04:55 EST 2000


Hi Ben,

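On Windows, text mode silently translates line endings on the fly: each
'\r\n' in the file is handed to you as '\n' when you read, and each '\n' you
write goes out as '\r\n'. Since what your program sees no longer matches the
bytes on disk one-for-one, this goofiness invalidates the numbers used by
seek/tell. In binary mode (aka "not broken mode") no translation occurs...
dumb, huh?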
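
To make the effect concrete, here is a minimal sketch, assuming a modern
Python 3 interpreter rather than the Python 2.0 of this thread. The sample
data and the helper names (tag_offsets_binary, reread_binary,
tag_offsets_by_counting) are made up for illustration. It records offsets in
binary mode, rereads at those offsets, and then shows how far a naive count
of translated characters drifts from the real byte offsets. It is not a
claim about what the C runtime does internally in Ben's case.

```python
import os
import tempfile

# Hypothetical sample data with Windows-style "\r\n" line endings.
SAMPLE = (b"<document>\r\n<url>a</url>\r\n</document>\r\n"
          b"<document>\r\n</document>\r\n")


def tag_offsets_binary(path, tag=b"</document>"):
    """Byte offsets of lines consisting solely of `tag`, read in binary mode."""
    offsets = []
    with open(path, "rb") as fp:
        while True:
            just_before = fp.tell()   # a true byte offset in binary mode
            line = fp.readline()
            if not line:
                break
            if line.strip() == tag:
                offsets.append(just_before)
    return offsets


def reread_binary(path, offsets):
    """Seek to each recorded offset and read the line found there."""
    result = []
    with open(path, "rb") as fp:
        for offset in offsets:
            fp.seek(offset)
            result.append(fp.readline().strip())
    return result


def tag_offsets_by_counting(path, tag="</document>"):
    """Offsets obtained by counting translated characters (wrong on purpose)."""
    offsets, position = [], 0
    with open(path, "r", encoding="ascii", newline=None) as fp:
        for line in fp:               # each "\r\n" arrives here as "\n"
            if line.strip() == tag:
                offsets.append(position)
            position += len(line)
    return offsets


# Write the sample to a temporary file in binary mode, so nothing is translated.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(SAMPLE)

byte_offsets = tag_offsets_binary(sample_path)
reread = reread_binary(sample_path, byte_offsets)
counted_offsets = tag_offsets_by_counting(sample_path)
drift = [b - c for b, c in zip(byte_offsets, counted_offsets)]
os.remove(sample_path)

print(byte_offsets, reread, counted_offsets, drift)
```

In this sketch the counted offsets fall short by one byte for every line
that precedes the tag, so a seek to them lands close to the target but not
on it. That is one way translated positions can end up "consistently close"
yet wrong, while the binary-mode offsets always reread the full line.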

-Dave

> -----Original Message-----
> From: python-list-admin at python.org
> [mailto:python-list-admin at python.org]On Behalf Of Ben Mitchell
> Sent: Wednesday, November 29, 2000 5:15 PM
> To: python-list at python.org
> Subject: Binary vs. Text mode
>
>
> Hello,
>
> I'm a little confused as to why, exactly, I'm seeing the following
> behavior and I'm hoping someone can clarify for me.
>
> First, assume I have a large xml-like file ("myfile") which has a tag
> </document> that appears alone on a number of lines throughout the file.
>
> I ran the following against that file:
>
> -----
> import os
> import sys
> import string
>
> fp = open("myfile", "r+")
> while 1:
>     justbefore = fp.tell()
>     line = fp.readline()
>     if not line :
>         break
>     line = string.strip(line)
>     if line == "</document>":
>         fp.seek(justbefore, 0)
>         reread = fp.readline()
>         print string.strip(reread)
> ----
>
> Now it was my expectation that this would print out a whole bunch of
> lines that looked like:
> </document>
> </document>
> </document>
> </document>
> </document>
> </document>
> </document>
> ...
>
> Instead, I got:
> 5D
> 15E
> ocument>
> >
> nt>
> xt>
> ocument>
> cument>
>
> ocument>
> ument>
> ument>
> </url>
> <url>
> l>
> ument>
> ment>
> ocument>
> 16F
> ment>
> ument>
> nt>
>
> document>
> t>
> cument>
> /url>
> nt>
> ment>
> url>
> ument>
> cument>
> l>
> nt>
> cument>
> url>
> url>
> t>
>
> ent>
> cument>
> cument>
> cument>
> cument>
> cument>
> cument>
> ...
>
> When I then changed to opening in "rb+" mode instead of "r+", everything
> worked fine.
>
> This yields two questions.  The first is why the byte offsets returned by
> fp.tell() vary depending on the read mode in which I've opened the file?
> Isn't it just returning a number of bytes from the head of the file?  That
> doesn't vary, regardless of how the system perceives the data.
>
> The second question is the more perplexing one for me.  It looks like, on
> a large enough number of instances that it's not random, the tell
> operation on a text mode opened file returned a location that was very
> close to where I expected it to be. (That's all those "ument>" and similar
> strings.)  If in fact text mode is going to screw up the tell operation,
> shouldn't it *really* screw it up instead of getting it consistently close?
>
> In sum, I'd like to better understand the implications of opening in
> binary versus text mode.
>
> This is on a Windows box, btw.
>
> Thanks in advance for any clarification you can provide!
>
> Best,
>
> -Ben
>
>




