read text file byte by byte

Mon Dec 14 15:06:15 EST 2009

On Dec 14, 1:57 pm, Dennis Lee Bieber <wlfr... at ix.netcom.com> wrote:
> On Sun, 13 Dec 2009 22:56:55 -0800 (PST), "sjdevn... at yahoo.com"
> <sjdevn... at yahoo.com> declaimed the following in
> gmane.comp.python.general:
>
> > The 3.1 documentation specifies that file.read returns bytes:
>
> > file.read([size])
> >     Read at most size bytes from the file (less if the read hits EOF
> > before obtaining size bytes). If the size argument is negative or
> > omitted, read all data until EOF is reached. The bytes are returned as
> > a string object. An empty string is returned when EOF is encountered
> > immediately. (For certain files, like ttys, it makes sense to continue
> > reading after an EOF is hit.) Note that this method may call the
> > underlying C function fread() more than once in an effort to acquire
> > as close to size bytes as possible. Also note that when in non-
> > blocking mode, less data than was requested may be returned, even if
> > no size parameter was given.
>
> > Does it need fixing?
>
>         I'm still running 2.5 (Maybe next spring I'll see if all the third
> party libraries I have exist in 2.6 versions)... BUT...
>
>         "... are returned as a string object..." Aren't "strings" in 3.x now
> unicode? Which would imply, to me, that the interpretation of the
> contents will not be plain bytes.

I'm not even concerned (yet) about how the data is interpreted after
it's read.  First I'm trying to clarify what exactly gets read.

The post I was replying to said "In Python 3.x, f.read(1) will read
one character, which may be more than one byte depending on the
encoding."

That seems at odds with the documentation, which says "Read at most
size bytes from the file".  The fact that it's documented to read
"size" bytes rather than "size" (possibly multibyte) characters is
reinforced by the later language saying that the underlying C fread()
function may be called repeatedly to read as close to size bytes as
possible.
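A quick way to test the claim, sketched under assumptions: this targets
a current Python 3 interpreter (not necessarily 3.1), where open() in
text mode returns an io.TextIOWrapper whose read(n) counts characters,
while "rb" mode returns a binary reader whose read(n) counts bytes.  The
sample text and the helper first_read() are hypothetical, chosen only
because "é" takes two bytes in UTF-8.

```python
import os
import tempfile

# Hypothetical sample: "é" (U+00E9) is encoded in UTF-8 as two bytes, 0xC3 0xA9.
SAMPLE = "\u00e9a"


def first_read(path, mode, **kwargs):
    """Open `path` in the given mode and return the result of one read(1) call."""
    with open(path, mode, **kwargs) as f:
        return f.read(1)


# Write the sample to a temporary file as raw UTF-8 bytes.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(SAMPLE.encode("utf-8"))
    path = tmp.name

try:
    # Text mode: read(1) returns one decoded character (a str).
    text_chunk = first_read(path, "r", encoding="utf-8")
    # Binary mode: read(1) returns at most one byte (a bytes object).
    binary_chunk = first_read(path, "rb")
    print(repr(text_chunk), "occupies", len(text_chunk.encode("utf-8")), "bytes on disk")
    print(repr(binary_chunk), "is", len(binary_chunk), "byte")
finally:
    os.remove(path)
```

If text-mode read(1) returns a single character that occupies two bytes
on disk while binary-mode read(1) returns a single byte, then the quoted
"at most size bytes" wording would only describe the binary case.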

If the poster I was replying to is correct, a documentation update
seems to be in order.  As a long-time programmer, I would have been
very surprised, had I not read this here, to call f.read(X) and get
back more than X bytes.
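If the goal really is to read a text file byte by byte, one option is to
open it in binary mode, where read(1) should return at most one byte,
and decode incrementally.  This is a sketch assuming UTF-8 input;
iter_chars_bytewise() is a hypothetical helper name, while
codecs.getincrementaldecoder() is a standard-library function.  An
io.BytesIO stands in for a file opened with "rb".

```python
import codecs
import io


def iter_chars_bytewise(raw, encoding="utf-8"):
    """Read a binary stream one byte at a time and yield (byte, chars) pairs.

    The incremental decoder buffers partial multi-byte sequences, so `chars`
    is empty until every byte of a character has been read.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        b = raw.read(1)  # binary stream: at most 1 byte per call
        if not b:  # b"" means EOF
            break
        yield b, decoder.decode(b)
    # Flush the decoder; raises UnicodeDecodeError if input ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield b"", tail


# Hypothetical input: "é" (two UTF-8 bytes) followed by "a" (one byte).
pairs = list(iter_chars_bytewise(io.BytesIO("\u00e9a".encode("utf-8"))))
for byte, chars in pairs:
    print(byte, repr(chars))
```

Here the first byte of "é" yields an empty string, because the decoder
holds it until the second byte completes the character.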