Checking for EOF in stream

Nathan nejucomo at gmail.com
Tue Feb 20 03:33:55 EST 2007


On 2/20/07, Nathan <nejucomo at gmail.com> wrote:
> On 2/19/07, Gabriel Genellina <gagsl-py at yahoo.com.ar> wrote:
> > En Mon, 19 Feb 2007 21:50:11 -0300, GiBo <gibo at gentlemail.com> escribió:
> >
> > > Grant Edwards wrote:
> > >> On 2007-02-19, GiBo <gibo at gentlemail.com> wrote:
> > >>>
> > >>> Classic situation - I have to process an input stream of unknown length
> > >>> until a I reach its end (EOF, End Of File). How do I check for EOF? The
> > >>> input stream can be anything from opened file through sys.stdin to a
> > >>> network socket. And it's binary and potentially huge (gigabytes), thus
> > >>> "for line in stream.readlines()" isn't really a way to go.
> > >>>
> > >>> For now I have roughly:
> > >>>
> > >>> stream = sys.stdin
> > >>> while True:
> > >>>     data = stream.read(1024)
> > >>>     if len(data) == 0:
> > >>>         break  # EOF
> > >>>     process_data(data)
> > >
> > > Right, not a big difference though. Isn't there a cleaner / more
> > > intuitive way? Like using some wrapper objects around the streams or
> > > something?
> >
> > Read the documentation... For a true file object:
> > read([size]) ... An empty string is returned when EOF is encountered
> > immediately.
> > All the other "file-like" objects (like StringIO, socket.makefile, etc)
> > maintain this behavior.
> > So this is the way to check for EOF. If you don't like how it was spelled,
> > try this:
> >
> >    if data == "": break
> >
> > If your data is made of lines of text, you can use the file as its own
> > iterator, yielding lines:
> >
> > for line in stream:
> >      process_line(line)
> >
> > --
> > Gabriel Genellina
> >
> > --
> > http://mail.python.org/mailman/listinfo/python-list
> >
>
> Not to beat a dead horse, but I often do this:
>
> data = f.read(bufsize)
> while data:
>     # ... process data.
>     data = f.read(bufsize)
>
>
> -The only annoying bit is the duplicated line.  I find I often follow
> this pattern, and I realize Python doesn't plan to have any sort of
> do-while construct, but even so I prefer this idiom.  What's the
> consensus here?
>
> What about creating a standard binary-file iterator:
>
> def blocks_of(infile, bufsize = 1024):
>     data = infile.read(bufsize)
>     if data:
>         yield data
>
>
> -the use would look like this:
>
> for block in blocks_of(myfile, bufsize = 2**16):
>     process_data(block) # len(block) <= bufsize...
>


(ahem), make that iterator something that works, like:

def blocks_of(infile, bufsize=1024):
    data = infile.read(bufsize)
    while data:
        yield data
        data = infile.read(bufsize)

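For completeness, here is that iterator exercised end to end, along with the two-argument form of the built-in iter(), which covers the same read-until-sentinel pattern without a hand-written generator. This is only a sketch in modern Python: io.BytesIO stands in for any binary stream (sys.stdin.buffer, a socket's makefile(), an open file), and on a binary stream read() returns bytes, so the EOF sentinel is b"" rather than "".

```python
import io


def blocks_of(infile, bufsize=1024):
    """Yield successive blocks from infile until read() returns empty at EOF."""
    data = infile.read(bufsize)
    while data:
        yield data
        data = infile.read(bufsize)


# io.BytesIO is just a stand-in here for any binary file-like object.
stream = io.BytesIO(b"x" * 2500)
blocks = list(blocks_of(stream, bufsize=1024))
print([len(b) for b in blocks])  # -> [1024, 1024, 452]

# The same loop spelled with the two-argument form of the built-in
# iter(): it calls the function repeatedly until the sentinel (the
# empty bytes that read() returns at EOF) comes back.
stream.seek(0)
total = sum(len(b) for b in iter(lambda: stream.read(1024), b""))
print(total)  # -> 2500
```

The iter(callable, sentinel) form is handy precisely because it removes the duplicated read() line that started this subthread.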


More information about the Python-list mailing list