Checking for EOF in stream

Nathan nejucomo at gmail.com
Tue Feb 20 03:27:38 EST 2007


On 2/19/07, Gabriel Genellina <gagsl-py at yahoo.com.ar> wrote:
> En Mon, 19 Feb 2007 21:50:11 -0300, GiBo <gibo at gentlemail.com> escribió:
>
> > Grant Edwards wrote:
> >> On 2007-02-19, GiBo <gibo at gentlemail.com> wrote:
> >>>
> >>> Classic situation - I have to process an input stream of unknown length
> >>> until a I reach its end (EOF, End Of File). How do I check for EOF? The
> >>> input stream can be anything from opened file through sys.stdin to a
> >>> network socket. And it's binary and potentially huge (gigabytes), thus
> >>> "for line in stream.readlines()" isn't really a way to go.
> >>>
> >>> For now I have roughly:
> >>>
> >>> stream = sys.stdin
> >>> while True:
> >>>     data = stream.read(1024)
> >>>     if len(data) == 0:
> >>>         break  # EOF
> >>>     process_data(data)
> >
> > Right, not a big difference though. Isn't there a cleaner / more
> > intuitive way? Like using some wrapper objects around the streams or
> > something?
>
> Read the documentation... For a true file object:
> read([size]) ... An empty string is returned when EOF is encountered
> immediately.
> All the other "file-like" objects (like StringIO, socket.makefile, etc)
> maintain this behavior.
> So this is the way to check for EOF. If you don't like how it was spelled,
> try this:
>
>    if data == "": break
>
> If your data is made of lines of text, you can use the file as its own
> iterator, yielding lines:
>
> for line in stream:
>      process_line(line)
>
> --
> Gabriel Genellina
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>

Not to beat a dead horse, but I often do this:

data = f.read(bufsize)
while data:
    # ... process data.
    data = f.read(bufsize)


The only annoying bit is the duplicated read() line.  I find I often follow
this pattern, and I realize Python doesn't plan to have any sort of
do-while construct, but even so I prefer this idiom.  What's the
consensus here?
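For what it's worth, the two-argument form of the builtin iter() avoids the
duplicated read entirely: iter(callable, sentinel) keeps calling the callable
until it returns the sentinel, which for read() at EOF is the empty value.  A
minimal sketch (io.BytesIO and functools.partial are used here only to make
the demo self-contained; with a text-mode file the sentinel would be '' rather
than b''):

```python
import io
from functools import partial

stream = io.BytesIO(b"x" * 2500)  # stand-in for any binary stream
bufsize = 1024

# iter(callable, sentinel): call stream.read(bufsize) repeatedly,
# stopping when it returns the EOF sentinel b"".
blocks = list(iter(partial(stream.read, bufsize), b""))
print([len(b) for b in blocks])  # -> [1024, 1024, 452]
```

In a real loop you would just write
for data in iter(partial(stream.read, bufsize), b""): process_data(data),
with no read() call repeated before the loop.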

What about creating a standard binary-file iterator:

def blocks_of(infile, bufsize=1024):
    # Keep reading until read() returns an empty value at EOF;
    # a bare "if" here would yield at most one block.
    while True:
        data = infile.read(bufsize)
        if not data:
            break
        yield data


The use would look like this:

for block in blocks_of(myfile, bufsize = 2**16):
    process_data(block) # len(block) <= bufsize...


