proposal: another file iterator

Jean-Paul Calderone exarkun at divmod.com
Sun Jan 15 21:20:59 EST 2006


On 15 Jan 2006 16:44:24 -0800, Paul Rubin <"http://phr.cx"@nospam.invalid> wrote:
>I find pretty often that I want to loop through characters in a file:
>
>  while True:
>     c = f.read(1)
>     if not c: break
>     ...
>
>or sometimes of some other blocksize instead of 1.  It would sure
>be easier to say something like:
>
>   for c in f.iterbytes(): ...
>
>or
>
>   for c in f.iterbytes(blocksize): ...
>
>this isn't anything terribly advanced but just seems like a matter of
>having the built-in types keep up with language features.  The current
>built-in iterator (for line in file: ...) is useful for text files but
>can potentially read strings of unbounded size, so it's inadvisable for
>arbitrary files.
>
>Does anyone else like this idea?

It's a pretty useful thing to do, but the edge cases are somewhat complex.  When I just want the dumb version, I tend to write this:

    for chunk in iter(lambda: f.read(blocksize), ''):
        ...

Which is only very slightly longer than your version.  I would like it even more if iter() had been written with the impending doom of lambda in mind, so that this would work:

    for chunk in iter('', f.read, blocksize):
        ...
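For what it's worth, functools.partial (added in Python 2.5) gives a lambda-free spelling of the same two-argument iter() loop.  A minimal sketch, using io.BytesIO as a stand-in file and Python 3's b'' sentinel for binary reads:

```python
import functools
import io

# Same chunked-read loop, with functools.partial in place of lambda.
# (For a binary file in Python 3 the sentinel is b''; for a text file,
# or any file in Python 2, it would be ''.)
f = io.BytesIO(b"abcdefghij")
blocksize = 4
chunks = []
for chunk in iter(functools.partial(f.read, blocksize), b''):
    chunks.append(chunk)
# chunks is now [b'abcd', b'efgh', b'ij']
```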

But it's a bit late now.  Anyhow, here are some questions about your iterbytes():

  * Would it guarantee the chunks returned were read using a single read?  If blocksize were a multiple of the filesystem block size, would it guarantee reads on block boundaries (where possible)?

  * How would it handle EOF?  Would it stop iterating immediately after the first short read or would it wait for an empty return?

  * What would the buffering behavior be?  Could one interleave calls to .next() on whatever iterbytes() returns with calls to .read() on the file?
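To make the EOF question concrete, here is one possible sketch of the proposed iterbytes() as a plain generator (the name and signature are the proposal's, not anything that exists): this version keeps iterating past a short read and stops only once read() returns an empty string.

```python
import io

def iterbytes(f, blocksize=1):
    """Hypothetical iterbytes(): yield chunks of up to blocksize bytes.

    This sketch answers the EOF question one way: a short read does
    not end iteration; only an empty read does.
    """
    while True:
        chunk = f.read(blocksize)
        if not chunk:
            return
        yield chunk

# 11 bytes read in chunks of 4: the final chunk is short.
f = io.BytesIO(b"hello world")
assert list(iterbytes(f, 4)) == [b'hell', b'o wo', b'rld']
```

Since this is just a wrapper around read(), interleaving .next() with direct .read() calls would work here, though a version that buffered internally would not share that property.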

Jean-Paul
