[Python-Dev] Single- vs. Multi-pass iterability

Alex Martelli aleax@aleax.it
Thu, 18 Jul 2002 23:23:50 +0200


On Thursday 18 July 2002 09:18 pm, Guido van Rossum wrote:
> > > I've just had a thought. Maybe it would be less of a mess
> > > if what we are calling "iterators" had been called "streams"
> >
> > Possibly -- I did use the "streams" name often in the tutorial
> > on iterators and generators, it's a very natural term.
>
> OTOH in C++ and Java, "stream" refers to an open file object (to
> emphasize the iteratorish feeling of a file opened for sequential
> reading or writing, as opposed to the concept of a file as a
> random-access array of bytes on disk).

...and in Unix System V, if I recall correctly, it referred to an allegedly
superior way to do things BSD did with sockets (and more).  Any
nice-looking term will be complicatedly overloaded by now.  I
think "seborrea" is still free, though (according to some old Dilbert
strips, at least).


> > Seekable files can be multi-pass, but in the strict sense
> > that you can rewind them -- it's still impractical to have
> > them produce multiple *independent* iterators (needing
> > some sort of in-memory caching).
>
> It would be trivial if you had an object representing the notion of a
> file on disk rather than an open file.  Each iterator would be
> implemented as a separate open file referring to the same filename.

For a *read-only* disk file, yes -- at least on Unix-ish systems, you
could also get the same effect with dup2 without even needing any
filename around (e.g. on an already-unlinked file).   Hmmm, I do
think win32 has something like dup2 -- my copy of Richter remained
with think3 (it was actually theirs:-), and I do little Windows these days
so I haven't bought another, but I'm pretty sure half an hour on
MSDN would let me find it.
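
One caveat on the dup2 idea is easy to check. On POSIX systems, a descriptor
made by dup or dup2 refers to the same open file description as the original,
so the two share a single file offset. The sketch below, in Python, assumes a
POSIX-like system; the scratch file and variable names are illustrative only.

```python
import os
import tempfile

# Create a small scratch file (illustrative data only).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"line one\nline two\n")
    path = tmp.name

fd = os.open(path, os.O_RDONLY)
dup_fd = os.dup(fd)            # duplicate descriptor, same open file description

first = os.read(fd, 9)         # advance the offset through the original fd
dup_offset = os.lseek(dup_fd, 0, os.SEEK_CUR)  # ask the duplicate where it is

os.close(fd)
os.close(dup_fd)
os.unlink(path)
```

If the duplicate's offset moves along with the original's, iterators built on
duplicated descriptors would interfere with each other, and a separate open()
per iterator looks like the safer route to independent positions.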

Maybe something can be built around this -- the underlying disk file
as the container, dup2 or equivalent to make independent iterators/
streams (as long as nobody's writing the file... but that's not too
different from iterating on e.g. a list, where an insert or del would
mess things up...).  But surely not by sticking with stdio.
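
As a rough sketch of that container/iterator split in Python: DiskFile is a
hypothetical name, not an existing class. It reopens the file by name for each
iterator (rather than using dup2) and assumes nobody writes to the file while
it is being iterated.

```python
import os
import tempfile


class DiskFile:
    """Hypothetical container for a read-only file on disk (not an open file)."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Each call opens the file afresh, so every iterator has its own position.
        with open(self.path) as handle:
            for line in handle:
                yield line


# Usage: two iterators over the same file advance independently.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("a\nb\nc\n")

container = DiskFile(tmp.name)
it1, it2 = iter(container), iter(container)
seen = [next(it1), next(it1), next(it2)]   # it2 starts from the top
rest = list(it1) + list(it2)               # drain both, closing the handles
os.unlink(tmp.name)
```

The container holds only the filename, so it plays the role a list plays for
list iterators; all per-pass state lives in the handle each iterator opens.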

Which leads us back to my "this is rather academic" statement:
don't we need to stick with stdio to support existing extensions
which use FILE*'s, anyway?


Alex