[Tutor] iterating over a urllib.urlopen().read() object

Mon Dec 8 13:36:39 EST 2003

On Mon, 8 Dec 2003, Alan Gauld wrote:

> >  f = urllib.urlopen(source_url)
> >  BUFSIZE = 8192
> >
> >  while True:
> >      data = f.read(BUFSIZE)
> >      if not data: break
> >      p.feed(data)
> >

> > I didn't like the "while True:" construct, and too smart for my own
> > good, tried this instead:
> >
> >  for data in f.read(BUFSIZE):
> >       p.feed(data)

Hi Terry,

The difference between:

    for line in file: ...

and

    for data in f.read(BUFSIZE): ...

is in the value on the right-hand side of the 'for ... in'.

In the first case, we take 'file', and ask our loop to iterate across
it.  But in the second case, we first take the value of:

    f.read(BUFSIZE)

And we know this is a string of maximum length BUFSIZE.  Once we have that
value, we ask our loop to go across it.  Strings do support iteration:

###
>>> for character in "hello world":
...     print character,
...
h e l l o   w o r l d
###

So the second case does go across some of the data in the file, but only a
character at a time, and only at most BUFSIZE characters of it!
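
To see this concretely, here is a small sketch.  It uses io.StringIO as a
stand-in for the urlopen() object so that it runs without a network; that
substitution, the tiny BUFSIZE, and the names 'fake_file' and 'pieces' are
just assumptions for illustration.  It is written for a current Python:

```python
import io

BUFSIZE = 8

# io.StringIO stands in for the file-like object returned by urlopen().
fake_file = io.StringIO("hello world, this is more than eight characters")

# read() is called exactly once; the loop then walks the string it returned.
pieces = [data for data in fake_file.read(BUFSIZE)]

print(pieces)
```

Every element of 'pieces' is a single character, and there are only BUFSIZE
of them; the rest of the "file" is never read.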

So the problem is one of getting the right iterator to go across the whole
file.  It is possible to make this work with an auxiliary tool.  Here's a
small function that may help:

###
def readIter(f, blocksize=8192):
    """Given a file 'f', returns an iterator that yields blocks of at
    most 'blocksize' bytes from the file, using read()."""
    while True:
        data = f.read(blocksize)
        if not data: break
        yield data
###

The above code defines an iterator for files, using Python's "generator"
support.  Parts of it should look vaguely familiar.  *grin*

Ok, so this just shifts the 'while True' stuff around into readIter().
But this relocation does help, because it lets us say this now:

###
for block in readIter(f, BUFSIZE):
    p.feed(block)
###
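
As a side note, the built-in iter() also has a two-argument form,
iter(callable, sentinel), that can express the same loop without a helper
function.  This is a different technique from readIter(), shown here only as
a rough sketch: io.StringIO again stands in for the urlopen() object, and the
names 'text', 'fake_file' and 'blocks' are made up for the example.

```python
import io

BUFSIZE = 8
text = "hello world, this is a test"

# io.StringIO stands in for the file-like object returned by urlopen().
fake_file = io.StringIO(text)

# iter(callable, sentinel) keeps calling the function and stops as soon
# as it returns the sentinel; read() returns "" at end of file.
blocks = list(iter(lambda: fake_file.read(BUFSIZE), ""))

print(blocks)
```

The sentinel has to match what read() returns at end of file.  It is an empty
string here, but it would need to be b"" for an object whose read() returns
bytes.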

Hope this helps!