[Tutor] setting EOF symbol

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Sat Mar 15 12:27:01 2003


> import urllib
> lr = urllib.urlopen("http://www.lrytas.lt/20030314")
>
> But when it came time to read the html in, there was a problem:
>
> while lr:
> 	print(lr.readline())

Hi Pijus,

The bug is that you're assuming that 'lr' will turn false (or None) when we
run out of bytes to suck in.  However, our 'lr' file object never changes,
so 'while lr' stays true forever.  What does change is the return value of
lr.readline(): it gives back the empty string '' once we hit the end of the
file.  The loop can be fixed by checking the return value of lr.readline():

###
while 1:
    line = lr.readline()
    if not line: break
    # rest of block
###


Setting up what looks like an infinite loop might seem a little weird, but
it's a fairly common way to process a file line by line.
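
Here's a small sketch of that idiom that runs without a network connection.
It assumes a current Python 3 interpreter, and it uses io.StringIO as a
stand-in for the object that urlopen() hands back; the name read_lines() is
just something made up for this example.  Note that a blank line in the
input comes back as '\n', which is still true, so only the real end of file
stops the loop.

```python
import io

def read_lines(f):
    """Collect the lines of a file-like object with the while/break idiom."""
    lines = []
    while True:
        line = f.readline()
        if not line:          # '' (the empty string) means end of file
            break
        lines.append(line)
    return lines

# io.StringIO is only a stand-in for the urlopen() result (an assumption).
fake = io.StringIO("first\n\nthird\n")
lines = read_lines(fake)
print(lines)

# Once the data is used up, readline() keeps returning '' rather than None.
print(repr(fake.readline()))
```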


Even so, can we make this look nicer?  We can, if we use Python 2.2's
iterators.  As of Python 2.2, regular file objects can be walked across ---
iterated --- directly with a 'for' loop.  Assuming 'f' is an ordinary file
that we've already opened:

###
>>> for line in f:
...     print len(line)
...
10
18
15
1
37
21
72
45
19
1
1
35
19
###

And this looks nicer than that 'while 1/break' sort of thing that we did
above.  It would be nice to do the same with the file-like object from
urllib.urlopen()!  But does it work?

###
>>> f = urllib.urlopen('http://python.org')
>>> i = iter(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: iteration over non-sequence
###


Doh.  Unfortunately, not yet.  However, we can make it work with a little
bit of code:

###
>>> from __future__ import generators    # only needed on Python 2.2
>>> def fileLikeIter(f):
...     """A generator function that iterates over the lines of a
...        file-like object.  We expect to use this on file-like objects,
...        like the one urllib.urlopen() returns, when we want to iterate
...        line by line."""
...     while 1:
...         line = f.readline()
...         if not line: return    # '' means end of file: finish the generator
...         yield line
###


The function above is a 'generator': calling it wraps an existing file-like
object and hands back an iterator over its lines.  Let's try it out!


###
>>> for line in fileLikeIter(urllib.urlopen('http://python.org')):
...     print len(line),
...
64 60 7 61 34 27 35 29 31 1 4 1 7 39 73 46 59 35 60 55 65 61 11 50 34 50
65 8 39 39 38 23 29 64 29 5 31 66 1 9 18 27 58 64 29 25 50 35 31 32 26 5
27 12 10 27 31 10 27 35 10 27 35 10 10 27 31 10 27 32 10 27 31 10 27 27 10
6 41 1 28 32 39 32 64 32 63 26 52 15 22 27 47 11 27 35 11 27 225 11 27 38
11 27 32 11 27 32 11 27 51 11 27 61 11 27 87 11 27 25 11 27 25 11 27 62 11
33 52 14 22 27 35 11 27 63 11 27 37 11 27 41 11 27 40 11 27 49 11 27 46 11
27 71 11 27 66 11 33 52 9 22 27 47 11 27 44 11 27 40 11 27 50 11 27 83 18
11 27 36 11 33 52 6 22 27 82 11 27 57 11 27 49 11 27 56 11 27 66 11 27 78
11 33 52 17 22 27 53 11 27 56 11 27 51 11 27 51 11 27 69 11 27 50 11 27 57
11 27 72 11 27 50 11 27 46 11 27 45 11 27 58 11 27 61 11 27 46 11 27 61 11
33 52 12 22 27 53 11 27 58 11 33 52 9 22 27 63 11 27 7 11 27 13 9 27 55 5
11 27 7 11 27 12 11 27 68 11 38 1 6 46 29 28 47 83 1 4 33 51 1 5 1 8 1 5
19 1 65 65 66 69 54 13 52 1 6 6 1 9 1 39 1 14 1 1 77 77 77 1 38 5 1 83 42
61 12 15 19 1 5 1 70 46 21 1 71 10 1 71 21 1 78 80 66 1 68 65 1 76 46 80
47 50 1 6 5 1 11 1 9 1 1 65 71 1 5 68 74 61 70 49 68 26 70 63 21 66 27 6 1
33 1 5 64 34 71 46 74 16 64 46 66 42 66 66 36 68 51 6 1 33 1 5 1 92 1 7 65
56 1 64 46 1 58 14 1 68 1 57 69 6 1 63 70 46 1 93 54 1 6 1 39 70 1 46 60
18 1 10 1 31 38 35 15
###

That's better.  *grin*
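
By the way, there's another route that may be worth knowing about: the
built-in iter() also accepts a callable plus a 'sentinel' value, and it keeps
calling the callable until the sentinel comes back.  The sketch below assumes
a current Python 3 interpreter and again uses io.StringIO as a stand-in for
the urlopen() result.  If the object you're reading from returns bytes rather
than text, the sentinel would need to be b'' instead of ''.

```python
import io

# io.StringIO is only a stand-in for a file-like object (an assumption).
fake = io.StringIO("alpha\nbeta\n\ngamma\n")

# iter(callable, sentinel) calls fake.readline() over and over and stops
# as soon as it returns the sentinel '', which is what readline() gives at EOF.
lengths = [len(line) for line in iter(fake.readline, '')]
print(lengths)
```

This does the same job as the generator above; which one reads better is
mostly a matter of taste.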

Please feel free to ask more questions about this.  I hope this helps!