[Tutor] setting EOF symbol

Bob Gailer ramrom@earthling.net
Sat Mar 15 12:31:02 2003



At 10:42 PM 3/14/2003 -0800, Pijus Virketis wrote:
>  I decided to write a little weekend project: a spider to download all 
> the articles and comments from one newspaper website (www.lrytas.lt). I 
> successfully opened the url:
>
>import urllib
>lr = urllib.urlopen("http://www.lrytas.lt/20030314")
>
>But when it came time to read the html in, there was a problem:
>
>while lr:

lr is an instance (in my case, <addinfourl at 18152752 whose fp = 
<socket._fileobject instance at 0x0114E5E8>>) and therefore will always 
test True.
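
To see why, here is a minimal sketch of how truth testing works for 
instances. It assumes Python 3, where the hook is named __bool__; the 
class names Plain and Sized are made up for illustration and have nothing 
to do with urllib.

```python
class Plain:
    """Hypothetical class that defines neither __bool__ nor __len__."""


class Sized:
    """Hypothetical class whose truth value follows its length."""

    def __init__(self, items):
        self.items = items

    def __len__(self):
        # With no __bool__ defined, Python falls back to __len__ for truth tests.
        return len(self.items)


print(bool(Plain()))     # no hooks defined, so the instance counts as true
print(bool(Sized([])))   # length 0, so the instance counts as false
print(bool(Sized([1])))  # non-zero length, so the instance counts as true
```

An object like lr behaves like Plain here: nothing about having reached 
the end of the data makes it test false.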

The same is true of any file-like object. The way to loop until there are 
no more lines is to call readline() until it returns an empty string:

while 1:
    line = lr.readline()
    if line:
        print(line)
    else:
        break

Since Python does not support assignment inside an expression, there's no 
way to read and test in the while condition itself. SIGH.
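
For what it's worth, here is a self-contained sketch of the read-until-EOF 
pattern that runs without a network connection. It assumes Python 3 and 
uses io.BytesIO as a stand-in for the object urlopen returns; the helper 
names read_lines and read_lines_iter and the sample bytes are invented 
for the example.

```python
import io


def read_lines(stream):
    """Collect lines from a file-like object until readline() signals EOF."""
    lines = []
    while True:
        line = stream.readline()
        if not line:
            # readline() returns an empty value only at EOF; a blank line
            # in the data still comes back as b"\n", which is true.
            break
        lines.append(line)
    return lines


def read_lines_iter(stream):
    """Same result using iter(callable, sentinel) on a binary stream."""
    # iter() keeps calling stream.readline() until it returns b"".
    return list(iter(stream.readline, b""))


# Stand-in for a downloaded page (assumed sample data, not the real site).
fake_page = io.BytesIO(b"<html>\n\n<body>hi</body>\n</html>\n")
for line in read_lines(fake_page):
    print(line)
```

The two-argument form of iter() is one way to fold the read and the test 
into a single loop header, e.g. "for line in iter(lr.readline, b''):", if 
the explicit break looks clumsy.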

>Obviously, I will be doing more than just echoing the source, but this is 
>sufficient to show the issue. The EOF does not seem to be hit, and I have 
>an infinite loop. The </html> tag just rolls by, and Python eventually hangs.


Bob Gailer
mailto:ramrom@earthling.net
303 442 2625
