[Tutor] setting EOF symbol
Bob Gailer
ramrom@earthling.net
Sat Mar 15 12:31:02 2003
At 10:42 PM 3/14/2003 -0800, Pijus Virketis wrote:
> I decided to write a little weekend project: a spider to download all
> the articles and comments from one newspaper website (www.lrytas.lt). I
> successfully opened the url:
>
>import urllib
>lr = urllib.urlopen("http://www.lrytas.lt/20030314")
>
>But when it came time to read the html in, there was a problem:
>
>while lr:
lr is an instance (in my case, <addinfourl at 18152752 whose fp =
<socket._fileobject instance at 0x0114E5E8>>), not the data it reads, and
therefore it will always test True, even at end of file.
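To see this concretely without touching the network, here is a small sketch. It assumes Python 3 and uses io.StringIO as a stand-in for the urlopen() result; neither is part of the original code.

```python
import io

# io.StringIO is a stand-in for the urlopen() result (an assumption, so this runs offline).
lr = io.StringIO("only line\n")

before = bool(lr)        # truth value of the object before reading
lr.read()                # consume everything, so we are now at end of file
after = bool(lr)         # truth value of the object at end of file
at_eof = lr.readline()   # readline() returns an empty string once the data is exhausted

print(before, after, repr(at_eof))
```

The object stays true throughout; only the value returned by readline() tells you that the data has run out.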
The same is true of any file-like object. The way to loop until there are
no more lines is:
while 1:
    line = lr.readline()
    if line:
        print(line)
    else:
        break
Since Python does not (as of this writing) support assignment inside an
expression, you can't read and test in the while condition itself. SIGH.
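For what it's worth, there are a few ways to fold the read and the test together. The sketch below assumes Python 3 (the assignment-expression form needs 3.8 or later), and fake_page() is a hypothetical helper standing in for the urlopen() call so the example runs offline.

```python
import io

def fake_page():
    # Hypothetical stand-in for urlopen(); returns a fresh file-like object.
    return io.StringIO("<html>\n<body>hello</body>\n</html>\n")

# 1. Iterate over the file-like object directly; the loop ends at end of file.
lines_for = [line for line in fake_page()]

# 2. Two-argument iter(): keep calling readline() until it returns the sentinel "".
lr = fake_page()
lines_iter = list(iter(lr.readline, ""))

# 3. Assignment expression (Python 3.8+): read and test in the while condition.
lr = fake_page()
lines_walrus = []
while (line := lr.readline()):
    lines_walrus.append(line)

print(lines_for == lines_iter == lines_walrus)
```

All three collect the same lines. Note that a real Python 3 urllib.request.urlopen() response yields bytes, so the sentinel in the second form would be b"" rather than "".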
>Obviously, I will be doing more than just echoing the source, but this is
>sufficient to show the issue. The EOF does not seem to be hit, and I have
>an infinite loop. The </html> tag just rolls by, and Python eventually hangs.
Bob Gailer
mailto:ramrom@earthling.net
303 442 2625