urllib2 - iteration over non-sequence

Sun Jun 10 00:06:43 EDT 2007

Paul Rubin wrote:
> Erik Max Francis <max at alcyone.com> writes:
>   
>> This is really wasteful, as there's no point in reading in the whole
>> file before iterating over it.  To get the same effect as file
>> iteration in later versions, use the .xreadlines method::
>>
>> 	for line in aFile.xreadlines():
>> 	    ...
>>     
>
> Ehhh, a heck of a lot of web pages don't have any newlines, so you end
> up getting the whole file anyway, with that method.  Something like
>
>    for line in iter(lambda: aFile.read(4096), ''): ...
>
> may be best.
>   
Certainly there are cases where xreadlines or read(bytecount) are
reasonable, but only if the total page size is *very* large.  For
most web pages, you guys are just nit-picking (or showing off) to
suggest that the full read implemented by readlines is wasteful.
Moreover, the original problem was with sockets -- which don't have
xreadlines.  That seems to be a method on regular file objects.

For simplicity, I'd still suggest my original use of readlines.  If
and when you find you are downloading web pages with sizes that are
putting a serious strain on your memory footprint, then one of the other
suggestions might be indicated.
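
For comparison, here is a rough sketch of the two approaches side by
side.  It assumes Python 3 and uses io.BytesIO as a stand-in for the
object returned by urlopen; the helper names read_all_lines and
read_in_chunks are made up for illustration.  In the Python 2 urllib2
setting of this thread, the sentinel would be '' rather than b''.

```python
import io


def read_all_lines(response):
    """Simple approach: read the whole body, then return its lines."""
    return response.readlines()


def read_in_chunks(response, chunk_size=4096):
    """Chunked approach: call read(chunk_size) until it returns b''."""
    # iter(callable, sentinel) stops when the callable returns the sentinel.
    return iter(lambda: response.read(chunk_size), b"")


# io.BytesIO stands in for a real response object (an assumption).
page = b"<html>\n<body>hello</body>\n</html>\n"

lines = read_all_lines(io.BytesIO(page))
chunks = list(read_in_chunks(io.BytesIO(page), chunk_size=8))
```

Either way you end up with the same bytes; the chunked version just
bounds how much is held in memory at once, which only matters for
very large pages.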

Gary Herron