urllib2 - iteration over non-sequence
Gary Herron
gherron at islandtraining.com
Sun Jun 10 00:06:43 EDT 2007
Paul Rubin wrote:
> Erik Max Francis <max at alcyone.com> writes:
>
>> This is really wasteful, as there's no point in reading in the whole
>> file before iterating over it. To get the same effect as file
>> iteration in later versions, use the .xreadlines method::
>>
>>     for line in aFile.xreadlines():
>>         ...
>>
>
> Ehhh, a heck of a lot of web pages don't have any newlines, so you end
> up getting the whole file anyway, with that method. Something like
>
> for line in iter(lambda: aFile.read(4096), ''): ...
>
> may be best.
>
Certainly there are cases where xreadlines or read(bytecount) is
reasonable, but only if the total page size is *very* large. But for
most web pages, you guys are just nit-picking (or showing off) to
suggest that the full read implemented by readlines is wasteful.
Moreover, the original problem was with sockets -- which don't have
xreadlines. That seems to be a method on regular file objects.
For simplicity, I'd still suggest my original use of readlines. If
and when you find you are downloading web pages with sizes that are
putting a serious strain on your memory footprint, then one of the other
suggestions might be indicated.
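
For comparison, here is a rough sketch of the two approaches discussed in this thread: the full read via readlines, and the fixed-size read(bytecount) loop. It is only an illustration under some assumptions. The helper names read_lines and read_chunks are made up for this example, and an in-memory io.BytesIO object stands in for whatever file-like object your URL opener returns, so the snippet runs without a network connection.

```python
import io


def read_lines(response):
    """Read the whole body at once and return it as a list of lines."""
    return response.readlines()


def read_chunks(response, size=4096):
    """Return an iterator over the body in blocks of at most `size` bytes."""
    # iter(callable, sentinel) keeps calling response.read(size) until it
    # returns the sentinel. The sentinel must match what read() returns at
    # end of file: an empty byte string here, since the stand-in yields bytes.
    return iter(lambda: response.read(size), b"")


# Hypothetical page content; io.BytesIO is a stand-in for the real response.
page = b"<html>\n<body>hello</body>\n</html>\n"

lines = read_lines(io.BytesIO(page))
chunks = list(read_chunks(io.BytesIO(page), size=16))

for line in lines:
    print(line)
```

Both versions see the same bytes. The only difference is how much of the page is held in memory at once, which only matters when the page is very large.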
Gary Herron