urllib2 - iteration over non-sequence

Sun Jun 10 01:54:47 EDT 2007

Gary Herron wrote:

> Certainly there are cases where xreadlines or read(bytecount) are
> reasonable, but only if the total page size is *very* large.  But for
> most web pages, you guys are just nit-picking (or showing off) to
> suggest that the full read implemented by readlines is wasteful. 
> Moreover, the original problem was with sockets -- which don't have
> xreadlines.  That seems to be a method on regular file objects.
> 
>  For simplicity, I'd still suggest my original use of readlines.   If
> and when you find you are downloading web pages with sizes that are
> putting a serious strain on your memory footprint, then one of the other
> suggestions might be indicated.

It isn't nitpicking to point out that you're writing something that will 
consume vastly more memory than it could possibly need.  And 
insisting that pages aren't _always_ huge is just a silly cop-out; of 
course pages can get very large.

There is absolutely no reason to read the entire file into memory (which 
is what you're doing) before processing it.  This is a good example of 
the principle that there is one obvious right way to do it -- and it isn't 
to read the whole thing in first for no reason whatsoever other than to 
avoid an `x`.
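
A minimal sketch of the line-at-a-time approach, assuming only that the
response object has a readline() method.  The helper name iter_lines and
the io.BytesIO stand-in for the urlopen result are made up here for
illustration; they are not part of urllib2:

```python
import io

def iter_lines(stream):
    """Yield lines one at a time from any object that has readline()."""
    # iter(callable, sentinel) keeps calling readline() until it
    # returns b"" (end of data), so only one line is held at a time.
    for line in iter(stream.readline, b""):
        yield line

# Hypothetical stand-in for the object returned by urlopen().
fake_response = io.BytesIO(b"first\nsecond\nthird\n")

count = 0
longest = 0
for line in iter_lines(fake_response):
    # Process each line as it arrives instead of building a full list.
    count += 1
    longest = max(longest, len(line))
```

The loop body sees exactly what a loop over readlines() would see, but
memory use is bounded by the longest line rather than by the whole page.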

-- 
Erik Max Francis && max at alcyone.com && http://www.alcyone.com/max/
  San Jose, CA, USA && 37 20 N 121 53 W && AIM, Y!M erikmaxfrancis
   The more violent the love, the more violent the anger.
    -- _Burmese Proverbs_ (tr. Hla Pe)