urllib2 - iteration over non-sequence

Paul Rubin http
Sun Jun 10 01:31:29 EDT 2007


Gary Herron <gherron at islandtraining.com> writes:
>  For simplicity, I'd still suggest my original use of readlines.   If
> and when you find you are downloading web pages with sizes that are
> putting a serious strain on your memory footprint, then one of the other
> suggestions might be indicated.

If you know in advance that the page you're retrieving will be
reasonable in size, then using readlines is fine.  If you don't know
in advance what you're retrieving (e.g. you're working on a crawler),
you have to assume that you'll hit some very large pages with
difficult construction.
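
One way to guard against that is to read the response in fixed-size
chunks and stop at a cap, rather than calling readlines on the whole
thing. What follows is only a rough sketch: the helper name
read_limited and its max_bytes and chunk_size defaults are made up
for illustration, and it assumes nothing more than a file-like object
with a read(n) method, which is what urllib2.urlopen returns
(urllib.request.urlopen in Python 3).

```python
import io


def read_limited(response, max_bytes=1000000, chunk_size=8192):
    """Read at most max_bytes from a file-like object, in chunks.

    Hypothetical helper, not part of urllib2. Returns a tuple
    (data, truncated), where truncated is True if the source still
    had data left after max_bytes had been read.
    """
    chunks = []
    remaining = max_bytes
    while remaining > 0:
        # Never ask for more than the cap still allows.
        chunk = response.read(min(chunk_size, remaining))
        if not chunk:
            # End of data reached before hitting the cap.
            return b"".join(chunks), False
        chunks.append(chunk)
        remaining -= len(chunk)
    # Cap reached: probe one more byte to see whether anything is left.
    truncated = bool(response.read(1))
    return b"".join(chunks), truncated


# Stand-in for a network response, so the sketch runs without a server.
fake_page = io.BytesIO(b"x" * 5000)
data, truncated = read_limited(fake_page, max_bytes=1024, chunk_size=256)
print(len(data), truncated)
```

With a real response you would pass the object returned by urlopen in
place of fake_page. A crawler would probably also want a timeout,
since a size cap does nothing about a server that sends data very
slowly.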
