Improving the web page download code.

Alister alister.ware at ntlworld.com
Wed Aug 28 04:58:21 EDT 2013


On Tue, 27 Aug 2013 12:41:10 -0700, mukesh tiwari wrote:

> Hello All,
> I am doing web stuff for the first time in Python, so I am looking for
> suggestions. I wrote this code to download the titles of web pages using
> as few resources (server time, data downloaded) as possible, and it
> should be fast enough. Initially I used BeautifulSoup for parsing, but
> the person who is going to use this code asked me not to use it and to
> use regular expressions instead (the reason given was that
> BeautifulSoup is not fast enough?).

By the time you have written enough regular expressions to reliably parse
HTML (I am not sure that is even strictly possible), you will have
reinvented BeautifulSoup, badly. Unless you are looking for a very explicit
section of data in the page, this is not a good idea.
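
As a rough illustration of the point, here is a minimal sketch that pulls
the title out of a page in two ways. It is not the code from the original
post: it uses the standard library's html.parser.HTMLParser as a stand-in
for BeautifulSoup, and the names TitleParser, title_with_parser and
title_with_regex are made up for this example. The regular expression is
deliberately naive, which is probably what a first attempt would look like.

```python
import re
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased; attributes are ignored here.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect and join later.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        text = "".join(self._parts).strip()
        return text or None


def title_with_parser(html):
    """Return the page title using a real HTML tokenizer, or None."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


# Naive pattern: no attributes allowed, no awareness of comments.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def title_with_regex(html):
    """Return whatever the naive pattern matches first, or None."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    page = ('<html><head><!-- <title>Old</title> -->'
            '<title lang="en">Real page</title></head></html>')
    print(title_with_parser(page))  # Real page
    print(title_with_regex(page))   # Old
```

On that sample page the regular expression returns the commented-out title
and misses the real one because of its attribute. Each such case can be
patched with a more elaborate pattern, but that is exactly the road to
rewriting a parser.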


