[Tutor] python module to search a website

vineeth vineethrakesh at gmail.com
Sun Feb 27 08:46:32 CET 2011


Hi Bill,

Thanks for the reply. I know how the urllib module works, and I am not 
looking for scraping. I am looking to obtain the HTML page that my query 
returns. Just as typing a query into a site like Amazon gives you 
a list of products, the module has to search the website and 
return the link to the results page. I can of course scrape the 
information from that link.
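
To make that concrete, here is a minimal sketch of the idea, assuming 
Python 3 and a site that takes its search term as a URL query parameter. 
The site address, the "/search" path, the parameter name "q", and both 
function names are hypothetical; a real site may use different names or 
require its own API.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_url(base_url, query, param="q"):
    """Return base_url with the search term appended as a query string."""
    # urlencode escapes spaces and special characters in the term.
    return base_url + "?" + urlencode({param: query})


def fetch_results_page(search_url, timeout=10):
    """Download the results page and return its HTML as text."""
    # Some sites reject requests that carry no User-Agent header.
    request = Request(search_url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical site and parameter name; nothing is fetched here.
url = build_search_url("http://some_website.com/search", "python books")
print(url)
```

Passing the resulting url to fetch_results_page() would return the HTML 
of the results page, which could then be scraped separately.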

Thanks
Vin

On 02/27/2011 12:04 AM, Bill Allen wrote:
> On Sat, Feb 26, 2011 at 21:11, vineeth <vineethrakesh at gmail.com> wrote:
>
>     Hello all,
>
>     I am looking for a Python module to search a website and
>     extract the URL.
>
>     For example, I found a module for Amazon named
>     "amazonproduct". Its API extracts the data based
>     on the query and even parses the URL data. I am looking for
>     similar query-search Python modules for other websites like Amazon.
>
>     Any help is appreciated.
>
>     Thank You
>     Vin
>
> I am not sure what url you are trying to extract, or from where, but I 
> can give you an example of basic web scraping if that is your aim.
>
> The following works for Python 2.x.
>
> # This module provides the methods needed to read the HTML
> # from a web page (Python 2.x).
> import urllib
>
> # Set a variable to the needed website.
> mypath = "http://some_website.com"
>
> # Read all the HTML from the page into a list of lines, then
> # look through it for URLs.
> mylines = urllib.urlopen(mypath).readlines()
> for item in mylines:
>     if "http://" in item:
>         # ...do something with the url that was found in the page html...
>         # ...etc...
>         print item.strip()  # placeholder action: show the matching line
>
>
> --Bill
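
The quoted snippet targets Python 2.x and only tests whether a line 
contains "http://". As a rough Python 3 sketch of the same step, the 
standard html.parser module can pull the href values out of anchor 
tags instead. The class name LinkCollector, the function name 
extract_links, and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all anchor hrefs found in html_text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links


# Made-up HTML standing in for a downloaded page.
sample = '<p><a href="http://some_website.com/item/1">One</a> <a name="x">no link</a></p>'
print(extract_links(sample))
```

Unlike the line-by-line check, this ignores URLs that appear in plain 
text or in other attributes, so which approach fits depends on the page.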