[Tutor] python module to search a website
vineeth
vineethrakesh at gmail.com
Sun Feb 27 08:46:32 CET 2011
Hi Bill,
Thanks for the reply. I know how the urllib module works, and I am not
looking for scraping. I am looking to obtain the HTML page that my query
is going to return. Just as typing a query into a site like Amazon gives
you a list of products, the module has to search the website and return
the link to the resulting HTML page. I can of course scrape the
information from that link.
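As a rough sketch of the kind of request described above (not a module for any particular site): many sites expose search as a GET request with the query in the URL, so the results page can be fetched by building that URL. The endpoint `http://some_website.com/search` and the parameter name `q` are assumptions for illustration only; the real names have to be read from the site's search form. The sketch is written for Python 3; in Python 2.x the rough equivalents are `urllib.urlencode` and `urllib2.urlopen`.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url, query, param="q"):
    """Return base_url with the query appended as a URL-encoded parameter.

    base_url and param are hypothetical; they depend on the target site.
    """
    return base_url + "?" + urlencode({param: query})


def fetch_results_page(base_url, query, param="q"):
    """Fetch the results page for the query.

    Returns (final_url, html). final_url may differ from the requested
    URL if the site redirects.
    """
    url = build_search_url(base_url, query, param)
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
        return response.geturl(), html


if __name__ == "__main__":
    # Only builds the URL; no network request is made here.
    print(build_search_url("http://some_website.com/search", "python books"))
```

Calling `fetch_results_page` returns both the link to the results page and its HTML, which can then be scraped as usual. Sites that require a POST form, cookies, or JavaScript would need a different approach.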
Thanks
Vin
On 02/27/2011 12:04 AM, Bill Allen wrote:
> On Sat, Feb 26, 2011 at 21:11, vineeth <vineethrakesh at gmail.com> wrote:
>
> > Hello all,
> >
> > I am looking for a Python module to search a website and
> > extract the URL.
> >
> > For example, I found a module for Amazon named "amazonproduct".
> > Its API extracts the data based on the query and even parses
> > the URL data. I am looking for similar query-search Python
> > modules for other websites like Amazon.
> >
> > Any help is appreciated.
> >
> > Thank You
> > Vin
>
> I am not sure what url you are trying to extract, or from where, but I
> can give you an example of basic web scraping if that is your aim.
>
> The following works for Python 2.x.
>
> # urllib provides the functions needed to read the HTML of a web page
> import urllib
>
> # Set a variable to the needed website
> mypath = "http://some_website.com"
>
> # Read all the HTML from the page into a list of lines, then look
> # through it for URLs
> mylines = urllib.urlopen(mypath).readlines()
> for item in mylines:
>     if "http://" in item:
>         pass  # ...do something with the url that was found in the page html...
>         # ...etc...
>
>
> --Bill
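
The line-by-line check for "http://" in the quoted example finds lines that contain a URL, but not the URL itself, and it misses relative links. A minimal sketch of one alternative, assuming Python 3 and only the standard library, collects the `href` values of `<a>` tags with `html.parser`. The names `LinkCollector` and `extract_links` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href, e.g. <a name="top">
            if name == "href" and value:
                # urljoin turns relative links into absolute URLs
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in html, in order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Here `html` would be the text of a page that has already been downloaded, and `base_url` the address it came from.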