Generic web parser

Jeremiah Dodds jeremiah.dodds at gmail.com
Mon May 18 04:29:54 EDT 2009


On Sat, May 16, 2009 at 2:18 PM, S.Selvam <s.selvamsiva at gmail.com> wrote:

> Hi all,
>
> I have to design a web parser that will visit a given list of websites and
> fetch a particular set of details.
> It has to be generic enough that even if we add new websites, it still
> fetches those details if they are available anywhere.
> So it must be something like a framework.
>
> Though I have written some parsers, each of them parses a fixed
> format (e.g. it gets the data from the <title> tag). But here each website
> may have a different format, and the information may be within any tag.
>
> I know it's a tough task for me, but I feel that with Python it should be
> possible.
> My request is: if such a thing is already available, please let me know.
> Your suggestions are also welcome.
>
> Note: I planned to use BeautifulSoup for parsing.
>
> --
> Yours,
> S.Selvam
>
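I'd recommend mechanize in combination with BeautifulSoup: mechanize handles
fetching pages, following links, and submitting forms, while BeautifulSoup
parses the HTML. Together they greatly simplify most web-scraping tasks.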
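
As a rough illustration of how the two could fit together for the "generic"
requirement, here is a minimal sketch. It assumes Python 3 with the bs4
package (the current BeautifulSoup) and a recent mechanize release. The
SITE_RULES table, its selectors, the example.com/example.org keys, and the
fetch() and extract() helpers are all hypothetical names made up for this
example; the idea is only that each site gets its own small set of rules
while the fetching and extraction code stays the same.

```python
from bs4 import BeautifulSoup

# Hypothetical per-site rules: field name -> CSS selector.
# Supporting a new website means adding an entry here, not writing new code.
SITE_RULES = {
    "example.com": {"title": "title", "price": "span.price"},
    "example.org": {"title": "h1.product-name", "price": "div#cost"},
}


def fetch(url):
    """Download a page with mechanize and return its raw HTML."""
    # Imported here so that extract() can be used without mechanize installed.
    import mechanize

    browser = mechanize.Browser()
    response = browser.open(url)
    return response.read()


def extract(html, rules):
    """Return {field: text} for every rule whose selector matches the page."""
    soup = BeautifulSoup(html, "html.parser")
    found = {}
    for field, selector in rules.items():
        node = soup.select_one(selector)
        if node is not None:
            # Fields that are missing on this page are simply left out.
            found[field] = node.get_text(strip=True)
    return found
```

Usage would be along the lines of extract(fetch(url), SITE_RULES[site]).
Keeping the rules as data is just one possible design; it does not solve the
harder problem of finding details on a site nobody has written rules for.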


More information about the Python-list mailing list