Generic web parser

S.Selvam s.selvamsiva at gmail.com
Mon May 18 09:43:31 EDT 2009


On Mon, May 18, 2009 at 1:59 PM, Jeremiah Dodds <jeremiah.dodds at gmail.com> wrote:

>
>
> On Sat, May 16, 2009 at 2:18 PM, S.Selvam <s.selvamsiva at gmail.com> wrote:
>
>> Hi all,
>>
>> I have to design a web parser that visits a given list of websites and
>> fetches a particular set of details.
>> It has to be generic enough that, even if we add new websites, it still
>> fetches those details wherever they are available.
>> So it must be something like a framework.
>>
>> Though I have written some parsers, they parse a fixed format (e.g.
>> they get the data from the <title> tag). But here each website may have
>> a different format, and the information may be available within any tag.
>>
>> I know it's a tough task for me, but I feel that with Python it should be
>> possible.
>> My request is: if such a thing is already available, please let me know.
>> Your suggestions are also welcome.
>>
>> Note: I planned to use BeautifulSoup for parsing.
>>
>> --
>> Yours,
>> S.Selvam
>>
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>>
>>
> I'd recommend mechanize in combination with BeautifulSoup - it greatly
> simplifies most web-scraping tasks.

Thank you all for your responses.

I have started to develop my design based on BeautifulSoup. I plan to
write a separate module for each piece of information that I would like to
extract from a website, and throw the URL at it. Each module has to extract
the required information if it is available.

Each module tries pattern matching and returns the result.

I plan to write it in a generic way. I welcome your suggestions.
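
A minimal sketch of how this one-module-per-field design could look,
under several assumptions. To stay dependency-free it uses the standard
library's html.parser and re instead of BeautifulSoup, and it takes HTML
that has already been downloaded rather than a URL. The names page_text,
extract_emails, extract_phones, EXTRACTORS and extract_all are
hypothetical, and the two regular expressions are only illustrative
patterns, not robust validators.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect visible text from a page, regardless of the enclosing tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def page_text(html):
    """Flatten an HTML document into one string of visible text."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.chunks)


# One extractor per kind of information (hypothetical examples).
# Each takes plain text and returns a list of matches, possibly empty.
def extract_emails(text):
    return re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)


def extract_phones(text):
    return re.findall(r"\+?\d[\d\s-]{7,}\d", text)


# Supporting a new kind of detail means adding one entry here.
EXTRACTORS = {"email": extract_emails, "phone": extract_phones}


def extract_all(html):
    """Run every registered extractor and keep only the fields that matched."""
    text = page_text(html)
    results = {}
    for name, extractor in EXTRACTORS.items():
        found = extractor(text)
        if found:
            results[name] = found
    return results
```

Because the extractors work on the flattened text, they do not depend on
which tag a site happens to use for a given detail. With BeautifulSoup,
page_text could probably be replaced by the library's own text
extraction, leaving the extractors and the registry unchanged.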
-- 
Yours,
S.Selvam