[XML-SIG] xml / html parsing for web

kentsin kentsin@sinaman.com
Wed Dec 13 21:45:15 HKT 2000


Yes, you are right. There is no general way to do this. I am not writing a general spider; my job is to collect some information from the web automatically. I have a small set of targets, so I would like to build a spider framework that I can customize for each target site. One of the targets contains links built with a pull-down option list, so I need a way to handle that.
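
For the pull-down case, here is a minimal sketch of one possible approach. It assumes the target page puts the link URLs in the value attribute of its <option> tags, which is common but not guaranteed, and it uses html.parser from the current Python 3 standard library. The class name OptionLinkParser and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class OptionLinkParser(HTMLParser):
    """Collect (value, label) pairs from <option> tags, in document order."""

    def __init__(self):
        super().__init__()
        self.options = []      # finished (value, label) pairs
        self._current = None   # [value, label] of the option being read

    def handle_starttag(self, tag, attrs):
        if tag == "option":
            # </option> is often omitted, so a new <option> ends the old one.
            self._close_option()
            self._current = [dict(attrs).get("value"), ""]

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self._close_option()

    def handle_data(self, data):
        if self._current is not None:
            self._current[1] += data

    def _close_option(self):
        if self._current is not None:
            value, label = self._current
            if value:  # skip placeholder entries with an empty value
                self.options.append((value, label.strip()))
            self._current = None

    def close(self):
        super().close()
        self._close_option()


# Hypothetical sample page fragment.
sample = """
<select name="jump">
  <option value="">Choose a section
  <option value="/news/index.html">News
  <option value="/sports/index.html">Sports</option>
</select>
"""

parser = OptionLinkParser()
parser.feed(sample)
parser.close()
option_links = parser.options
```

If a site builds the URL in a script from the selected value, the per-site customization would have to reproduce that step; the sketch only recovers the raw values and their labels.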

I think the regular expression way is simple enough for a newbie like me to handle. The problem is that it seems very difficult to customize for cases like the one above. The other problem is that I want to base the choice of action on the hot words (the words between <a> and </a>). I also want to preserve the order of the links, so that I can customize the action to choose a specific link by its position.
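
To make the hot-word and ordering requirements concrete, here is a rough regular-expression sketch. It assumes that <a> elements are not nested and that every href value contains no spaces; real pages can break both assumptions. The names extract_links and pick_by_hot_word and the sample page are invented for illustration.

```python
import re

# Matches <a ... href=VALUE ...>TEXT</a>; VALUE may be quoted or unquoted.
LINK_RE = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*["\']?([^"\'\s>]+)[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)
TAG_RE = re.compile(r"<[^>]+>")


def extract_links(html):
    """Return (href, hot words) pairs in the order they appear in the page."""
    links = []
    for href, inner in LINK_RE.findall(html):
        # Drop tags inside the link text and normalize whitespace.
        text = " ".join(TAG_RE.sub("", inner).split())
        links.append((href, text))
    return links


def pick_by_hot_word(links, word):
    """Return the href of the first link whose text contains word, else None."""
    word = word.lower()
    for href, text in links:
        if word in text.lower():
            return href
    return None


# Hypothetical sample page fragment.
page = """
<A HREF="/a.html">First <b>story</b></A>
<a class=x href=/b.html>Weather today</a>
<a href='/c.html'>Sports</a>
"""

links = extract_links(page)
by_word = pick_by_hot_word(links, "weather")   # choose by hot word
by_position = links[2][0]                      # choose the third link
```

Because findall scans the page from left to right, the list keeps the document order, so a per-site rule can select a link either by its text or by its index.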

I think the regular expression method is very difficult for this. I have also tried the parser approach, but the parsers crash on ill-structured HTML.
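
For comparison, here is a minimal sketch of the parser approach using an event-driven tokenizer that does not require well-formed input. It assumes html.parser from the current Python 3 standard library, which reports tags as it sees them instead of checking that they are balanced. The class name LinkCollector and the broken sample are invented for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, hot words) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link being read, or None
        self._text = []     # text pieces seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Treat a new <a> as the end of any link left unclosed.
            self._finish()
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_endtag(self, tag):
        if tag == "a":
            self._finish()

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def _finish(self):
        if self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None

    def close(self):
        super().close()
        self._finish()   # flush a link still open at end of input


# Deliberately ill-structured sample: unclosed <b>, stray <td>, missing </a>.
broken = (
    '<p><a href="/one.html">One <b>bold</a><td>'
    '<a href=/two.html>Two <a href="/three.html">Three'
)

collector = LinkCollector()
collector.feed(broken)
collector.close()
found = collector.links
```

The collector never builds a tree, so unbalanced tags only affect which text is attributed to which link, not whether the page can be read at all.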

There are many parser modules that come with Python. Can someone comment on them for my case? How do I choose between them?

