how can i use lxml with win32com?

elca highcar at gmail.com
Mon Oct 26 02:57:45 EDT 2009




motoom wrote:
> 
> elca wrote:
> 
>> http://news.search.naver.com/search.naver?sm=tab_hty&where=news&query=korea+times&x=0&y=0
>> That is a Korean portal site; I searched it for the keyword 'korea times',
>> and I want to scrape the results into a text file named 'blogscrap_save.txt'.
> 
> Aha, now we're getting somewhere.
> 
> Getting and parsing that page is no problem, and doesn't need JavaScript 
> or Internet Explorer.
> 
> import urllib2
> import BeautifulSoup
> # Fetch the search results page and parse it (Python 2, BeautifulSoup 3).
> doc = urllib2.urlopen("http://news.search.naver.com/search.naver?sm=tab_hty&where=news&query=korea+times&x=0&y=0")
> soup = BeautifulSoup.BeautifulSoup(doc)
> 
> 
> By analyzing the structure of that page you can see that the articles 
> are presented in an unordered list which has class "type01".  The 
> interesting bit in each list item is encapsulated in a <dd> tag with 
> class "sh_news_passage".  So, to parse the articles:
> 
> ul = soup.find("ul", "type01")
> for li in ul.findAll("li"):
>      dd = li.find("dd", "sh_news_passage")
>      if dd is None:  # skip list items that have no passage
>          continue
>      print dd.renderContents()
>      print
> 
> This example prints them, but you could also save them to a file (or a 
> database, whatever).
> 
> Greetings,
> 
> 
> 
> -- 
> "The ability of the OSS process to collect and harness
> the collective IQ of thousands of individuals across
> the Internet is simply amazing." - Vinod Valloppillil
> http://www.catb.org/~esr/halloween/halloween4.html
> 
> 
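The quoted loop prints each passage and notes that it could be saved to a file instead. A minimal sketch of that saving step follows, under some assumptions: it is written for Python 3 and uses the standard library's html.parser rather than the BeautifulSoup module from the quoted message, it collects only the plain text of each <dd class="sh_news_passage"> (not the inner markup that renderContents() returns), and it does not check for the enclosing "type01" list. The names PassageParser and save_passages are hypothetical, chosen only for this illustration.

```python
from html.parser import HTMLParser


class PassageParser(HTMLParser):
    """Collect the text of every <dd class="sh_news_passage"> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.passages = []   # finished passages, one string per article
        self._buffer = None  # text pieces of the <dd> being read, else None

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or empty, hence the "or".
        classes = (dict(attrs).get("class") or "").split()
        if tag == "dd" and "sh_news_passage" in classes:
            self._buffer = []

    def handle_data(self, data):
        # Keep text only while inside a matching <dd>.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "dd" and self._buffer is not None:
            self.passages.append("".join(self._buffer).strip())
            self._buffer = None


def save_passages(html, path="blogscrap_save.txt"):
    """Write each passage found in html to path; return how many were saved."""
    parser = PassageParser()
    parser.feed(html)
    parser.close()
    with open(path, "w", encoding="utf-8") as out:
        for passage in parser.passages:
            # A blank line separates articles, like the bare print in the loop.
            out.write(passage + "\n\n")
    return len(parser.passages)
```

Here html is expected to be the already decoded text of the results page; calling save_passages(html) with the default path would write the passages to 'blogscrap_save.txt'.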


Hi, thanks for your help.
This thread is getting too long, so I will open a new post.
Thanks a lot.

Paul
