Parsing HTML

Fri Oct 1 19:50:47 EDT 2004

Anders Eriksson <anders.eriksson at morateknikutveckling.se> wrote in message news:<jmj5q0gv1g1k$.dlg at morateknikutveckling.se>...
> Hello!
> 
> I want to extract some info from some specific HTML pages, Microsoft's
> International Word List (e.g.
> http://msdn.microsoft.com/library/en-us/dnwue/html/swe_word_list.htm). I
> want to take all the words, both English and the other language, and create
> a dictionary, so that I can look up About and get Om as the answer.
> 
> What is the best way to do this?

http://www.xml.com/pub/a/2004/09/08/pyxml.html
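
For a rough idea of what such an extraction could look like, here is a minimal sketch using only Python's standard html.parser module. It is not taken from the article linked above. It assumes the word list is laid out as table rows with the English term in the first cell and the translation in the second; the real page layout has not been checked, and WordListParser, build_dictionary and the sample markup are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser


class WordListParser(HTMLParser):
    """Collect {first cell: second cell} from every table row (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.words = {}
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._row is not None and self._cell is not None:
                # Join the fragments and collapse runs of whitespace.
                self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr":
            if self._row and len(self._row) >= 2 and self._row[0]:
                self.words[self._row[0]] = self._row[1]
            self._row = None
            self._cell = None


def build_dictionary(html_text):
    """Parse HTML text and return the English -> translation mapping."""
    parser = WordListParser()
    parser.feed(html_text)
    parser.close()
    return parser.words


# Hypothetical sample markup standing in for the downloaded page.
sample = """<table>
<tr><td>About</td><td>Om</td></tr>
<tr><td>Cancel</td><td>Avbryt</td></tr>
</table>"""

words = build_dictionary(sample)
print(words["About"])
```

A header row, if the page has one, would end up in the dictionary as an ordinary entry and would need to be dropped by hand. Nested tables or cells split across several columns would also need extra handling.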

-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com