[Baypiggies] replacement for urllib2 that can handle xhtml

Keith Dart keith at dartworks.biz
Tue Dec 28 07:58:00 CET 2010


=== On Mon, 12/27, Tony Cappellini wrote: ===
> What's the best module/package for parsing xhtml?
> HTMLParser is built in, but is there another package which is more
> like urllib2 or Beautiful Soup, but handles xhtml?

===

I use the one I wrote. ;-) I like it. Here is an example of using it:

Python> from pycopia.WWW import XHTML
Python> doc=XHTML.get_document("http://www.kdart.com/resume_for_Keith_Dart.xhtml")
Python> print doc.get_path("/html/body/h1")
<h1>Resume for Keith Dart</h1>

That is my resume in XHTML format, previously validated. You can also
verify that it is standards-compliant XHTML by trying to open it in MS
Internet Explorer and watching it fail. ;-)

It parses the document into what I call the Pythonic Object Model, or POM.

The code is viewable here:

http://code.google.com/p/pycopia/source/browse/trunk/WWW/pycopia/WWW/XHTML.py
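
For comparison, here is a minimal sketch of the same kind of path lookup
using only the standard library. It relies on the fact that valid XHTML is
well-formed XML, so xml.etree.ElementTree can parse it. This is not the
pycopia API; the inline document and the get_h1_text helper are made up
for illustration, and the sketch assumes Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline document standing in for a fetched XHTML page.
XHTML_SOURCE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><h1>Resume for Example Person</h1></body>
</html>"""

# XHTML elements live in this namespace, so path lookups must be qualified.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def get_h1_text(source):
    """Return the text of /html/body/h1, or None if there is no such element."""
    root = ET.fromstring(source)  # root is the <html> element
    h1 = root.find("x:body/x:h1", XHTML_NS)
    return h1.text if h1 is not None else None


print(get_h1_text(XHTML_SOURCE))
```

One caveat with this approach: ElementTree does not load the XHTML DTD, so
a page that uses named entities such as &nbsp; will raise a parse error
unless those entities are handled separately.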


-- Keith Dart

-- 

-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Keith Dart <keith at dartworks.biz>
   public key: ID: 19017044
   <http://www.dartworks.biz/>
   =====================================================================

