Functionality similar to PHP's SimpleXML?

Phillip B Oldham phillip.oldham at gmail.com
Fri Jun 13 04:43:36 EDT 2008


I'm sure I'll soon figure out how to find these things out for myself,
but I'd like to get the community's advice on something.

I'm going to throw together a quick project over the weekend: a
spider. I want to scan a website for certain elements.

I come from a PHP background, so normally I'd:
 - throw together a quick REST script to handle http request/responses
 - load the pages into a simplexml object and
 - run an xpath over the dom to find the nodes I need to test

One of the benefits of PHP's dom implementation is that you can easily
load both XML and HTML4 documents - the HTML gets normalised to XML
during the import.

So, my questions are:

Is there a python module to easily handle http request/responses?

Is there a python dom module that works similar to php's when working
with older html?

What python module would I use to apply an XPath expression over a dom
and return the results?




More information about the Python-list mailing list