[Tutor] ElementTree, TidyHTMLTreeBuilder, find
Kent Johnson
kent37 at tds.net
Wed Dec 14 13:08:01 CET 2005
Bob Tanner wrote:
> I'm having problems understanding how find() works.
>
> The html file I'm using is attached.
>
> Python 2.4.2 (#2, Nov 20 2005, 17:04:48)
>
> >>> from elementtidy import TidyHTMLTreeBuilder
> >>> doc = TidyHTMLTreeBuilder.parse('048229.html')
> >>> root = doc.getroot()
> >>> print root.find('html/body')
>
> None
>
> >>> print root.find('body')
>
> None
>
>
> Viewing the HTML in the Firefox DOM Inspector:
>
> -#document
>   -HTML
>     +HEAD
>     #TEXT
>     +BODY
>
> Not sure how to use find().
Let's try it at the interpreter prompt to see what is going on:
>>> from elementtidy import TidyHTMLTreeBuilder as Tidy
>>> doc = Tidy.parse(r'D:\WUTemp\temp.html')
>>> doc
<elementtree.ElementTree.ElementTree instance at 0x00A4D4E0>
>>> doc.find('body')
>>> doc.find('BODY')
>>> doc.find('//BODY')
OK, that doesn't work :-) but you knew that!
Let's just look at the root element:
>>> doc.getroot()
<Element {http://www.w3.org/1999/xhtml}html at a55620>
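The same inspection works on a current Python with the standard library's xml.etree.ElementTree. This is a minimal sketch, not elementtidy itself: it assumes a small hypothetical XHTML string standing in for the tidied page, and split_tag() is a helper made up for illustration. Listing every tag with iter() shows both the namespace and the lowercase tag names, so 'BODY' could not match either, since tag comparison is case-sensitive.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the tidied page: the elements live in the
# XHTML namespace, like the tree TidyHTMLTreeBuilder produced above.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Example</title></head>
<body><p>Hello</p></body>
</html>"""

root = ET.fromstring(XHTML)


def split_tag(tag):
    """Hypothetical helper: split a '{uri}local' tag into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag


# Print every element's full tag in document order; each one carries
# the namespace URI in braces in front of the lowercase local name.
for el in root.iter():
    print(el.tag)

# Pull the namespace URI and local name apart for the root element.
print(split_tag(root.tag))
```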
Ah, that explains it! TidyHTMLTreeBuilder puts the elements in the XHTML namespace, so every tag is stored as {namespace-uri}tagname. That means you
have to include the namespace as part of the search string for find():
>>> doc.find('{http://www.w3.org/1999/xhtml}body')
<Element {http://www.w3.org/1999/xhtml}body at a557b0>
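That works! Note also that find() searches relative to the element it is called on. Calling it on the ElementTree searches the children of the root, which is already the html element, so the path is just the qualified body tag. Your root.find('html/body') would fail even with the namespace added, because html has no html child.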
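Here is the same lookup as a minimal sketch for a current Python, again using the standard library's xml.etree.ElementTree on a hypothetical XHTML string rather than elementtidy. It shows the {uri}tag spelling, why an 'html/body' path finds nothing, and the namespaces prefix mapping that find() accepts in newer ElementTree versions (1.3, as in Python 2.7 and 3.x); ElementTree 1.2 does not have that argument. The prefix 'h' is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample document standing in for the tidied HTML page.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Example</title></head>
<body><p>Hello</p></body>
</html>"""

root = ET.fromstring(XHTML)

# A bare name does not match a namespaced tag, so this returns None.
plain = root.find("body")

# Paths are relative to the element find() is called on. root is already
# <html>, so looking for an <html> child (then <body>) finds nothing.
nested = root.find("{%s}html/{%s}body" % (XHTML_NS, XHTML_NS))

# Spell out the full namespace URI in braces in front of the tag name.
body = root.find("{%s}body" % XHTML_NS)

# Newer ElementTree only: map a prefix to the URI and use prefix:tag.
ns = {"h": XHTML_NS}
body2 = root.find("h:body", ns)

# './/' searches all descendants, not just direct children.
para = root.find(".//h:p", ns)

print(plain, nested)
print(body.tag)
print(para.text)
```

The prefix mapping is only shorthand: ElementTree expands 'h:body' into the same '{http://www.w3.org/1999/xhtml}body' string before matching.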
Kent