CDATA and lxml

Silfheed silfheed at gmail.com
Fri Apr 11 18:49:45 EDT 2008


On Apr 11, 10:33 am, Stefan Behnel <stefan... at behnel.de> wrote:
> Hi again,
>
> Stefan Behnel wrote:
> > Silfheed wrote:
> >> So first off I know that CDATA is generally hated and just shouldn't
> >> be done, but I'm simply required to parse it and spit it back out.
> >> Parsing is pretty easy with lxml, but it's the spitting back out
> >> that's giving me issues.  The fact that lxml strips all the CDATA
> >> stuff off isn't really a big issue either, so long as I can create
> >> CDATA blocks later with <>&'s showing up instead of &lt;&gt;&amp;.
> >> I've scoured through the lxml docs, but probably not hard enough, so
> >> anyone know the page I'm looking for or have a quick how to?
>
> > There's nothing in the docs because lxml doesn't allow you to create CDATA
> > sections. You're not the first one asking that, but so far, no one really had
> > a take on this.
>
> So I gave it a try, then. In lxml 2.1, you will be able to do this:
>
>         >>> root = Element("root")
>         >>> root.text = CDATA('test')
>         >>> tostring(root)
>         '<root><![CDATA[test]]></root>'
>
> This does not work for .tail content, only for .text content (no technical
> reason, I just don't see why that should be enabled).
>
> There's also a parser option "strip_cdata" now that allows you to leave CDATA
> sections in the tree. However, they will *not* behave any differently from
> normal text, so you can't even see at the API level that you are dealing with
> CDATA. If you want to be really, really sure, you can always do this:
>
>         >>> root.text = CDATA(root.text)
>
> Hope that helps,
>
> Stefan

That is immensely cool.  Do you plan to stick it into svn soon?
Thanks!
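
For anyone finding this thread later, here is a minimal sketch of how the two
features described above might be used together. It assumes lxml 2.1 or newer
running on Python 3 (where tostring() returns bytes); the variable names and
sample strings are made up for illustration and are not from Stefan's message.

```python
from lxml import etree

# 1. Writing: wrap the text in CDATA() so it is serialized as a CDATA
#    section instead of being entity-escaped. Only .text supports this.
root = etree.Element("root")
root.text = etree.CDATA("a < b")
created = etree.tostring(root)
print(created)

# 2. Reading: by default the parser turns CDATA sections into plain text,
#    so the markers are lost and "<" is escaped on output.
source = b"<root><![CDATA[x < y]]></root>"
default_tree = etree.fromstring(source)
escaped = etree.tostring(default_tree)
print(escaped)

# 3. Round trip: strip_cdata=False keeps the CDATA section in the tree.
#    At the API level .text still looks like ordinary text.
parser = etree.XMLParser(strip_cdata=False)
kept_tree = etree.fromstring(source, parser)
print(kept_tree.text)
round_tripped = etree.tostring(kept_tree)
print(round_tripped)
```

If the input was parsed with the default parser, re-assigning
root.text = etree.CDATA(root.text) before serializing should give the same
effect as step 3 for that one element.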


