CDATA and lxml

Silfheed silfheed at gmail.com
Fri Apr 11 18:59:44 EDT 2008


On Apr 11, 3:49 pm, Silfheed <silfh... at gmail.com> wrote:
> On Apr 11, 10:33 am, Stefan Behnel <stefan... at behnel.de> wrote:
>
> > Hi again,
>
> > Stefan Behnel wrote:
> > > Silfheed wrote:
> > >> So first off I know that CDATA is generally hated and just shouldn't
> > >> be done, but I'm simply required to parse it and spit it back out.
> > >> Parsing is pretty easy with lxml, but it's the spitting back out
> > >> that's giving me issues.  The fact that lxml strips all the CDATA
> > >> stuff off isn't really a big issue either, so long as I can create
> > >> CDATA blocks later with <>&'s showing up instead of &lt;&gt;&amp;.
> > >> I've scoured through the lxml docs, but probably not hard enough, so
> > >> anyone know the page I'm looking for or have a quick how to?
>
> > > There's nothing in the docs because lxml doesn't allow you to create CDATA
> > > sections. You're not the first one asking that, but so far, no one really had
> > > a take on this.
>
> > So I gave it a try, then. In lxml 2.1, you will be able to do this:
>
> >         >>> from lxml.etree import Element, CDATA, tostring
> >         >>> root = Element("root")
> >         >>> root.text = CDATA('test')
> >         >>> tostring(root)
> >         '<root><![CDATA[test]]></root>'
>
> > This does not work for .tail content, only for .text content (no technical
> > reason, I just don't see why that should be enabled).
>
> > There's also a parser option "strip_cdata" now that allows you to leave CDATA
> > sections in the tree. However, they will *not* behave any differently from
> > normal text, so you can't even see at the API level that you are dealing with
> > CDATA. If you want to be really, really sure, you can always do this:
>
> >         >>> root.text = CDATA(root.text)
>
> > Hope that helps,
>
> > Stefan
>
> That is immensely cool.  Do you plan to stick it into svn soon?
> Thanks!

Ah, looks like it's there already.  Very cool, very cool.  Thanks
again.
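For readers on a current lxml release, here is a minimal sketch of the behaviour Stefan describes. It assumes lxml 2.1 or later and Python 3, where `tostring()` returns bytes rather than the `str` shown in the 2008 session. The helper name `build` is hypothetical. `etree.Element`, `etree.CDATA` and `etree.tostring` are the lxml calls from Stefan's example. The sketch contrasts the default escaping the original question complains about with text wrapped in `CDATA()`.

```python
from lxml import etree


def build(text, as_cdata):
    """Hypothetical helper: put `text` into <root> and serialize it."""
    root = etree.Element("root")
    # A plain string is escaped on output (< becomes &lt;, & becomes &amp;);
    # wrapping it in CDATA() emits it verbatim inside <![CDATA[...]]>.
    root.text = etree.CDATA(text) if as_cdata else text
    return etree.tostring(root)


escaped = build("a < b & c", as_cdata=False)
wrapped = build("a < b & c", as_cdata=True)

print(escaped)  # b'<root>a &lt; b &amp; c</root>'
print(wrapped)  # b'<root><![CDATA[a < b & c]]></root>'
```

To an XML parser both outputs carry the same text. The difference only matters for a consumer that insists on seeing a literal CDATA section, as in the original question.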

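A second sketch covers the `strip_cdata` parser option Stefan mentions, passed here as a keyword to `etree.XMLParser`. It makes the same assumptions: lxml 2.1 or later and Python 3. The helper `reserialize` and the sample document are hypothetical. With `strip_cdata=False` the CDATA section survives a parse/serialize round trip, while `.text` still reads as ordinary text. With the default parser the section is flattened into escaped text. Re-wrapping with `CDATA(root.text)`, as in Stefan's last example, puts it back.

```python
from lxml import etree

# Hypothetical sample input containing a CDATA section.
SOURCE = b"<root><![CDATA[x < y]]></root>"


def reserialize(source, strip_cdata):
    """Parse `source` with the given strip_cdata setting and serialize it again."""
    parser = etree.XMLParser(strip_cdata=strip_cdata)
    root = etree.fromstring(source, parser)
    # .text is a plain string either way; CDATA is invisible at the API level.
    return root.text, etree.tostring(root)


text_kept, out_kept = reserialize(SOURCE, strip_cdata=False)
text_stripped, out_stripped = reserialize(SOURCE, strip_cdata=True)

# The default parser strips CDATA; re-wrapping the text forces a section back.
root = etree.fromstring(SOURCE)
root.text = etree.CDATA(root.text)
rewrapped = etree.tostring(root)

print(out_kept)      # b'<root><![CDATA[x < y]]></root>'
print(out_stripped)  # b'<root>x &lt; y</root>'
print(rewrapped)     # b'<root><![CDATA[x < y]]></root>'
```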


More information about the Python-list mailing list