Getting elements and text with lxml

J. Pablo Fernández pupeno at pupeno.com
Sat May 17 04:37:49 EDT 2008


On May 17, 2:19 am, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
wrote:
> On Fri, 16 May 2008 18:53:03 -0300, J. Pablo Fernández <pup... at pupeno.com>
> wrote:
>
>
>
> > Hello,
>
> > I have an XML file that starts with:
>
> > <vortaro>
> > <art mrk="$Id: a.xml,v 1.10 2007/09/11 16:30:20 revo Exp $">
> > <kap>
> >   <ofc>*</ofc>-<rad>a</rad>
> > </kap>
>
> > Out of it, I'd like to extract something like this (I'm just showing one
> > structure; any structure is fine as long as all the data is there):
>
> > [("ofc", "*"), "-", ("rad", "a")]
>
> > How can I do it? I managed to get the content of both tags and the
> > text up to the first tag ("\n   "), but not the "-" (and in other XML
> > files, there's more text outside the elements).
>
> Look for the "tail" attribute.

That gives me the last part, but not the one in the middle:

In : etree.tounicode(e)
Out: u'<kap>\n  <ofc>*</ofc>-<rad>a</rad>\n</kap>\n'

In : e.text
Out: '\n  '

In : e.tail
Out: '\n'

Thanks.
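
A note on the part in the middle: in the lxml/ElementTree model, text that
follows an element's closing tag is stored in that element's own "tail", so
the "-" here should be the tail of the <ofc> child, not of <kap>. What
follows is only a minimal sketch, assuming Python 3. The helper name flatten
is hypothetical (it is not part of lxml), dropping whitespace-only text is an
assumption about what is wanted, and the import falls back to the standard
library's ElementTree, which uses the same text/tail model, if lxml is not
installed.

```python
try:
    from lxml import etree
except ImportError:  # assumption: stdlib ElementTree is an acceptable stand-in
    import xml.etree.ElementTree as etree


def flatten(elem):
    """Collect (tag, text) pairs for the children of elem, interleaved with
    the text found between them, in document order.

    Hypothetical helper for illustration; whitespace-only text is skipped.
    """
    parts = []
    # Text between the opening tag of elem and its first child.
    if elem.text and elem.text.strip():
        parts.append(elem.text)
    for child in elem:
        parts.append((child.tag, child.text))
        # Text between the end of this child and the next sibling
        # (or the closing tag of elem) lives in child.tail.
        if child.tail and child.tail.strip():
            parts.append(child.tail)
    return parts


kap = etree.fromstring("<kap>\n  <ofc>*</ofc>-<rad>a</rad>\n</kap>")
print(flatten(kap))
```

The sketch only looks one level deep; children that themselves contain
elements would need a recursive variant.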


