Remove whitespace and line breaks in an XML file

Stefan Behnel stefan_ml at behnel.de
Mon Feb 7 15:54:30 EST 2011


David Vicente, 07.02.2011 18:45:
> I'm parsing an XML file with xml.etree. It works correctly, but I have a
> problem with the text attribute of the elements which should be empty. For
> example, in this case:
>
> <book>
>
>                  <author>Ken</author>
>
> </book>
>
>
>
> The text attribute of “book” should be empty, but it returns some
> whitespace and line breaks. I can't remove this whitespace without
> removing information.

Only a DTD (or schema) can tell you which whitespace in an XML document is
meaningful and which isn't, so there is no generic way to "do it right",
especially not for something as general-purpose as an XML parser.

What may work for you is to check whether an Element has children and only
whitespace as text ("not el.text or not el.text.strip()"), and only then
replace the text with None.
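
A minimal sketch of that check, assuming the standard library's
xml.etree.ElementTree. The function name strip_layout_whitespace is made
up for illustration. It only touches the .text of elements that have
children, so leaf text such as "Ken" is left alone:

```python
import xml.etree.ElementTree as ET

def strip_layout_whitespace(root):
    """Set whitespace-only .text to None on elements that have children."""
    for el in root.iter():
        # len(el) is the number of child elements
        if len(el) and (not el.text or not el.text.strip()):
            el.text = None
    return root

doc = ET.fromstring("<book>\n    <author>Ken</author>\n</book>")
strip_layout_whitespace(doc)
print(repr(doc.text))                 # None
print(repr(doc.find("author").text))  # 'Ken'
```

Note that the whitespace following a child's closing tag is stored in that
child's .tail, not in the parent's .text, so this sketch leaves it in place.
Whether it is safe to drop as well depends on your document format.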

Stefan



