How to get xml.etree.ElementTree not to bomb on invalid characters in an XML file?

Stefan Behnel stefan_ml at behnel.de
Tue May 4 11:37:50 EDT 2010


Barak, Ron, 04.05.2010 16:11:
>>>   I'm parsing XML files using ElementTree from xml.etree (see code
>>> below (and attached xml_parse_example.py)).
>>>
>>> However, I'm coming across input XML files (attached an example:
>>> tmp.xml) which include invalid characters, that produce the
>>> following traceback:
>>>
>>> $ python xml_parse_example.py
>>> Traceback (most recent call last):
>>> xml.parsers.expat.ExpatError: not well-formed (invalid
>>> token): line 6, column 34
>>
>> I hope you are aware that this means that the input you are
>> parsing is not XML. It's best to reject the file and tell the
>> producers that they are writing broken output files. You
>> should always fix the source, instead of trying to make sense
>> out of broken input in fragile ways.
>>
> The XML file seems to be valid XML (all XML viewers I tried were able to read it).

This is what xmllint gives me:

-----------------------
$ xmllint /home/sbehnel/tmp.xml
tmp.xml:6: parser error : Char 0x0 out of allowed range
   <m_sanApiName1>"MainStorage_snap
                                   ^
tmp.xml:6: parser error : Premature end of data in tag m_sanApiName1 line 6
   <m_sanApiName1>"MainStorage_snap
                                   ^
tmp.xml:6: parser error : Premature end of data in tag DbHbaGroup line 5
   <m_sanApiName1>"MainStorage_snap
                                   ^
tmp.xml:6: parser error : Premature end of data in tag database line 4
   <m_sanApiName1>"MainStorage_snap
                                   ^
-----------------------

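The file contains 0-bytes (NUL, 0x0). That character is not allowed
anywhere in an XML document, so the file is clearly not XML, and any
conforming parser has to reject it.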
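
If it helps with reporting the problem back to the producers of the file, here is a minimal sketch of a pre-check that lists the offending bytes before the data ever reaches ElementTree. The helper names find_illegal_bytes and parse_or_reject are made up for illustration, and the check assumes an ASCII-compatible encoding such as UTF-8 (a UTF-16 file legitimately contains 0-bytes and would need to be decoded first). It rejects the input rather than repairing it.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the file uses an ASCII-compatible encoding (e.g. UTF-8).
# In that case these control bytes can never occur in XML 1.0:
# everything below 0x20 except tab (0x09), LF (0x0A) and CR (0x0D).
ILLEGAL_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_bytes(data):
    """Return a list of (line, column, byte value) for each illegal byte.

    Lines are 1-based and columns are 0-based byte offsets within the line.
    (Hypothetical helper, not part of xml.etree.)
    """
    hits = []
    for match in ILLEGAL_BYTES.finditer(data):
        pos = match.start()
        line = data.count(b"\n", 0, pos) + 1
        column = pos - (data.rfind(b"\n", 0, pos) + 1)
        hits.append((line, column, match.group()[0]))
    return hits


def parse_or_reject(data):
    """Parse data as XML, or raise ValueError naming the illegal bytes.

    (Hypothetical helper: it rejects broken input instead of patching it.)
    """
    hits = find_illegal_bytes(data)
    if hits:
        raise ValueError("input is not XML, illegal control bytes "
                         "at (line, column, value): %r" % (hits[:5],))
    return ET.fromstring(data)
```

Calling parse_or_reject(open("tmp.xml", "rb").read()) on a file like the one above should raise ValueError with the position of the 0-byte, which is something concrete to send to whoever writes these files.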

Stefan



