How to get xml.etree.ElementTree not to bomb on invalid characters in an XML file?

John Machin sjmachin at lexicon.net
Tue May 4 21:21:23 EDT 2010


On May 5, 3:43 am, Terry Reedy <tjre... at udel.edu> wrote:
> On 5/4/2010 11:37 AM, Stefan Behnel wrote:
>
> > Barak, Ron, 04.05.2010 16:11:
> >> The XML file seems to be valid XML (all XML viewers I tried were able
> >> to read it).
>
>  From Internet Explorer:
>
> The XML page cannot be displayed
> Cannot view XML input using XSL style sheet. Please correct the error
> and then click the Refresh button, or try again later.
>
> --------------------------------------------------------------------------------
>
> An invalid character was found in text content. Error processing
> resource 'file:///C:/Documents and Settings...
>
>       <m_detail>"BROLB21
>
>
>
> > This is what xmllint gives me:
>
> > -----------------------
> > $ xmllint /home/sbehnel/tmp.xml
> > tmp.xml:6: parser error : Char 0x0 out of allowed range
> > <m_sanApiName1>"MainStorage_snap
> > ^
> > tmp.xml:6: parser error : Premature end of data in tag m_sanApiName1 line 6
> > <m_sanApiName1>"MainStorage_snap
> > ^
> > tmp.xml:6: parser error : Premature end of data in tag DbHbaGroup line 5
> > <m_sanApiName1>"MainStorage_snap
> > ^
> > tmp.xml:6: parser error : Premature end of data in tag database line 4
> > <m_sanApiName1>"MainStorage_snap
> > ^
> > -----------------------
>
> > The file contains 0-bytes - clearly not XML.
>
> IE agrees.

Look closer. IE *DOESN'T* agree. It has ignored the problem on line 6
and lurched on to the next problem (on line 11). If you edit that file
to remove the line noise on line 11 but leave the 3 runs of \x00
bytes in place, IE doesn't complain at all about the (invalid) \x00 bytes.
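The subject line asks how to keep ElementTree from failing on a file like this. The following is a minimal sketch of one workaround, not something proposed in this post. It assumes the file has already been read as decoded text (e.g. UTF-8) and that silently dropping the offending characters is acceptable. It first locates every character outside the XML 1.0 `Char` production (which excludes \x00), roughly as xmllint reports them, then strips those characters and parses the result. `find_invalid_chars` and `parse_lenient` are hypothetical helper names and are not part of `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET

# XML 1.0 "Char" production: #x9 | #xA | #xD | [#x20-#xD7FF]
# | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. Anything outside it,
# including \x00, makes the document not well-formed.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_chars(text):
    """Hypothetical helper: (line, column, code point) for each bad char.

    Line and column are 1-based, similar to what xmllint reports.
    """
    hits = []
    for m in _INVALID_XML_CHARS.finditer(text):
        pos = m.start()
        line = text.count("\n", 0, pos) + 1
        column = pos - (text.rfind("\n", 0, pos) + 1) + 1
        hits.append((line, column, ord(m.group())))
    return hits


def parse_lenient(text):
    """Hypothetical helper: drop disallowed chars, then parse.

    Assumption: `text` is an already-decoded str and losing the
    stripped characters is acceptable for the caller.
    """
    return ET.fromstring(_INVALID_XML_CHARS.sub("", text))


# Made-up sample modelled on the m_sanApiName1 element quoted above.
sample = ('<database><m_sanApiName1>"MainStorage_snap\x00\x00"'
          '</m_sanApiName1></database>')

try:
    ET.fromstring(sample)  # strict parse: expat rejects the NUL bytes
except ET.ParseError as exc:
    print("strict parse failed:", exc)

print(find_invalid_chars(sample))  # [(1, 43, 0), (1, 44, 0)]
root = parse_lenient(sample)
print(root.find("m_sanApiName1").text)  # prints: "MainStorage_snap"
```

Stripping hides the corruption rather than fixing it. If the NUL bytes carry meaning, substituting a visible placeholder (for example `_INVALID_XML_CHARS.sub("\ufffd", text)`) instead of deleting them may be the safer choice.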


