Parsing XML with ElementTree (unicode problem?)

Thu Jul 26 10:34:25 EDT 2007

oren.tsur at gmail.com wrote:
> On Jul 26, 3:13 pm, John Machin <sjmac... at lexicon.net> wrote:
>> On Jul 26, 9:24 pm, oren.t... at gmail.com wrote:
>>
>>> OK, I solved the problem but I still don't get what went wrong.
>>> Solution - use tree builder in order to create the new xml file
>>> (previously I was "manually" creating it).
>>> I'm still curious so I'm adding a link to a short and very simple
>>> script that gets an xml (containing non ascii chars) from the web and
>>> saves some of the elements to 2 different local xml files - one is
>>> created by XMLWriter and the other is created manually. You can see
>>> that parsing the first local file is OK while parsing the
>>> "manually" created xml file fails. Obviously I'm doing something wrong
>>> and I'd love to learn what.
>>> The toy script: http://staff.science.uva.nl/~otsur/code/xmlConversions.py
>> Simple file comparison:
>>
>> File 1: ... Modern Church.  &lt;p&gt;The book ...
>> File 2: ... Modern Church.  <p>The book ...
>>
>> Firefox:
>>
>> XML Parsing Error: mismatched tag. Expected: </p>.
>> Location: file:///C:/junk/myDeVinciCode166_2.xml
>> Line Number 3, Column 1153:
>>
>> <CONTENT>The...Church.  <p>The...thrill.</CONTENT>
>> ------------------------------------------^
> 
> yup, but why does this happen? On the script side I write the exact
> same content strings, with supposedly the same encoding, so why is the
> encoding different?

Read the mail. It's not the encoding, it's the "<p>" which does not get
through as a tag in the first file.

Stefan
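
A minimal sketch of the difference described above, assuming Python 3 and the standard-library xml.etree.ElementTree rather than the poster's actual script. The sample text, the variable names and the parses() helper are hypothetical, made up for illustration. It shows that a serializer escapes a literal "<p>" in element text, while pasting the same string between hand-written tags leaves an unclosed element in the output.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical stand-in for the downloaded text: it contains a literal
# "<p>" that has no matching "</p>".
content = "Modern Church.  <p>The book is a thrill."

# "File 1" style: let ElementTree serialize the element. The "<" and ">"
# in the text are escaped, so "<p>" stays character data, not a tag.
elem = ET.Element("CONTENT")
elem.text = content
built = ET.tostring(elem, encoding="unicode")

# "File 2" style: paste the string between hand-written tags. The "<p>"
# now opens a real element that is never closed.
manual = "<CONTENT>" + content + "</CONTENT>"

# One possible repair for the manual path: escape the text first.
fixed = "<CONTENT>" + escape(content) + "</CONTENT>"


def parses(xml_text):
    """Return True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(built)
print(parses(built), parses(manual), parses(fixed))
```

Only the hand-assembled string fails to parse; the encoding is the same in all three cases. Escaping the text before concatenation is one way to keep a manual writer well-formed, though letting the library build the output avoids the issue altogether.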