Parsing XML with ElementTree (unicode problem?)

oren.tsur at gmail.com
Thu Jul 26 11:05:02 EDT 2007


On Jul 26, 4:34 pm, Stefan Behnel <stefan.behnel-n05... at web.de> wrote:
> oren.t... at gmail.com wrote:
> > On Jul 26, 3:13 pm, John Machin <sjmac... at lexicon.net> wrote:
> >> On Jul 26, 9:24 pm, oren.t... at gmail.com wrote:
>
> >>> OK, I solved the problem but I still don't get what went wrong.
> >>> Solution - use tree builder in order to create the new xml file
> >>> (previously I was "manually" creating it).
> >>> I'm still curious, so I'm adding a link to a short and very simple
> >>> script that gets an XML file (containing non-ASCII chars) from the web
> >>> and saves some of the elements to 2 different local XML files - one is
> >>> created by XMLWriter and the other is created manually. You can see
> >>> that parsing the first local file is OK while parsing the
> >>> "manually" created XML file fails. Obviously I'm doing something wrong
> >>> and I'd love to learn what.
> >>> The toy script: http://staff.science.uva.nl/~otsur/code/xmlConversions.py
> >> Simple file comparison:
>
> >> File 1: ... Modern Church.  &lt;p&gt;The book ...
> >> File 2: ... Modern Church.  <p>The book ...
>
> >> Firefox:
>
> >> XML Parsing Error: mismatched tag. Expected: </p>.
> >> Location: file:///C:/junk/myDeVinciCode166_2.xml
> >> Line Number 3, Column 1153:
>
> >> <CONTENT>The...Church.  <p>The...thrill.</CONTENT>
> >> ------------------------------------------^
>
> > Yup, but why does this happen? On the script side I write the exact
> > same content strings, supposedly with the same encoding, so why is the
> > encoding different?
>
> Read the mail. It's not the encoding, it's the "<p>" which does not get
> through as a tag in the first file.
>
> Stefan

thanks. I guess it was a dumb question after all. thanks again :)
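
For the archive, here is a minimal sketch of the difference Stefan points out. It assumes Python 3 and the standard library's xml.etree.ElementTree; the sample text and the helper name parses() are made up for illustration and are not taken from the original script. The idea: a tree builder escapes "<" and ">" in element text, while pasting the same text between tags by hand turns "<p>" into a real, unclosed tag.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical content: text that happens to contain a literal "<p>".
content = "Modern Church.  <p>The book is a thrill."

# "Manual" file: the text is pasted in as-is, so "<p>" becomes a real tag
# that is never closed.
manual_xml = "<CONTENT>" + content + "</CONTENT>"

# Built with ElementTree: the serializer escapes "<" and ">" in text.
elem = ET.Element("CONTENT")
elem.text = content
built_xml = ET.tostring(elem, encoding="unicode")


def parses(xml_string):
    """Return True if xml_string is well-formed XML."""
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False


print(built_xml)           # the "<p>" shows up as &lt;p&gt;
print(parses(manual_xml))  # False: mismatched tag, as Firefox reported
print(parses(built_xml))   # True

# Writing by hand also works once the text is escaped first.
escaped_manual_xml = "<CONTENT>" + escape(content) + "</CONTENT>"
print(parses(escaped_manual_xml))  # True
```

So the encoding is the same in both files; only the escaping of the text differs. If the file has to be written by hand, passing the text through xml.sax.saxutils.escape first should be enough for a case like this one.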



