Mysterious xml.sax Encoding Exception

JKPeck JKPeck at gmail.com
Mon Feb 4 17:02:09 EST 2008


On Feb 2, 12:56 am, Jeroen Ruigrok van der Werven
<asmo... at in-nomine.org> wrote:
> -On [20080201 19:06], JKPeck (JKP... at gmail.com) wrote:
>
> >In both of these cases, there are only plain, 7-bit ascii characters
> >in the xml, and it really is valid utf-16 as far as I can tell.
>
> Did you mean to say that the only characters they used in the UTF-16 encoded
> file are characters from the Basic Latin Unicode block?
>
> --
> Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
> イェルーン ラウフロック ヴァン デル ウェルヴェン
> http://www.in-nomine.org/ | http://www.rangaku.org/
> We have met the enemy and they are ours...

It appears that the root cause of this problem is indeed passing a
Unicode XML string to xml.sax.parseString when the encoding declaration
in the XML says utf-16.  This works with the standard distribution on
Windows.  It does not work with ActiveState on Windows, even though
both distributions report 64K for sys.maxunicode.

So I don't know why the results are different, but the problem is
solved by encoding the Unicode string into utf-16 before passing it to
the parser.
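
For anyone who finds this thread later, here is a minimal sketch of that
workaround. It is not the code from my application: the TextCollector
handler, the parse_utf16 helper, and the sample document are hypothetical
names made up for illustration. It is written so that it should also run
on Python 3, where str plays the role of the Python 2 unicode type.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records element names and character data."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.names = []
        self.chunks = []

    def startElement(self, name, attrs):
        self.names.append(name)

    def characters(self, content):
        self.chunks.append(content)


def parse_utf16(xml_text):
    """Encode a Unicode XML string to UTF-16 bytes, then parse the bytes."""
    handler = TextCollector()
    # encode("utf-16") prepends a byte order mark, so the bytes handed to
    # the parser agree with the encoding="utf-16" declaration in the XML.
    data = xml_text.encode("utf-16")
    xml.sax.parseString(data, handler)
    return handler.names, "".join(handler.chunks)


# Sample document (made up): plain 7-bit ASCII content, utf-16 declaration.
doc = (u'<?xml version="1.0" encoding="utf-16"?>'
       u'<root><item>plain ascii</item></root>')
names, text = parse_utf16(doc)
```

The point is only that the parser receives bytes whose actual encoding
matches the declaration, rather than an already-decoded Unicode string
that still claims to be utf-16.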

Thanks to all for helping to track this down.

Regards,
Jon Peck


