Mysterious xml.sax Encoding Exception

JKPeck JKPeck at gmail.com
Tue Feb 5 10:41:41 EST 2008


On Feb 4, 4:09 pm, John Machin <sjmac... at lexicon.net> wrote:
> On Feb 5, 9:02 am, JKPeck <JKP... at gmail.com> wrote:
>
>
>
> > On Feb 2, 12:56 am, Jeroen Ruigrok van der Werven <asmo... at in-nomine.org> wrote:
> > > -On [20080201 19:06], JKPeck (JKP... at gmail.com) wrote:
>
> > > >In both of these cases, there are only plain, 7-bit ascii characters
> > > >in the xml, and it really is valid utf-16 as far as I can tell.
>
> > > Did you mean to say that the only characters they used in the UTF-16 encoded
> > > file are characters from the Basic Latin Unicode block?
>
> > It appears that the root cause of this problem is indeed passing a
> > Unicode XML string to xml.sax.parseString with an encoding declaration
> > in the XML of utf-16.  This works with the standard distribution on
> > Windows.
>
> It did NOT work for me with the standard 2.5.1 Windows distribution --
> see the code + output that I posted.
>
> > It does not work with ActiveState on Windows even though both
> > distributions report 64K for sys.maxunicode.
>
> > So I don't know why the results are different, but the problem is
> > solved by encoding the Unicode string into utf-16 before passing it to
> > the parser.

Interesting.  In the course of installing and testing with
ActiveState, I upgraded the standard distribution from 2.5.0 to
2.5.1.  The former worked; the latter does not (with the original
code).  So that .1 seems to matter here, and it probably accounts
for why ActiveState raised the exception and the standard 2.5.0 did
not.
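
For anyone hitting the same thing, here is a minimal sketch of the
workaround described in the quoted text: encode the Unicode string
to utf-16 bytes first, so that what the parser receives matches the
encoding declaration.  This is not the original code from this thread;
the TagCollector handler and the sample document are made up for
illustration, and the sketch assumes the XML declares encoding="utf-16".

```python
import xml.sax


class TagCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records element names as they open."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# Illustrative document: plain ASCII content, but declared as utf-16.
xml_text = u'<?xml version="1.0" encoding="utf-16"?><root><item>abc</item></root>'

# Encode before parsing so the bytes agree with the declaration.
# The "utf-16" codec prepends a byte order mark, which the parser
# can use to detect the byte order.
xml_bytes = xml_text.encode("utf-16")

handler = TagCollector()
xml.sax.parseString(xml_bytes, handler)
print(handler.names)
```

Passing xml_bytes rather than xml_text means the result does not depend
on how a given release converts a Unicode string internally before
handing it to the parser.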

-Jon


