pyexpat and unicode
mallum
breakfast at 10.am
Mon Dec 17 17:10:00 EST 2001
Nope. This still breaks, with the same error:
import xml.parsers.expat
parser = xml.parsers.expat.ParserCreate(encoding='utf8')
data_uni = u"<?xml version='1.0' encoding='UTF-8'?><hello>\202</hello>"
data_uni.encode('utf8')
parser.Parse(data_uni)
Is this a bug?
-- mallum
on Mon, Dec 17, 2001 at 07:10:22PM +0000, python-list-admin at python.org wrote:
> mallum wrote:
> ...
> > data_uni = u"<?xml version='1.0' encoding='UTF-8' ?><hello>\202</hello>"
> > data = "<?xml version='1.0' encoding='UTF-8' ?><hello>there</hello>"
> >
> > data_uni.encode('utf8')
> >
> > parser.Parse(data)
> > parser.Parse(data_uni)
> ...
> > Does this mean I'm unable to pass utf8-encoded strings to pyexpat?
> > According to the docs it should work. Can anyone shed some light on this?
>
> You can't, I believe, pass SOME strings with a certain encoding followed in
> the same parse by others with different encodings; or, as in this case,
> ones not in fact encoded (remember the call to .encode returns an encoded
> string, which you ignore -- it doesn't change data_uni, of course, as it's
> immutable, like all strings).
>
> Separate parses work fine:
>
> import xml.parsers.expat
> parser = xml.parsers.expat.ParserCreate(encoding='utf8')
>
> data_uni = u"<?xml version='1.0' encoding='UTF-8' ?><hello>\202</hello>"
> data = "<?xml version='1.0' encoding='UTF-8' ?><hello>there</hello>"
>
> denc = data_uni.encode('utf8')
>
> for thedata in data_uni, data, denc:
>     # a fresh parser per document; the second argument marks the final chunk
>     parser = xml.parsers.expat.ParserCreate(encoding='utf8')
>     print 'parsing', repr(thedata)
>     parser.Parse(thedata, 1)
>     print 'done'
>
>
> Alex
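
The point about the ignored return value of .encode can be sketched as follows. This is a minimal illustration written for a current Python 3 interpreter rather than the Python of this thread, so it is an assumption about how the advice carries over, not a reproduction of the original error. The helper parse_text is hypothetical, the character \u00e9 stands in for the \202 used above, and the encoding argument to ParserCreate is left out so that the document's own UTF-8 declaration is used.

```python
import xml.parsers.expat


def parse_text(xml_bytes):
    """Parse one complete document and return its character data (hypothetical helper)."""
    chunks = []
    # A fresh parser per document: a pyexpat parser cannot be reused
    # once it has been handed the final chunk.
    parser = xml.parsers.expat.ParserCreate()
    # Character data may arrive in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return "".join(chunks)


data_uni = "<?xml version='1.0' encoding='UTF-8'?><hello>\u00e9</hello>"

# Strings are immutable: this call returns a new bytes object and
# leaves data_uni untouched, so the result is simply thrown away.
data_uni.encode("utf-8")

# Keep the return value and hand that to the parser instead.
denc = data_uni.encode("utf-8")
text = parse_text(denc)
print(repr(text))
```

The only change from the failing snippet is that the encoded result is bound to a name (denc, as in Alex's version) and that name is what gets parsed.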