Python 3.0 crashes displaying Unicode at interactive prompt

John Machin sjmachin at lexicon.net
Sat Dec 13 17:09:04 EST 2008


On Dec 14, 8:07 am, "Chris Rebert" <c... at rebertia.com> wrote:
> On Sat, Dec 13, 2008 at 12:28 PM, John Machin <sjmac... at lexicon.net> wrote:
>
> > Python 2.6.1 (r261:67517, Dec  4 2008, 16:51:00) [MSC v.1500 32 bit
> > (Intel)] on win32
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> x = u'\u9876'
> > >>> x
> > u'\u9876'
>
> > # As expected
>
> > Python 3.0 (r30:67507, Dec  3 2008, 20:14:27) [MSC v.1500 32 bit
> > (Intel)] on win32
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> x = '\u9876'
> > >>> x
> > Traceback (most recent call last):
> >  File "<stdin>", line 1, in <module>
> >  File "C:\python30\lib\io.py", line 1491, in write
> >    b = encoder.encode(s)
> >  File "C:\python30\lib\encodings\cp850.py", line 19, in encode
> >    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> > UnicodeEncodeError: 'charmap' codec can't encode character '\u9876' in position 1: character maps to <undefined>
>
> > # *NOT* as expected (by me, that is)
>
> > Is this the intended outcome?
>
> When Python tries to display the character, it must first encode it
> because IO is done in bytes, not Unicode codepoints. When it tries to
> encode it in CP850 (apparently your system's default encoding judging
> by the traceback), it unsurprisingly fails (CP850 is an old Western
> Europe codec, which obviously can't encode an Asian character like the
> one in question). To signal that failure, it raises an exception, thus
> the error you see.
> This is intended behavior.

I see. That means that the behaviour in Python 1.6 to 2.6 (i.e.
encoding the text using the repr() function as then defined) was not
intended behaviour?
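
For anyone following along without a cp850 console, the encode step in
the traceback can be reproduced directly. This is only a sketch: the
helper name try_encode is made up for illustration, and it assumes the
prompt is writing repr(x) through a strict cp850 encoder, which is what
the traceback suggests.

```python
# Sketch: reproduce the failing encode step outside the interactive prompt.
x = '\u9876'
shown = repr(x)  # in Python 3 the repr keeps the printable character itself


def try_encode(text, encoding='cp850'):
    """Hypothetical helper: return None on success, else the error raised."""
    try:
        text.encode(encoding)  # strict error handling is the default
    except UnicodeEncodeError as exc:
        return exc
    return None


err = try_encode(shown)
if err is not None:
    # err.start is the index of the first character that could not be encoded;
    # index 0 of the repr is the opening quote.
    print(err.start, ascii(err.object[err.start]))
```

The offending index is counted within the repr, quote included, which
would explain why the traceback reports position 1 for a one-character
string.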

> Either change your default system/terminal
> encoding to one that can handle such characters or explicitly encode
> the string and use one of the provided options for dealing with
> unencodable characters.

You are missing the point. I don't care about the visual
representation. What I care about is an unambiguous representation
that can be used when communicating about problems across cultures/
networks/mail-clients/news-readers ... the sort of problems that are
initially advised as "I got this UnicodeEncodeError" and accompanied
by no data or garbled data.
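
A minimal sketch of what I mean by an unambiguous representation, using
only built-ins that exist in 3.0: ascii() and the 'backslashreplace'
error handler. The helper name printable and the choice of cp850 as the
target encoding are just for illustration.

```python
# Sketch: ASCII-only representations that survive any console or mail client.
x = '\u9876'

# ascii() works like the 2.x repr(): everything outside ASCII is escaped.
safe_repr = ascii(x)


def printable(text, encoding='cp850'):
    """Hypothetical helper: escape whatever the target encoding cannot hold."""
    return text.encode(encoding, 'backslashreplace').decode(encoding)


print(safe_repr)
print(printable(repr(x)))
```

Either form can be pasted into a problem report and read back without
knowing what encoding the reporter's terminal was using.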

> Also, please don't call it a "crash" as that's very misleading. The
> Python interpreter didn't dump core, an exception was merely thrown.

"spew nonsense on the screen and then stop" is about as useful and as
astonishing as "dump core".

core? You mean like ferrite doughnuts on a wire trellis? I thought
that went out of fashion before cp850 was invented :-)



