unicode in exception traceback

Peter Otten __peter__ at web.de
Thu Apr 3 05:56:55 EDT 2008


WaterWalk wrote:

> Hello. I just found that on Windows, when an exception is raised and
> traceback info is printed to STDERR, all the characters printed are
> plain ASCII. Take the Unicode character u'\u4e00' as an example. If
> I write:
> 
> print u'\u4e00'
> 
> If the system locale is "PRC China", then this statement will print
> this character as a single Chinese character.
> 
> But if I write:
> 
> assert u'\u4e00' == 1
> 
> An AssertionError is raised and the traceback info is written to
> STDERR, but this time u'\u4e00' is printed literally as u'\u4e00',
> i.e. several ASCII characters instead of one Chinese character. I put
> the coding comment (# -*- coding: utf-8 -*-) on the first line of the
> Python source file and also saved the file as UTF-8, but the problem
> remains.
> 
> What's worse, if I write Chinese characters directly in a Unicode
> string, they are unreadable when the traceback info is printed: they
> show up as other characters, as if DBCS text were being printed under
> the wrong locale.
> 
> I don't think this problem is limited to Chinese; other East Asian
> characters will probably run into it too.
> 
> Is there any workaround to it?

Pass a byte string but make some effort to use the right encoding:

>>> import sys
>>> assert False, u"\u4e00".encode(sys.stderr.encoding or "ascii", "xmlcharrefreplace")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: 一
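
If you need this at many assert sites, the expression can be wrapped in a small helper. This is a minimal sketch assuming Python 2, where the assert message should be a byte string; the name encode_for_stream is hypothetical. It defaults to sys.stderr because that is the stream tracebacks are written to.

```python
import sys


def encode_for_stream(text, stream=None):
    """Encode a Unicode message for the stream a traceback is written to.

    Hypothetical helper. Defaults to sys.stderr; falls back to ASCII when the
    stream reports no encoding, and turns characters the chosen encoding
    cannot represent into XML character references.
    """
    if stream is None:
        stream = sys.stderr
    encoding = getattr(stream, "encoding", None) or "ascii"
    return text.encode(encoding, "xmlcharrefreplace")


# Python 2 usage: the assert message becomes a byte string
# assert u"\u4e00" == 1, encode_for_stream(u"\u4e00")
```

Passing the stream explicitly makes the fallback easy to check: a stream whose encoding is None yields "&#19968;" for u"\u4e00".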

You might also be able to do this in sys.excepthook:

$ cat unicode_exception_message.py
import sys

def eh(etype, exc, tb, original_excepthook=sys.excepthook):
    message = exc.args[0] if exc.args else None  # bare "assert False" has empty args
    if isinstance(message, unicode):
        exc.args = (message.encode(sys.stderr.encoding or "ascii", "xmlcharrefreplace"),) + exc.args[1:]
    return original_excepthook(etype, exc, tb)

sys.excepthook = eh

assert False, u"\u4e00"

$ python unicode_exception_message.py
Traceback (most recent call last):
  File "unicode_exception_message.py", line 11, in <module>
    assert False, u"\u4e00"
AssertionError: 一

If Python cannot figure out the encoding (sys.stderr.encoding is None when
stderr is redirected to a file), this falls back to ASCII with XML
character references:

$ python unicode_exception_message.py 2>tmp.txt
$ cat tmp.txt
Traceback (most recent call last):
  File "unicode_exception_message.py", line 11, in <module>
    assert False, u"\u4e00"
AssertionError: &#19968;

Note that I've not done any tests; e.g. if an exception has immutable
.args, the except hook itself will fail.
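
A slightly more defensive variant of the hook above, as a sketch: it leaves empty or non-Unicode args alone and, if reassigning .args raises, defers to the original hook with the exception untouched instead of failing inside the hook. The names encoded_args and make_excepthook are hypothetical. The module imports under both Python 2 and 3 but only installs the hook on Python 2, where the byte-string conversion is what you want.

```python
import sys

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3: str is already Unicode


def encoded_args(args, encoding):
    """Return args with a leading Unicode message encoded to a byte string.

    Empty args, or args whose first item is not Unicode, are returned
    unchanged. Unencodable characters become XML character references.
    """
    if args and isinstance(args[0], text_type):
        message = args[0].encode(encoding or "ascii", "xmlcharrefreplace")
        return (message,) + tuple(args[1:])
    return args


def make_excepthook(original=sys.excepthook, stream=sys.stderr):
    """Build an excepthook that encodes the message before delegating."""
    def hook(etype, exc, tb):
        try:
            exc.args = encoded_args(exc.args, getattr(stream, "encoding", None))
        except Exception:
            # e.g. an exception type whose .args cannot be reassigned:
            # keep the original exception rather than failing in the hook
            pass
        return original(etype, exc, tb)
    return hook


if sys.version_info[0] == 2:
    sys.excepthook = make_excepthook()
```

Building the hook through a factory keeps the original hook and the target stream as parameters, so the encoding step can be exercised with a stand-in stream and a stand-in original hook.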

Peter
