unicode in exception traceback
Peter Otten
__peter__ at web.de
Thu Apr 3 05:56:55 EDT 2008
WaterWalk wrote:
> Hello. I just found on Windows when an exception is raised and
> traceback info is printed on STDERR, all the characters printed are
> just plain ASCII. Take the unicode character u'\u4e00' for example. If
> I write:
>
> print u'\u4e00'
>
> If the system locale is "PRC China", then this statement will print
> this character as a single Chinese character.
>
> But if I write: assert u'\u4e00' == 1
>
> An AssertionError will be raised and traceback info will be put to
> STDERR, while this time, u'\u4e00' will simply be printed just as
> u'\u4e00', several ASCII characters instead of one single Chinese
> character. I use the coding directive comment (# -*- coding: utf-8 -*-)
> on the first line of the Python source file and also save it in UTF-8
> format, but the problem remains.
>
> What's worse, if I write Chinese characters directly in a unicode
> string, they appear unreadable when the traceback info is printed;
> that is, they show up as something else. It's like printing DBCS
> characters when the locale is incorrect.
>
> I think this problem isn't unique. When using some other East-Asia
> characters, the same problem may recur.
>
> Is there any workaround to it?
Pass a byte string but make some effort to use the right encoding:
>>> import sys
>>> assert False, u"\u4e00".encode(sys.stdout.encoding or "ascii", "xmlcharrefreplace")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: 一
You might be able to do this in the except hook:
$ cat unicode_exception_message.py
import sys

def eh(etype, exc, tb, original_excepthook=sys.excepthook):
    message = exc.args[0]
    if isinstance(message, unicode):
        exc.args = (message.encode(sys.stderr.encoding or "ascii", "xmlcharrefreplace"),) + exc.args[1:]
    return original_excepthook(etype, exc, tb)

sys.excepthook = eh

assert False, u"\u4e00"
$ python unicode_exception_message.py
Traceback (most recent call last):
  File "unicode_exception_message.py", line 11, in <module>
    assert False, u"\u4e00"
AssertionError: 一
If Python cannot figure out the encoding, this falls back to ASCII with
XML character references:
$ python unicode_exception_message.py 2>tmp.txt
$ cat tmp.txt
Traceback (most recent call last):
  File "unicode_exception_message.py", line 11, in <module>
    assert False, u"\u4e00"
AssertionError: &#19968;
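The fallback can also be checked without a terminal. This is only a minimal sketch: FakeStream and encode_for_stream are hypothetical names that are not part of the script above. They mirror its `encoding or "ascii"` logic with the "xmlcharrefreplace" error handler:

```python
class FakeStream(object):
    """Hypothetical stand-in for sys.stderr with a configurable encoding."""

    def __init__(self, encoding):
        self.encoding = encoding


def encode_for_stream(text, stream):
    """Encode text for stream, falling back to ASCII with XML charrefs.

    Hypothetical helper; it is not part of the original script.
    """
    encoding = getattr(stream, "encoding", None) or "ascii"
    return text.encode(encoding, "xmlcharrefreplace")


# Redirected output: no encoding is known, so the character becomes a charref.
redirected = encode_for_stream(u"\u4e00", FakeStream(None))

# A stream that reports UTF-8: the character is encoded directly.
utf8 = encode_for_stream(u"\u4e00", FakeStream("utf-8"))

# A stream whose encoding cannot represent the character: charref again.
latin1 = encode_for_stream(u"\u4e00", FakeStream("latin-1"))
```

Here redirected and latin1 both hold the ASCII bytes of the character reference, while utf8 holds the three-byte UTF-8 sequence.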
Note that I've not done any tests; e.g. if there are exceptions with
immutable .args the except hook itself will fail.
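One way to guard against that is sketched below, equally untested against real-world exceptions. The names encoded_args and defensive_hook are hypothetical. Like the script above it targets Python 2, where the message is a unicode object; TEXT_TYPE is only there so the sketch also imports under Python 3. It skips exceptions with empty .args and reports the exception unchanged if .args cannot be rewritten:

```python
import sys

TEXT_TYPE = type(u"")  # `unicode` on Python 2, `str` on Python 3


def encoded_args(args, encoding):
    """Return args with a leading text message encoded to bytes.

    Empty args and non-text first arguments come back unchanged.
    """
    if args and isinstance(args[0], TEXT_TYPE):
        message = args[0].encode(encoding or "ascii", "xmlcharrefreplace")
        return (message,) + tuple(args[1:])
    return tuple(args)


def defensive_hook(etype, exc, tb, original_excepthook=sys.excepthook):
    """Hypothetical variant of eh() that does not raise from the hook itself."""
    try:
        exc.args = encoded_args(exc.args, getattr(sys.stderr, "encoding", None))
    except Exception:
        # .args is missing or cannot be assigned (or the encoding is bogus):
        # report the exception unchanged rather than fail inside the hook.
        pass
    return original_excepthook(etype, exc, tb)
```

It would be installed the same way as before, with sys.excepthook = defensive_hook.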
Peter