Unicode and exception strings

Terry Carroll carroll at tjc.com
Tue Jan 13 20:32:36 EST 2004


On 12 Jan 2004 08:41:43 +0100, Rune Froysa <rune.froysa at usit.uio.no>
wrote:

>Terry Carroll <carroll at tjc.com> writes:
>
>> On 09 Jan 2004 13:18:39 +0100, Rune Froysa <rune.froysa at usit.uio.no>
>> wrote:
>> 
>> >Assuming an exception like:
>> >
>> >  x = ValueError(u'\xf8')
>> >
>> >AFAIK the common way to get a string representation of the exception
>> >as a message is to simply cast it to a string: str(x).  This will
>> >result in an "UnicodeError: ASCII encoding error: ordinal not in
>> >range(128)".
>> >
>> >The common way to fix this is with something like
>> >u'\xf8'.encode("ascii", 'replace').  However I can't find any way to
>> >tell ValueError's __str__ method which encoding to use.
>> 
>> Rune, I'm not understanding what your problem is.
>> 
>> Is there any reason you're not using, for example, just repr(u'\xf8')?
>
>The problem is that I have little control over the message string that
>is passed to ValueError().  All my program knows is that it has caught
>one such error, and that its message string is in unicode format.  I
>need to access the message string (for logging etc.).
>
>>          _display_text = _display_text + "%s\n" % line.decode('utf-8'))
>
>This does not work, as I'm unable to get at the 'line', which is
>stored internally in the ValueError class (and generated by its __str__
>method).

You should be able to get at it via x.args[0]:

>>> x = ValueError(u'\xf8')
>>> x.args[0]
u'\xf8'

The only thing is, what to do with it once you get there.  u'\xf8' is
already a unicode string holding one character, U+00F8 (LATIN SMALL
LETTER O WITH STROKE), so there is nothing left to decode.  What fails in
str(x) is the implicit encoding of that character to ASCII.

You can extract it as above, and then encode it explicitly with the
codecs module.  The encoder returns the encoded string together with the
number of characters it consumed:

>>> import codecs
>>> e = codecs.getencoder('utf-8')
>>> x.args[0]
u'\xf8'
>>> e(x.args[0])
('\xc3\xb8', 1)
>>>

But, still, if all you want is to have *something* to print out explaining
the exception, you can use repr():

>>> repr(x.args[0])
"u'\\xf8'"
>>>
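
For the logging case, the pieces above can be wrapped in a small helper.
This is only a sketch: the name message_for_log is made up for
illustration, it assumes the message is the first element of args and is a
unicode string, and it falls back to repr() for anything else.  The
'backslashreplace' error handler assumes Python 2.3 or later.

```python
def message_for_log(exc, encoding="ascii", errors="backslashreplace"):
    """Return the exception's message as an encoded byte string.

    Hypothetical helper: assumes the message, if any, is exc.args[0].
    """
    if not exc.args:
        # No message was passed; fall back to the exception's repr.
        text = repr(exc)
    else:
        text = exc.args[0]
        if not hasattr(text, "encode"):
            # Not a string-like object (e.g. a number), so use its repr.
            text = repr(text)
    # Encode explicitly rather than leaving the choice of encoding
    # to str(exc).
    return text.encode(encoding, errors)


x = ValueError(u'\xf8')
print(message_for_log(x))                     # escaped, ASCII-only
print(message_for_log(x, errors="replace"))   # unencodable chars become '?'
print(message_for_log(x, "utf-8"))            # UTF-8 bytes
```

Which error handler to pick depends on the log: 'backslashreplace' keeps
the original code point visible, 'replace' just marks that something was
there.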

Is this helping any, or am I just flailing around?


