inconvenient unicode conversion of non-string arguments

Wed Dec 13 05:02:30 EST 2006

Holger Joukl wrote:
> Hi there,
>
> I consider the behaviour of unicode() inconvenient wrt conversion of
> non-string arguments.
> While you can do:
>
> >>> unicode(17.3)
> u'17.3'
>
> you cannot do:
>
> >>> unicode(17.3, 'ISO-8859-1', 'replace')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: coercing to Unicode: need string or buffer, float found
> >>>
>
> This is somewhat annoying when you want to convert a mixed-type argument
> list to unicode strings, e.g. for a logging system (that's where it bit me),
> and want to make sure that any raw string arguments are also converted to
> unicode without errors (although by force).
> Especially as this is a performance-critical part of my application, I
> really do not like to wrap unicode() in some custom tounicode() function
> that handles such cases by distinguishing argument types.
>
> Any reason why unicode() with a non-string argument should not allow the
> encoding and errors arguments?

There is a reason: an encoding is a property of bytes; it does not apply
to other objects.
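
A minimal sketch of that distinction, written so it should run on both
Python 2 and Python 3. The sample byte string and the variable names are
illustrative assumptions, not taken from the original post:

```python
# Bytes need an encoding to be interpreted as text.
raw = b'caf\xe9'                      # ISO-8859-1 bytes for "cafe" with e-acute
text = raw.decode('ISO-8859-1')       # decoding applies here

# A float is not a sequence of encoded bytes, so there is nothing to decode.
number = 17.3
number_text = u'%s' % number          # plain text conversion, no encoding involved
has_decode = hasattr(number, 'decode')  # False: floats have no decode() method
```

Under these assumptions, the encoding and errors arguments only have a
meaning for `raw`; for `number` there is no byte representation they could
be applied to.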

> Or some good solution to work around my problem?

Do not put undecoded bytes in a mixed-type argument list. A rule of
thumb when working with Unicode: decode as soon as possible, encode as
late as possible.
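
One way to read that rule of thumb, as a rough sketch only: decode bytes
once at the point where they enter the program, so the logging call never
sees raw byte strings. The helpers `read_field` and `format_log`, the
sample data, and the choice of ISO-8859-1 for input and UTF-8 for output
are all hypothetical:

```python
def read_field(raw, encoding='ISO-8859-1'):
    """Hypothetical input boundary: decode incoming bytes once, on arrival."""
    return raw.decode(encoding, 'replace')

def format_log(template, *args):
    """Hypothetical logging helper: args are text or non-string objects."""
    return template % args

# Decode early: the byte string becomes text before it is passed around.
name = read_field(b'J\xfcrgen')

# Mixed-type arguments can now be formatted without any per-argument checks.
line = format_log(u'user=%s load=%s', name, 17.3)

# Encode late: convert back to bytes only when writing the line out.
encoded = line.encode('UTF-8')
```

With this arrangement the performance-critical formatting path contains no
type distinction at all; the decoding cost is paid once per input value.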

  -- Leo