inconvenient unicode conversion of non-string arguments

Wed Dec 13 05:55:44 EST 2006

Holger Joukl wrote:
> python-list-bounces+holger.joukl=lbbw.de at python.org wrote on 13.12.2006
> 11:02:30:
>
> >
> > Holger Joukl wrote:
> > > Hi there,
> > >
> > > I consider the behaviour of unicode() inconvenient with respect to
> > > conversion of non-string arguments.
> > > While you can do:
> > >
> > > >>> unicode(17.3)
> > > u'17.3'
> > >
> > > you cannot do:
> > >
> > > >>> unicode(17.3, 'ISO-8859-1', 'replace')
> > > Traceback (most recent call last):
> > >   File "<stdin>", line 1, in ?
> > > TypeError: coercing to Unicode: need string or buffer, float found
> > > >>>
> > > [...]
> > > Any reason why unicode() with a non-string argument should not allow the
> > > encoding and errors arguments?
> >
> > There is reason: encoding is a property of bytes, it is not applicable
> > to other objects.
>
> Ok, but I still don't see why these arguments shouldn't simply be silently
> ignored for non-string arguments.

That's a rather bizarre and sloppy approach. Should

unicode(17.3, 'just-having-fun', 'I-do-not-like-errors')
unicode(17.3, 'sdlfkj', 'ewrlkj', 'eoirj', 'sdflkj')

work?
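
For comparison, here is a small sketch of what happens when the argument
really is bytes: an unknown encoding name is rejected rather than ignored.
The helper name try_decode is made up for this example and is not part of
any library.

```python
def try_decode(raw, encoding, errors='strict'):
    """Hypothetical helper: decode raw bytes, reporting unknown codecs."""
    try:
        return raw.decode(encoding, errors)
    except LookupError as exc:
        # The codec registry does not know this encoding name.
        return 'rejected: %s' % exc

print(try_decode(b'17.3', 'ISO-8859-1'))
print(try_decode(b'17.3', 'just-having-fun'))
```

If a bogus encoding is an error for bytes, silently accepting it for a
float would make the same call behave inconsistently.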

> > > Or some good solution to work around my problem?
> >
> > Do not put undecoded bytes in a mixed-type argument list. A rule of
> > thumb working with unicode: decode as soon as possible, encode as late
> > as possible.
>
> It's not always that easy when you deal with a tree data structure with the
> tree elements containing different data types and your user may decide to
> output root.element.subelement.whateverData.
> I have the problems in a logging mechanism, and it would vanish if
> unicode(<non-string>, encoding, errors) would work and just ignore the
> obsolete arguments.

I don't really see from your example what stops you from putting
unicode instead of bytes into your tree, but I can believe some
libraries can cause some extra work. That's a problem with the libraries,
not with the builtin function unicode(). Would you be happy if the floating
point value 17.3 were stored as 8 bytes in your tree? After all,
that is how 17.3 is actually represented in computer memory. Same story
with unicode: if some library gives you raw bytes, *you* have to do the
extra work later.
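
If the tree really cannot be changed, that extra work can live in one
small helper in the logging code. The following is only a sketch under
assumptions: the name to_text and its default arguments are invented for
illustration, and the version check is there only so the same code runs on
both Python 2 and Python 3.

```python
import sys

# Pick the text and byte-string types for the running interpreter.
if sys.version_info[0] >= 3:
    text_type, binary_type = str, bytes
else:
    text_type, binary_type = unicode, str

def to_text(obj, encoding='ISO-8859-1', errors='replace'):
    """Hypothetical helper: return obj as text.

    Only byte strings are decoded; encoding and errors are never
    passed along for any other type.
    """
    if isinstance(obj, text_type):
        return obj
    if isinstance(obj, binary_type):
        return obj.decode(encoding, errors)
    # Floats, ints and other objects use their normal text conversion.
    return text_type(obj)

print(to_text(17.3))
print(to_text(b'caf\xe9'))
```

This keeps the decision about encodings in your own code, next to the
place where the raw bytes enter the system.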

  -- Leo