[ python-Bugs-964929 ] Unicode String formatting does not correctly handle objects

SourceForge.net noreply at sourceforge.net
Fri Jul 23 18:22:40 CEST 2004


Bugs item #964929, was opened at 2004-06-02 12:48
Message generated for change (Comment added) made by lemburg
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=964929&group_id=5470

Category: Unicode
>Group: Python 2.4
>Status: Open
>Resolution: None
Priority: 5
Submitted By: Giles Antonio Radford (mewf)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Unicode String formatting does not correctly handle objects

Initial Comment:
I have a problem with the way '%s' is handled in
unicode strings when formatted. The Python Language
Reference states that a unicode serialisation of an
object should be provided by __unicode__, and I have seen
Python fail when unicode data is returned from __str__.

The problem is that there does not appear to be a way
to interpolate the results from __unicode__ within a
string:

class EuroHolder:
    def __init__(self, price):
        self._price = price
    def __str__(self):
        return "%.02f euro" % self._price
    def __unicode__(self):
        return u"%.02f\u20ac" % self._price

>>> class EuroHolder:
...     def __init__(self, price):
...         self._price = price
...     def __str__(self):
...         return "%.02f euro" % self._price
...     def __unicode__(self):
...         return u"%.02f\u20ac" % self._price
... 
>>> e = EuroHolder(123.45)
>>> str(e)
'123.45 euro'
>>> unicode(e)
u'123.45\u20ac'
>>> "%s" % e
'123.45 euro'
>>> u"%s" % e  # this is wrong
u'123.45 euro'
>>> u"%s" % unicode(e)  # this is silly
u'123.45\u20ac'
>>> 

The first case is wrong: the string I was substituting
into can hold unicode data, so I should be able to
request that the unicode form of the object be inserted.

The second case is silly, as the whole point of string
substitution codes such as %s, %d and %f is to remove
the need for explicit coercion on the right of the %.

Proposed solution #1:
Make %s in unicode string substitution automatically
check __unicode__() on the rvalue before trying
__str__(). This is the most logical behaviour to expect
of %s, given that it is already overloaded so that a
unicode object in the rvalue makes the result unicode.
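A rough pure-Python model of what solution #1 asks for: look up __unicode__ on the object's type and use it when present, falling back to str() otherwise. The names as_text and format_s are hypothetical helpers for illustration only; in the interpreter the lookup would have to happen in C inside the unicode formatting code. The sketch is written so that it also runs under Python 3, where __unicode__ is just an ordinary method name.

```python
class EuroHolder(object):
    """The example class from the report."""
    def __init__(self, price):
        self._price = price

    def __str__(self):
        return "%.02f euro" % self._price

    def __unicode__(self):
        return u"%.02f\u20ac" % self._price


def as_text(obj):
    """Hypothetical helper: prefer __unicode__(), fall back to str().

    The method is looked up on the type, the way special methods
    are normally resolved, rather than on the instance.
    """
    unicode_method = getattr(type(obj), "__unicode__", None)
    if unicode_method is not None:
        return unicode_method(obj)
    return str(obj)


def format_s(template, *args):
    """Hypothetical: %s-style formatting where each argument is first
    converted with as_text(), so __unicode__ wins over __str__."""
    return template % tuple(as_text(arg) for arg in args)


e = EuroHolder(123.45)
result = format_s(u"%s", e)  # result == u"123.45\u20ac"
```

Objects without __unicode__ (ints, plain strings) simply go through str(), which is the fallback order the proposal describes.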

Proposed solution #2:
Make a new string conversion operator, such as %S or %U,
which will explicitly call __unicode__() on the rvalue
even if the lvalue is a non-unicode string.

Solution #2 has the advantage that it does not break
any previous behaviour of %s, and it also allows an
8-bit string lvalue to be converted explicitly to
unicode.
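The % operator offers no hook for new conversion characters, so a faithful %U cannot be written in pure Python. As a loose stand-in, the sketch below uses the conversion hook that string.Formatter (the str.format machinery) does expose, adding a hypothetical !U conversion that explicitly requests __unicode__() while leaving plain {} and !s on their usual str() path. UnicodeFormatter is an illustrative name, not a real library class, and str.format is swapped in here for the % operator the report discusses.

```python
import string


class EuroHolder(object):
    """The example class from the report."""
    def __init__(self, price):
        self._price = price

    def __str__(self):
        return "%.02f euro" % self._price

    def __unicode__(self):
        return u"%.02f\u20ac" % self._price


class UnicodeFormatter(string.Formatter):
    """Hypothetical formatter adding an explicit !U conversion.

    "{0!U}" calls __unicode__() when the object's type defines it
    (falling back to str()); "{0}" and "{0!s}" are left unchanged.
    """
    def convert_field(self, value, conversion):
        if conversion == "U":
            method = getattr(type(value), "__unicode__", None)
            return method(value) if method is not None else str(value)
        # Delegate the standard conversions (None, "s", "r", ...).
        return super(UnicodeFormatter, self).convert_field(value, conversion)


fmt = UnicodeFormatter()
e = EuroHolder(123.45)
explicit = fmt.format("{0!U}", e)  # uses __unicode__()
default = fmt.format("{0}", e)     # still uses __str__()
```

Because the new behaviour is opt-in, existing templates keep their old output, which is the compatibility argument made for solution #2.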

I prefer solution #1, as I feel that the current
behaviour of %s is incorrect and changing it is unlikely
to break much, whereas the "advantage" solution #2
offers of converting 8-bit lvalue strings to unicode
will just lead to encoding problems and sloppy code.

----------------------------------------------------------------------

>Comment By: M.-A. Lemburg (lemburg)
Date: 2004-07-23 18:22

Message:
Logged In: YES 
user_id=38388

Note that this will not go into 2.3.x since it is a new
feature. Changing the scope to Python 2.4.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2004-07-23 18:21

Message:
Logged In: YES 
user_id=38388

I've checked in the proposed solution:

Checking in Objects/unicodeobject.c;
/cvsroot/python/python/dist/src/Objects/unicodeobject.c,v 
<--  unicodeobject.c
new revision: 2.218; previous revision: 2.217


----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2004-07-23 12:29

Message:
Logged In: YES 
user_id=38388

Good point.

I think the only change needed is to use PyObject_Unicode()
instead of PyObject_Str() in unicodeobject.c's
PyUnicode_Format(). This would then implement #1.

----------------------------------------------------------------------


