PEP: Generalised String Coercion

Mon Aug 22 05:55:02 EDT 2005

hi,

i guess that anyone reading this pep will agree that
*something* must be done to the state of unicode affairs
in python. there are zillions of modules out there that
have str() scattered all over the place, and they all
*break* on the first mention of düsseldorf...

i'm not quite sure myself how to evolve python to make
it grow from unicode-enabled to unicode-perfect, so for
me some discussion would be a good thing. only two
micro-remarks on the pep as it stands:

1) i dislike the naming of the function ``text()`` --
i've been using the word 'text' for a long time to mean
'some appropriate representation of character data',
i.e. mostly something that would pass ::

    assert isinstance(x,basestring)

i feel this is a fairly common way of defining the term,
so to me a function ``text(x)`` should really

*   return its argument unaltered if it passes
    ``isinstance(x,basestring)``,

*   try to return specifically a unicode object (by using
    the ``x.__unicode__()`` method, where available),

*   or return an 8bit-string (from ``x.__repr__()`` or
    ``x.__str__()``)
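
a minimal sketch of the three steps above, just to pin
down the behaviour meant here. it is not the pep's
definition: the ``string_types`` shim and the ``City``
class are hypothetical names, added so the example also
runs on interpreters that have no ``basestring``.

```python
# hypothetical shim: ``basestring`` only exists on python 2
try:
    string_types = basestring
except NameError:
    string_types = (str, bytes)

def text(x):
    # 1) anything that already is a string passes through unaltered
    if isinstance(x, string_types):
        return x
    # 2) prefer a unicode object where x offers __unicode__()
    to_unicode = getattr(x, '__unicode__', None)
    if to_unicode is not None:
        return to_unicode()
    # 3) otherwise fall back to str(), which calls __str__()
    #    and uses __repr__() when no __str__() is defined
    return str(x)

class City(object):
    # hypothetical example class, not from the pep
    def __init__(self, name):
        self.name = name
    def __unicode__(self):
        return self.name
```

with this, ``text(City(name))`` hands back ``name`` itself,
a string passed in comes back as the very same object,
and something like ``text(42)`` falls through to ``str()``.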

as discussed later on in the pep, it is conceivable to
assign the functionality of the ``text()`` function of
the pep to ``basestring`` -- that would make perfect
sense to me (not sure whether that stands up to scrutiny
in the big picture, tho).

2) really minor: somewhere near the beginning it says ::

    def text(obj): return '%s' % obj

and the claim is that this "behaves as desired" except
for unicode-issues, which is incorrect. the ``return``
statement must read ::

    return '%s' % ( obj, )

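or else it will raise a ``TypeError`` if ``obj`` is a
tuple that is not of length one (and for a tuple of
length one it formats the single element rather than
the tuple).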
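
the difference is easy to check. the sketch below puts
the two variants side by side; ``text_naive``,
``text_fixed`` and ``fails`` are hypothetical names used
only for this illustration.

```python
def text_naive(obj):
    # a tuple on the right-hand side of % is unpacked
    # and used as the list of format arguments
    return '%s' % obj

def text_fixed(obj):
    # wrapping obj in a one-element tuple makes it the
    # single format argument, whatever its type
    return '%s' % (obj,)

def fails(func, obj):
    # True if func(obj) raises TypeError, False otherwise
    try:
        func(obj)
    except TypeError:
        return True
    return False

empty, single, pair = (), (1,), (1, 2)
```

here ``fails(text_naive, pair)`` and
``fails(text_naive, empty)`` are both true, while
``text_naive(single)`` quietly formats the element
instead of the tuple. ``text_fixed`` handles all three
the same way as any other object.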

cheers,

_wolf