[Python-Dev] Backporting PEP 3101 to 2.6

Guido van Rossum guido at python.org
Thu Jan 10 19:08:58 CET 2008


On Jan 10, 2008 9:57 AM, Eric Smith <eric+python-dev at trueblade.com> wrote:
> Eric Smith wrote:
> > (I'm posting to python-dev, because this isn't strictly 3.0 related.
> > Hopefully most people read it in addition to python-3000).
> >
> > I'm working on backporting the changes I made for PEP 3101 (Advanced
> > String Formatting) to the trunk, in order to meet the pre-PyCon release
> > date for 2.6a1.
> >
> > I have a few questions about how I should handle str/unicode.  3.0 was
> > pretty easy, because everything was unicode.
> >
> > 1: How should the builtin format() work?  It takes 2 parameters, an
> > object o and a string s, and returns o.__format__(s).  If s is None, it
> > returns o.__format__(empty_string).  In 3.0, the empty string is of
> > course unicode.  For 2.6, should I use u'' or ''?
>
> I just re-read PEP 3101, and it doesn't mention this behavior with None.
> The way the code actually works is that the specifier is optional, and
> if it isn't present then it defaults to an empty string.  This behavior
> isn't mentioned in the PEP, either.
>
> This feature came from a request from Talin[0].  We should either add
> this to the PEP (and docs), or remove it.  If we document it, it should
> mention the 2.x behavior (as other places in the PEP do).  If we removed
> it, it would remove the one place in the backport that's not just hard,
> but ambiguous.  I'd just as soon see the feature go away, myself.

IIUC, the 's' argument is the format specifier. Format specifiers are
written in a very conservative character set, so I'm not sure it
matters. Or are you assuming that the *type* of 's' also determines
the type of the output?

I may be in the minority here, but I think I like having a default for
's' (as implemented -- the PEP ought to be updated) and I also think
it should default to an 8-bit string, assuming you support 8-bit
strings at all -- after all in 2.x 8-bit strings are the default
string type (as reflected by their name, 'str').
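Here is a minimal sketch of the builtin as currently implemented. The specifier is optional and defaults to an empty string, which is handed to the object's own `__format__`. The name `format_sketch` is hypothetical, since the real builtin is written in C. The code is Python 3, so the `''` default is a text string; under the 2.6 behavior suggested above, the default would instead be the 8-bit `''`.

```python
def format_sketch(value, format_spec=''):
    """Hypothetical pure-Python stand-in for the builtin format().

    The specifier is optional; when it is omitted, an empty string is
    passed on to the value's own __format__ method.
    """
    # Look __format__ up on the type, as the builtin does, rather than
    # on the instance.
    return type(value).__format__(value, format_spec)


print(format_sketch(42))              # 42
print(format_sketch(3.14159, '.2f'))  # 3.14
print(format_sketch(255, 'x'))        # ff
```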

> > 3: Every overridden __format__() method is going to have to check for
> > string or unicode, just like object.__format__() does, and return either a
> > string or unicode object, appropriately.  I don't see any way around
> > this, but I'd like to hear any thoughts.  I guess there aren't all that
> > many __format__ methods that will be implemented, so this might not be a
> > big burden.  I'll of course implement the built in ones.
>
> The PEP actually mentions that this is how 2.x will have to work.  So
> I'll go ahead and implement it that way, on the assumption that getting
> string support into 2.6 is desirable.

I think it is. (But then I still live in a predominantly ASCII world.  :-)
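The per-method check described in point 3 could look roughly like the following sketch, in which the type of the specifier selects the type of the result. The class `Celsius` is hypothetical. Because this is Python 3, `bytes` stands in for the 2.x 8-bit `str` and `str` stands in for `unicode`. That mapping is an assumption made only so the sketch runs.

```python
class Celsius(object):
    """Hypothetical type showing the 2.x rule from point 3.

    Assumption for illustration: in Python 3 terms, bytes stands in for
    the 2.x 8-bit str and str stands in for unicode.
    """

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, format_spec):
        if isinstance(format_spec, bytes):
            # 8-bit specifier in, 8-bit result out.
            text = format(self.degrees, format_spec.decode('ascii'))
            return (text + ' C').encode('ascii')
        if isinstance(format_spec, str):
            # Text specifier in, text result out.
            return format(self.degrees, format_spec) + ' C'
        raise TypeError('format_spec must be bytes or str, not '
                        + type(format_spec).__name__)


t = Celsius(21.5)
print(format(t, '.1f'))      # 21.5 C
print(t.__format__(b'.1f'))  # b'21.5 C'
```

Python 3's builtin `format()` accepts only a `str` specifier, so the 8-bit branch is exercised here by calling `__format__` directly.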

For data types whose output uses only ASCII, would it be acceptable if
they always returned an 8-bit string and left it up to the caller to
convert it to Unicode? This would apply to all numeric types. (The
date/time types have a strftime()-style API, which means the user must
be able to specify Unicode.)
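The suggestion for ASCII-only types can be sketched as follows: `__format__` always returns 8-bit data, and the caller widens it to text if it needs Unicode. `Count` and `to_text` are hypothetical names. As before, `bytes` stands in for the 2.x 8-bit `str`, an assumption made only so the sketch runs on Python 3.

```python
class Count(object):
    """Hypothetical numeric-like type whose output is plain ASCII.

    Its __format__ always returns 8-bit data (bytes here, standing in
    for the 2.x str), whatever the type of the specifier.
    """

    def __init__(self, n):
        self.n = n

    def __format__(self, format_spec):
        if isinstance(format_spec, bytes):
            format_spec = format_spec.decode('ascii')
        # With an ASCII specifier the formatted number is ASCII too.
        return format(self.n, format_spec).encode('ascii')


def to_text(value, format_spec=''):
    """Caller-side step: widen an 8-bit result to text when needed."""
    # Call __format__ directly: Python 3's builtin format() rejects a
    # __format__ that returns bytes.
    result = type(value).__format__(value, format_spec)
    if isinstance(result, bytes):
        result = result.decode('ascii')
    return result


print(Count(255).__format__('x'))  # b'ff'
print(to_text(Count(255), 'x'))    # ff
print(to_text(Count(7), b'03'))    # 007
```

Because the output is known to be ASCII, the conversion in `to_text` is a plain decode. Types such as date/time, whose output can include user-supplied text, could not rely on this.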

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

