[Python-checkins] r77717 - in python/trunk: Lib/distutils/command/register.py Lib/distutils/command/upload.py Lib/distutils/dist.py Lib/distutils/tests/test_register.py Lib/distutils/tests/test_upload.py Misc/NEWS

Tarek Ziadé ziade.tarek at gmail.com
Sun Jan 24 22:50:01 CET 2010


On Sun, Jan 24, 2010 at 5:36 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> <tarek.ziade> writes:
>>
>> Author: tarek.ziade
>> Date: Sun Jan 24 01:33:32 2010
>> New Revision: 77717
>>
>> Log:
>> Fixed #7748: now upload and register commands don't need to force the
>> encoding anymore :
>> DistributionMetada returns utf8 strings
>
> I'm sorry, is this right?
> First, this looks like a backwards-incompatible change in the API.

How so?

> Merging it
> into 2.6 could break software which used to work fine.

Can you provide an example? Even if it's not really documented, the
DistributionMetadata class in 2.7 is supposed to hold only utf-8
strings. They are then serialized to a file (PKG-INFO) or to an HTTP
request that is sent to PyPI.

The unicode conversion was added in the past to avoid breakage when people
used unicode strings for these fields: 'author', 'author_email',
'maintainer', 'maintainer_email', 'description', 'long_description'.
The other fields are strings, and are supposed to be utf-8 so that they
can be serialized.

People are currently doing this for instance in their setup.py file:

  setup(name='foo', author=u'rené')

and this was working with the "register" command but not the "upload" command,
because the register command was converting unicode on the fly, and
upload did not.

But those commands should not be responsible for that. They should
handle a homogeneous list of fields provided by the DistributionMetadata
class and know what they get.

IOW, the get_* functions of DistributionMetadata should always return
the same type for each field, because consumers should not have to guess it.
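
To make the point about the get_* functions concrete, here is a minimal
sketch of the idea. It is not the distutils code: MetadataSketch and
encode_field are hypothetical names, and the block is written for Python 3,
where `str` stands in for the 2.x `unicode` type and `bytes` stands in for
the 2.x `str` type.

```python
# Hypothetical sketch, not the distutils source.
# On Python 3, `str` plays the role of 2.x `unicode`
# and `bytes` plays the role of 2.x `str`.

def encode_field(value, encoding="utf-8"):
    """Return `value` as encoded bytes, whichever text type it came in as."""
    if isinstance(value, str):
        return value.encode(encoding)
    # Already a byte string: assumed to be utf-8, returned unchanged.
    return value


class MetadataSketch:
    """Illustrative stand-in for a metadata holder with one field."""

    def __init__(self, author=None):
        self.author = author

    def get_author(self):
        # Consumers always receive the same type, so they never have to
        # check or convert the value themselves.
        return encode_field(self.author)


from_text = MetadataSketch(author="rené").get_author()
from_bytes = MetadataSketch(author=b"ren\xc3\xa9").get_author()
```

With the conversion done once in the getter, both ways of spelling the
author in setup() give the consumer the same utf-8 byte string, and neither
command needs its own conversion step.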

Maybe a better fix would have been to stop accepting unicode *or*
str in setup() for these options in the first place, and restrict them
to unicode. But it is currently working for upload, so we can't remove
it (in this case that would be a backwards-incompatible change).

> Second, I don't think returning utf8-encoded strings is any better than
> returning unicode objects.

Not sure this is the best idea in 2.x. People are expecting strings as
output. unicode is the exception and led to the bugs mentioned above:
distutils was not handling it properly. That's why I made this
compromise.

> It's probably worse actually (especially in py3k, but
> I don't know what the behaviour is there).

This fix is only for 2.x. In Python 3, all the fields use the new str
type, which is then serialized to bytes. What I can do in 3k is add a
check that the fields received through setup() are only str values, and
raise an assert otherwise.
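
A rough sketch of what such a check could look like in Python 3 follows.
check_text_fields and TEXT_FIELDS are hypothetical names, and the field
list is simply the one quoted earlier in this message; nothing here is
taken from distutils. The sketch raises TypeError instead of using an
assert statement, since assert statements are skipped when Python runs
with -O.

```python
# Hypothetical check, not part of distutils.
# The field names are the ones listed earlier in this message.
TEXT_FIELDS = ("author", "author_email", "maintainer",
               "maintainer_email", "description", "long_description")


def check_text_fields(**options):
    """Reject any listed option that is set but is not a str."""
    for name in TEXT_FIELDS:
        value = options.get(name)
        if value is not None and not isinstance(value, str):
            raise TypeError("%s must be str, got %s"
                            % (name, type(value).__name__))
    return options


# A str value passes through untouched.
accepted = check_text_fields(name="foo", author="rené")
```

Passing bytes for one of the listed fields, for example
check_text_fields(author=b"rene"), raises TypeError under this sketch.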

Tarek

