[Python-Dev] Divorcing str and unicode (no more implicit conversions).

Phillip J. Eby pje at telecommunity.com
Mon Oct 24 04:23:40 CEST 2005


At 06:06 PM 10/23/2005 -0700, Guido van Rossum wrote:
>Folks, please focus on what Python 3000 should do.
>
>I'm thinking about making all character strings Unicode (possibly with
>different internal representations a la NSString in Apple's Objective
>C) and introduce a separate mutable bytes array data type. But I could
>use some validation or feedback on this idea from actual
>practitioners.

+1.  Chandler has been going through quite an upheaval to get its Unicode
handling together.  Having a bytes type would be great, as long as files
and sockets could produce bytes instead of strings (unless an encoding is
specified).
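
To make that concrete, here is a minimal sketch of the rule I mean: a read
gives you bytes unless you name an encoding.  The read_data helper is made
up for illustration (it is not an existing API), and the sketch assumes an
interpreter where a bytes type exists and binary streams return it.

```python
import io


def read_data(stream, encoding=None):
    """Hypothetical helper: return raw bytes, or text if an encoding is given."""
    data = stream.read()           # a binary stream hands back bytes
    if encoding is None:
        return data                # no encoding named: stay in bytes
    return data.decode(encoding)   # encoding named: decode to text


# An in-memory binary stream stands in for a file or socket here.
raw = io.BytesIO("caf\u00e9".encode("utf-8"))

as_bytes = read_data(raw)
raw.seek(0)
as_text = read_data(raw, encoding="utf-8")
```

The point is only that the decode step is explicit and happens at the
boundary, so nothing downstream ever has to guess what the bytes meant.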

I'm tempted to say it would be even better if there were a command-line
option that forced all binary opens to return bytes and required all text
opens to specify an encoding.  The Chandler i18n project lead would jump
for joy if we had a way to keep "legacy" strings out of the system, apart
from ASCII string constants found in code.
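
Roughly, the strict behavior would look like the wrapper below.  This is
only a sketch under assumptions: strict_open is a hypothetical name, the
command-line option itself doesn't exist, and the code assumes an open()
that accepts an encoding argument and returns bytes for binary modes.

```python
import os
import tempfile


def strict_open(path, mode="r", encoding=None):
    """Hypothetical strict open: bytes for binary, mandatory encoding for text."""
    if "b" in mode:
        if encoding is not None:
            raise ValueError("binary mode does not take an encoding")
        return open(path, mode)
    if encoding is None:
        # The strict part: no silent default encoding for text.
        raise ValueError("text mode requires an explicit encoding")
    return open(path, mode, encoding=encoding)


# Small demonstration against a temporary file.
fd, path = tempfile.mkstemp()
os.close(fd)

with strict_open(path, "wb") as f:
    f.write(b"hello")

with strict_open(path, "rb") as f:
    binary = f.read()

with strict_open(path, "r", encoding="ascii") as f:
    text = f.read()

try:
    strict_open(path, "r")   # text open without an encoding
    rejected = False
except ValueError:
    rejected = True

os.remove(path)
```

A plugin that just calls the strict open on a file either gets bytes or
has to say what encoding it expects; there's no third path that quietly
produces a legacy string.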

It would then be okay not to drop support for the implicit conversions; if
you can't get 8-bit strings on input, then conversion isn't really an issue.

Anyway, I think all of the things I'd like to see can be done without 
breakage in 2.5.  For Chandler at least, we'd be willing to go with a 
command-line option that's more strict, in order to be able to ensure that 
plugin developers can't accidentally put 8-bit strings in somewhere, just 
by opening a file.


