[Python-Dev] bytes / unicode

Nick Coghlan ncoghlan at gmail.com
Sun Jun 27 07:53:59 CEST 2010


On Sun, Jun 27, 2010 at 1:49 PM, P.J. Eby <pje at telecommunity.com> wrote:
> I just hate the idea that functions taking strings should have to be
> *rewritten* to be explicitly type-agnostic.  It seems *so* un-Pythonic...
>  like if all the bitmasking functions you'd ever written using 32-bit int
> constants had to be rewritten just because we added longs to the language,
> and you had to upcast them to be compatible or something.  Sounds too much
> like C or Java or some other non-Python language, where dynamism and
> polymorphy are the special case, instead of the general rule.

The difference is that we have three classes of algorithm here:
- those that work only on octet sequences
- those that work only on character sequences
- those that can work on either
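
As a rough illustration of the three classes, here is a minimal sketch in
Python 3. The function names are hypothetical and chosen only for this
example; they are not taken from any existing module.

```python
import struct
import unicodedata


def read_u32_be(data):
    """Octet-only: interpret the first four bytes as a big-endian unsigned int."""
    return struct.unpack(">I", data[:4])[0]


def normalize_text(text):
    """Character-only: Unicode normalization is defined on code points, not octets."""
    return unicodedata.normalize("NFC", text)


def strip_trailing(seq, suffix):
    """Either: uses only operations that str and bytes share, and no literals."""
    if suffix and seq.endswith(suffix):
        return seq[:-len(suffix)]
    return seq


print(read_u32_be(b"\x00\x00\x01\x00"))   # 256
print(strip_trailing("a.py", ".py"))      # a
print(strip_trailing(b"a.py", b".py"))    # b'a'
```

Note that the third function stays type-agnostic only because the caller
supplies the suffix, so the function body never needs a literal of its own.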

Python 2 lumped all 3 classes of algorithm together through the
multi-purpose 8-bit str type. The unicode type provided some scope to
separate out the second category, but the divisions were rather
blurry.

Python 3 forces the first two to be separated by using either octets
(bytes/bytearray) or characters (str). There are a *very small* number
of APIs where it is appropriate to be polymorphic, but this is
currently difficult due to the need to supply literals of the
appropriate type for the objects being operated on.

Such polymorphism isn't ever going to happen automagically, because
the code has to explicitly provide two literals (one for octet
sequences, one for character sequences).
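
To make the literal problem concrete, here is a small sketch in Python 3.
Both function names are made up for the example. The first version embeds
a str literal and so only works on str; the second picks the literal to
match its argument, which is the kind of manual dispatch a polymorphic API
currently needs.

```python
def split_scheme_naive(url):
    # The ":" literal is a str, so this fails with TypeError for bytes input.
    return url.partition(":")[0]


def split_scheme(url):
    # Choose the separator literal that matches the type of the argument.
    sep = b":" if isinstance(url, (bytes, bytearray)) else ":"
    return url.partition(sep)[0]


print(split_scheme("http://example.com"))    # http
print(split_scheme(b"http://example.com"))   # b'http'

try:
    split_scheme_naive(b"http://example.com")
except TypeError as exc:
    print("naive version rejects bytes:", exc)
```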

The virtues of a separate poly_str type are that:
1. It can be simple and implemented in Python, dispatching to str or
bytes as appropriate (probably in the string module)
2. No chance of impacting the performance of the core interpreter (as
builtins are not affected)
3. Lower impact if it turns out to have been a bad idea
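
For the sake of discussion, one possible shape for such a type is sketched
below in pure Python. Only the name poly_str comes from this thread; the
like() method, the ASCII-only restriction, and the join_query() helper are
assumptions invented for the example, not a proposed API.

```python
class poly_str:
    """Hypothetical helper: an ASCII literal available as either str or bytes."""

    def __init__(self, literal):
        # Restricting to ASCII (an assumption of this sketch) keeps the two
        # forms unambiguous; non-ASCII input raises UnicodeEncodeError here.
        self._octets = literal.encode("ascii")
        self._text = literal

    def like(self, other):
        """Return the literal in the same family of types as `other`."""
        if isinstance(other, str):
            return self._text
        if isinstance(other, (bytes, bytearray)):
            return self._octets
        raise TypeError("expected str, bytes or bytearray, got %s"
                        % type(other).__name__)


QUERY_SEP = poly_str("?")


def join_query(path, query):
    # Works on str or bytes, as long as both arguments are of the same kind.
    return path + QUERY_SEP.like(path) + query


print(join_query("/search", "q=python"))     # /search?q=python
print(join_query(b"/search", b"q=python"))   # b'/search?q=python'
```

Mixing the two kinds (a str path with a bytes query) still raises
TypeError, so the separation Python 3 enforces is preserved.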

We could talk about this even longer, but the most effective way
forward is going to be a patch that improves the URL parsing
situation.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

