[Python-Dev] bytes / unicode

Sun Jun 27 05:49:11 CEST 2010

At 12:43 PM 6/27/2010 +1000, Nick Coghlan wrote:
>While full support for third party strings and
>byte sequence implementations is an interesting idea, I think it's
>overkill for the specific problem of making it easier to write
>str/bytes agnostic functions for tasks like URL parsing.

OTOH, to write your partial implementation is almost as complex - it 
still must take into account joining and formatting, and so by that 
point, you've just proposed a new protocol for coercion...  so why 
not just make the coercion protocol explicit in the first place, 
rather than hardwiring a third type's worth of special cases?

Remember, bytes and strings already have to detect mixed-type 
operations.  If there was an API for that, then the hardcoded special 
cases would just be replaced, or supplemented with type slot checks 
and calls after the special cases.

To put it another way, if you already have two types special-casing 
their interactions with each other, then rather than add a *third* 
type to that mix, maybe it's time to have a protocol instead, so that 
the types that care can do the special-casing themselves, and you 
generalize to N user types.

(Btw, those who are saying that the resulting potential for N*N 
interaction makes the feature unworkable seem to be overlooking 
metaclasses and custom numeric types -- two Python features that in 
principle have the exact same problem, when you use them beyond a 
certain scope.  At least with those features, though, you can 
generally mix your user-defined metaclasses or numeric types with the 
Python-supplied basic ones and call arbitrary Python functions on 
them, without as much heartbreak as you'll get with a from-scratch 
stringlike object.)

All that having been said, a new protocol probably falls under the 
heading of the language moratorium, unless it can be considered "new 
methods on builtins"?  (But that seems like a stretch even to me.)

I just hate the idea that functions taking strings should have to be 
*rewritten* to be explicitly type-agnostic.  It seems *so* 
un-Pythonic...  like if all the bitmasking functions you'd ever 
written using 32-bit int constants had to be rewritten just because 
we added longs to the language, and you had to upcast them to be 
compatible or something.  Sounds too much like C or Java or some 
other non-Python language, where dynamism and polymorphy are the 
special case, instead of the general rule.