[Python-Dev] bytes / unicode
P.J. Eby
pje at telecommunity.com
Sun Jun 27 19:02:28 CEST 2010
At 03:53 PM 6/27/2010 +1000, Nick Coghlan wrote:
>We could talk about this even longer, but the most effective way
>forward is going to be a patch that improves the URL parsing
>situation.
Certainly, it's the only practical solution for the immediate problems in 3.2.
I only mentioned that I "hate the idea" because I'd be more
comfortable if it were explicitly declared to be a temporary hack to
work around the absence of a string coercion protocol, an absence
that is itself due to the moratorium on language changes.
But, since the moratorium *is* in effect, I'll try to make this my
last post on string protocols for a while... and maybe wait until
I've looked at the code (str/bytes C implementations) in more detail
and can make a more concrete proposal for what the protocol would be
and how it would work. (Not to mention closer to the end of the moratorium.)
>There are a *very small* number of APIs where it is appropriate to
>be polymorphic
This is only true if you focus exclusively on bytes vs. unicode,
rather than the general issue that it's currently impractical to pass
*any* sort of user-defined string type through code that you don't
directly control (stdlib or third-party).
>The virtues of a separate poly_str type are that:
>1. It can be simple and implemented in Python, dispatching to str or
>bytes as appropriate (probably in the strings module)
>2. No chance of impacting the performance of the core interpreter (as
>builtins are not affected)
Note that adding a string coercion protocol isn't going to change
core performance for existing cases, since any place where the
protocol would be invoked would be a code branch that either throws
an error or *already* falls back to some other protocol (e.g. the
buffer protocol).
>3. Lower impact if it turns out to have been a bad idea
How many protocols have been added that turned out to be bad
ideas? The only ones that have been removed in 3.x, IIRC, are
three-way compare, slice-specific operations, and __coerce__... and
I'm going to miss __cmp__. ;-)
However, IIUC, the reason these protocols were dropped isn't because
they were "bad ideas". Rather, they're things that can be
implemented in terms of a finer-grained protocol. That is, if you want
__cmp__ or __getslice__ or __coerce__, you can always implement them
via a mixin that converts the newer fine-grained protocols into
invocations of the older protocol. (As I plan to do for __cmp__ in
the handful of places I use it.)
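As a rough illustration of the mixin approach for __cmp__, here is a minimal sketch in Python 3. The names CmpMixin, _via_cmp and Version are hypothetical, made up for this example; this is one possible way to do it, not necessarily how the handful of places mentioned above would be written.

```python
class CmpMixin:
    """Hypothetical mixin: route the rich comparisons through __cmp__."""

    def __cmp__(self, other):
        # Subclasses return a negative, zero, or positive int,
        # or NotImplemented for operands they don't understand.
        return NotImplemented

    def _via_cmp(self, other, test):
        result = self.__cmp__(other)
        if result is NotImplemented:
            return NotImplemented
        return test(result)

    def __eq__(self, other):
        return self._via_cmp(other, lambda r: r == 0)

    def __ne__(self, other):
        return self._via_cmp(other, lambda r: r != 0)

    def __lt__(self, other):
        return self._via_cmp(other, lambda r: r < 0)

    def __le__(self, other):
        return self._via_cmp(other, lambda r: r <= 0)

    def __gt__(self, other):
        return self._via_cmp(other, lambda r: r > 0)

    def __ge__(self, other):
        return self._via_cmp(other, lambda r: r >= 0)

    # Defining __eq__ disables the inherited __hash__; a subclass that
    # needs hashing has to define __hash__ itself.
    __hash__ = None


class Version(CmpMixin):
    """Example type (an assumption for this sketch) that only writes __cmp__."""

    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def __cmp__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        a = (self.major, self.minor)
        b = (other.major, other.minor)
        return (a > b) - (a < b)  # the usual replacement for cmp(a, b)


versions = sorted([Version(1, 3), Version(1, 2), Version(2, 0)])
print([(v.major, v.minor) for v in versions])
```

The point of the sketch is only that the old coarse protocol can be rebuilt on top of the six fine-grained methods, so nothing expressible with __cmp__ was lost.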
At the moment, however, this isn't possible for multi-string
operations outside of __add__/__radd__ and comparison -- the coercion
rules are hard-wired and can't be overridden by user-defined types.
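To make the asymmetry concrete, here is a small sketch, assuming Python 3 and a hypothetical user-defined type called Tagged (not a str subclass). Concatenation and equality reach the user-defined type through the reflected methods, but str.join has no hook it will consult and simply rejects the object.

```python
class Tagged:
    """Hypothetical string-like type used only for this illustration."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __add__(self, other):
        return Tagged(self.text + str(other))

    def __radd__(self, other):
        # Reached for "a" + Tagged("b"), because str.__add__ returns
        # NotImplemented for a non-str right operand.
        return Tagged(str(other) + self.text)

    def __eq__(self, other):
        if isinstance(other, (str, Tagged)):
            return self.text == str(other)
        return NotImplemented

    __hash__ = None  # equality is defined, hashing is not needed here


def try_join(sep, items):
    """Return the joined result, or the TypeError that str.join raised."""
    try:
        return sep.join(items)
    except TypeError as exc:
        return exc


combined = "a" + Tagged("b")            # dispatched to Tagged.__radd__
same = ("ab" == Tagged("ab"))           # dispatched to the reflected __eq__
joined = try_join("-", ["a", Tagged("b")])  # no hook: join insists on str

print(type(combined).__name__, same, type(joined).__name__)
```

In this sketch the first two operations end up in Tagged's own methods, while the join call only ever produces a TypeError, no matter what methods Tagged defines.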