[Python-Dev] Allowing u.encode() to return non-strings

Fri Jun 18 09:03:52 EDT 2004

On Thursday 2004-06-17 16:43, Guido van Rossum wrote:

[MAL proposed that restrictions on the "encode" method should
be lifted...]

> May I make one tiny objection?  I don't know if it's enough to stop
> this (I value it at -0.5 at most), but this will make reasoning about
> types harder.  Given that approaches like StarKiller and IronPython
> are likely the best way to get near-C speed for Python, I'd like the
> standard library at least to make life easy for their approach.
> 
> The issue is that currently the type inferencer can know that the
> return type of u.encode(s) is 'unicode', assuming u's type is
> 'unicode'.

Um, you don't mean that. u"foo".encode() == "foo", of type str.
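Readers on modern Python should note that in Python 3 the old unicode type became str and the old 8-bit str became bytes. The same observation therefore reads as follows. This sketch runs under Python 3, not the Python 2 of this thread.

```python
# Python 3 analogue: text (str) encodes to bytes, not back to text.
result = "foo".encode("ascii")
print(type(result).__name__)  # bytes
print(result == b"foo")       # True
```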

>             But with the proposed change, the return type will depend
> on the *value* of s, and I don't know how easy it is for the type
> inferencers to handle that case -- likely, a type inferencer will have
> to give up and say it returns 'object'.
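Python 3 later restricted str.encode() to text-to-bytes encodings. The general-purpose codecs.encode() function still accepts arbitrary codecs, though, so there the result type really does depend on the value of the encoding argument. The "rot13" codec, for example, maps text to text. This is a small illustration under Python 3 semantics, not a description of the 2004 proposal itself.

```python
import codecs

# codecs.encode() dispatches on the *value* of the encoding name, so the
# result type cannot be known from the argument types alone.
as_bytes = codecs.encode("abc", "utf-8")  # text -> bytes codec
as_text = codecs.encode("abc", "rot13")   # text -> text codec

print(type(as_bytes).__name__, as_bytes)  # bytes b'abc'
print(type(as_text).__name__, as_text)    # str nop

# str.encode() accepts only text encodings, so its result is always bytes.
try:
    "abc".encode("rot13")
except LookupError as exc:
    print("rejected:", exc)
```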

When looking for near-C speed, type inferencing is most important
for a relatively small set of types that can be manipulated
particularly efficiently: most notably, smallish integers. Being
able to prove that
something is a Unicode object just isn't all that useful for
efficiency, because most of the things you can do to Unicode
objects aren't all that cheap relative to the cost of finding out
what they are. Likewise, though perhaps a bit less so, for being
able to prove that something is a string.

At least, so it seems to me. Maybe I'm wrong. I suppose the
extract-one-character operation might be used quite a bit,
and that could be cheap. But I can't help feeling that
occasions where

  (1) the compiler can prove that something is a string
      because it comes from calling an "encode" method,
  (2) it can't prove that any other way,
  (3) this makes an appreciable difference to the speed
      of the code, and
  (4) there isn't any less-rigorous (Psyco-like, say) way
      for the type to be discovered and efficient code used

are likely to be pretty rare. In particular, they are rare
enough that users would accept having to supply some sort
of optional type declaration. (I bet that any version of
Python that achieves near-C speed by doing extensive type
inference will have optional type declarations.)

The above paragraph, of course, presupposes that we keep
the restriction on the return value of u.encode(s), and
start enforcing it so that the compiler can take advantage.
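Enforcing the restriction amounts to checking the codec's result at the call boundary. Below is a minimal sketch built on codecs.encode(). The helper checked_encode is hypothetical and not part of any library. Python 3's own str.encode() ended up enforcing something along these lines.

```python
import codecs


def checked_encode(text: str, encoding: str) -> bytes:
    """Hypothetical helper: encode, then insist that the result is bytes."""
    result = codecs.encode(text, encoding)
    if not isinstance(result, bytes):
        raise TypeError(
            f"{encoding!r} codec returned {type(result).__name__}, expected bytes"
        )
    return result


print(checked_encode("abc", "utf-8"))  # b'abc'
try:
    checked_encode("abc", "rot13")
except TypeError as exc:
    print(exc)  # 'rot13' codec returned str, expected bytes
```

With the check in place, a compiler or checker can rely on the declared bytes return type no matter which encoding name is passed.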

> (I've never liked functions whose return type depends on the value of
> an argument -- I guess my intuition has always anticipated type
> inferencing. :-)

def f(x): return x+x

has that property, even if you pretend that "+" only works
on numbers.
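One concrete case from the Python 2 semantics of the time (PEP 237) shows this clearly. An int result that overflowed the machine word was silently promoted to long, so even for int arguments the type of x+x depended on the value of x. Python 3 merged the two types. The sketch below therefore models the old rule explicitly. The 32-bit word size and the helper name py2_type_of_sum are assumptions made for illustration.

```python
# Model of Python 2's int/long split (PEP 237): results that fit in a
# machine word stay 'int', and larger ones are promoted to 'long'.
WORD_BITS = 32  # assumption: a 32-bit build, where sys.maxint == 2**31 - 1
MAXINT = 2 ** (WORD_BITS - 1) - 1
MININT = -MAXINT - 1


def py2_type_of_sum(x: int) -> str:
    """Return the Python 2 type name of x + x for an int argument x."""
    result = x + x
    return "int" if MININT <= result <= MAXINT else "long"


print(py2_type_of_sum(1))       # int
print(py2_type_of_sum(MAXINT))  # long: same argument type, different result type
```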

-- 
g