Typing system vs. Java

Markus Schaber markus at schabi.de
Fri Aug 10 19:44:13 EDT 2001


Hi,

Alex Martelli <aleaxit at yahoo.com> wrote:

>> And every function and attribute access means one pointer
>> indirection, as the offsets of the members in the array are fixed -
>> the types are statically known.
> 
> This is not sufficient -- the type must not just be statically known
> (either declared or inferred), but *non-extensible* (not 'expando' in
> Javascript terms, nonextensible in COM MIDL terms, not a Python
> object in Python terms) or else dynamic lookup (which it may be
> possible to optimize away via dispatch-caching, with some kind
> of checking of course -- just as in the general case) is still needed.
>
> A language might define rules whereby objects are "somewhat
> extensible" but never in such a way as to require dynamic lookup,
> but I suspect that would be pretty stilted, a typical case of a rule
> made for the sole purpose of easing implementers' lives shows
> through all the way up to language level and makes miserable
> the lives of language _users_ -- typical of languages not used by
> people who implement them, I suspect:-) -- in practice, we may
> as well say that the ability to use vtables implies not just type-
> knowledge (static or inferred) but also non-extensibility (either
> imposed by the language or -- in very special cases, when ALL
> relevant code can be seen -- again compiler-inferred).

When organizing a statically typed object system as I described above, 
the following things are theoretically possible at run time:

One can exchange methods (this implies that the method-lookup tables 
are bound not to the class descriptor, but to the object itself).
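In Python, for instance, the lookup already goes through the object first, so a method can be exchanged per instance. A minimal sketch (the class and method names are invented for the example):

```python
class Greeter:
    def hello(self):
        return "hello"

g = Greeter()
print(g.hello())        # resolved via the class: "hello"

# Rebind the method on the object itself, shadowing the class entry --
# the per-object "lookup table" (here: the instance __dict__) changed.
g.hello = lambda: "hi"
print(g.hello())        # now resolved via the instance: "hi"
```

Other instances of Greeter are unaffected; only this object's table was touched.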

One can also implement additional interfaces (meaning objects can gain 
additional members). This can be a bit tricky when new member variables 
are added, as the object may need to be enlarged and therefore 
relocated, so all references to the object must be corrected. Aliases, 
back-chains and other solutions to this problem exist.
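Python sits at the fully extensible end of this spectrum: a member can be added to an existing object without any relocation, because attributes live in a per-object dictionary. A small illustration (the class is invented):

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

p = Point(1, 2)
p.z = 3                 # a brand-new member, added at run time

# No references needed fixing: the object itself never moved, only
# its attribute dictionary grew.
print(p.__dict__)       # {'x': 1, 'y': 2, 'z': 3}
```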

This way, it is also possible to map objects to interfaces (meaning 
that methods with the same signature, but different names, can be 
mapped to the same code). Imagine you have a Bag interface: you can 
throw() objects in, and every grab() returns one object, removing it 
from the bag. Now suppose you don't have a bag implementation, but you 
do have a stack defining push() and pop(), or a FIFO defining put() and 
get() methods. As the bag doesn't define any semantics about which 
object is returned first, both implementations can be mapped to the Bag 
interface. (In Python, you could easily do this with astack.throw = 
astack.push; astack.grab = astack.pop and use astack as a bag from 
then on.)
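Spelled out as runnable Python, with a plain list standing in for the stack (throw/grab are the hypothetical Bag names from the text):

```python
class Stack:
    def __init__(self):
        self._items = []
    def push(self, obj):
        self._items.append(obj)
    def pop(self):
        return self._items.pop()

astack = Stack()
# Map the stack onto the Bag interface by aliasing the bound methods.
# Note: alias the method objects themselves, don't call them.
astack.throw = astack.push
astack.grab = astack.pop

astack.throw("apple")
astack.throw("pear")
print(astack.grab())    # the bag only promises *an* object; here: "pear"
```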

When including the compiler in the run-time library, you can even 
create code objects out of strings inside a running program, and in 
this way create new types and implementations.
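Python ships its compiler with the runtime, so this takes only a few lines (the class in the string is invented for the example):

```python
source = """
class Pair:
    def __init__(self, a, b):
        self.a, self.b = a, b
    def swapped(self):
        return Pair(self.b, self.a)
"""

# Compile the string into a code object and execute it in a fresh
# namespace -- a brand-new type now exists that no source file defined.
namespace = {}
exec(compile(source, "<string>", "exec"), namespace)
Pair = namespace["Pair"]
p = Pair(1, 2).swapped()
print(p.a, p.b)         # 2 1
```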

Of course, expressions, code blocks, methods and members can themselves 
be "first-class objects" with a little compiler intelligence.

Add an understandable template-like mechanism and generic containers 
etc. to the standard library, and you can easily create a "bag of int" 
instead of having to cast at every grab() from a generic bag.

This way, there is not much left that you can't do without real 
dynamic lookup, and what remains can be done explicitly using some 
sort of reflection API.

But unfortunately, I don't know of any mainstream language that 
implements all of this.

The gain is two things: you get speed (faster lookups and more 
optimization opportunities), and you can confine type-caused run-time 
errors to specific places (the explicit casts), or even avoid them 
completely. The experimental language Timor has a nice select-case-like 
construct for this situation - you can type something like (I don't 
remember the exact syntax, but it was Java-like):

Typecase container {
    as stack: container.push(thing); break;
    as fifo:  container.put(thing); break;
    as bag:   container.throw(thing); break;
    default:  // here you can do your error handling
}
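In today's Python, the closest equivalent is an isinstance() chain with an explicit error branch. A sketch, with minimal stand-ins for the hypothetical container types:

```python
class Stack:
    def __init__(self): self.items = []
    def push(self, thing): self.items.append(thing)

class Fifo:
    def __init__(self): self.items = []
    def put(self, thing): self.items.append(thing)

class Bag:
    def __init__(self): self.items = []
    def throw(self, thing): self.items.append(thing)

def put_in(container, thing):
    # Dispatch on the run-time type -- roughly what the typecase
    # construct expresses in a single statement.
    if isinstance(container, Stack):
        container.push(thing)
    elif isinstance(container, Fifo):
        container.put(thing)
    elif isinstance(container, Bag):
        container.throw(thing)
    else:
        raise TypeError("no known container interface: %r" % (container,))
```

The difference is that Timor's construct checks against interfaces the compiler knows about, while isinstance() is a plain run-time test.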

The loss is that you have to worry about declaring everything that's 
obvious to you as a programmer. That's the reason I like Python - but 
that's also the reason I am somewhat afraid of it: you never really 
know what you get when working on larger projects, as documented 
behaviour is not enforced in any way.

And another loss is that some things become tricky decisions in 
language design: can a bag of int be cast to a bag of objects? Or the 
other way round?

>> And with caching, you get in trouble when multi-threading:
>> Imagine a method that signals another thread, and this thread
>> replaces the method. But your own thread has cached the old method...
> 
> Clearly such "method modifications" must change the hashed type
> signature -- remember I've stipulated a signature check before
> the cached method is *used*.  I'd find it fun to see how a
> "statically-typed" language would cope with a method changing
> (and "in another thread" to boot?!-) 'under its nose'...?-)

Well - of course the method currently being executed runs to 
completion (as the program counter points into the code object), but 
in the lookup table, the pointer would be exchanged.

Of course, this (just like caching) makes code optimization difficult, 
as the compiler is not allowed to store the method's address between 
consecutive calls (e.g. inside a loop).

>> But this isn't the main slowdown - it is the dynamic lookup for every
>> access. But when we change this, we wouldn't have python any more.
> 
> Except for some caching and possible optimization-inferences, of
> course.  For example:
> 
>     for i in xrange(1000000000):
>         a.pleep(i)
> 
> unless an a.pleep code changes the resolution of a.pleep (and I
> don't think it would seriously violate Pythonic spirit to say that
> has undefined consequences, just like today ensue from the far
> more natural temptation of modifying a sequence one is looping
> on:-), constant-expression hoisting could optimize this to:
> 
>     _temp_boundmethod = a.pleep
>     for i in xrange(1000000000):
>         _temp_boundmethod(i)

Well - I think that especially in a highly dynamic language like 
Python, I would _not_ like to see this optimization done by the 
compiler. When I implement a.pleep in a way that lets it exchange 
a.pleep for another function, then my old code is not prepared to 
handle a further call.

Of course, when the programmer explicitly writes his code the second 
way, then he has to know what he is doing. By the way - mapping 
a.pleep over the xrange may be the better optimization here.
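A small demonstration of why that hoisting is unsafe: if pleep rebinds itself on its first call, the hand-hoisted version keeps invoking stale code (all names invented, matching the quoted example):

```python
class A:
    def pleep(self, i):
        # On the first call, replace myself on the instance.
        self.pleep = lambda i: "new"
        return "old"

a = A()
print([a.pleep(i) for i in range(3)])   # ['old', 'new', 'new']

a = A()
_temp = a.pleep                          # the compiler-style hoist
results = [_temp(i) for i in range(3)]
print(results)                           # ['old', 'old', 'old'] -- stale
```

The un-hoisted loop sees the rebinding after one call; the hoisted one never does.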
 
> Of course, THIS can be hand-optimized without too much
> trouble, too -- but consider:
> 
>     for i in xrange(1000000000):
>         a[i].pleep(i)
>
> Now, the Python coder is disarmed, even should he or she
> know that all the a[i]'s are homogeneous in type (well, a
> map of the unbound method on a might work, if the unbound
> method can be gotten at -- maybe type/class unification
> will eventually allow that for methods that don't belong to
> a class... here's hoping!), while dispatch caching would help
> a lot (if the a[i]'s belong to any of 3 or 4 types, in some
> disordered sequence, then extended dispatch caching with
> a cache of the last few method dispatches may be the
> only practicable technique... I'd hate to hand-code THAT
> in an application in any language!-).

Imagine that a[] is an array typed to a specific interface. Then use 
the "extended pointer" technique I described in my previous message: 
every entry in a[] internally contains a pointer to the object's data, 
and a second pointer to the corresponding interface table.

This way, the lookup for the method can be done very quickly.
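A toy model of that "extended pointer" in Python: each array entry is a (data, interface-table) pair, so the inner loop does one indexed table lookup instead of a full dynamic search. Everything here is an invented illustration:

```python
# Two unrelated implementations behind one hypothetical "pleep" slot.
def pleep_as_list(data, i):
    data.append(i)

def pleep_as_sum(data, i):
    data[0] += i

LIST_TABLE = {"pleep": pleep_as_list}
SUM_TABLE = {"pleep": pleep_as_sum}

# a[] holds "extended pointers": (data pointer, interface-table pointer).
a = [([], LIST_TABLE), ([0], SUM_TABLE)]

for i in range(3):
    for data, table in a:
        table["pleep"](data, i)   # one table lookup, no search

print(a[0][0], a[1][0])           # [0, 1, 2] [3]
```

In a compiled language the table lookup would be a fixed-offset load, since the interface layout is statically known.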

> > > I do.  Nothing will remove the need for extensions, although
> > > their use may become less frequent.
> > Fully agree. The only way to get around this is to have _very good_
> > optimizing compilers, and to include every needed machine-specific
> > detail as a language feature.
> Unfeasible, since machines keep adding previously unforeseen
> 'details':-).

Not all those details are 'needed', and even programming languages tend 
to add previously unforeseen 'details' from time to time :-)

>> If you don't include assembly into the language, you always will have
>> to use extensions that don't follow the language's rules.
> And if you do, you'll never be cross-platform.  I know which
> 'evil' I'd rather suffer... multi-language programming (why
> do so many people hate it so?!) or having a supposedly high
> level language become 100% tied to a specific model and
> release of CPU...!

Well - you can't get around machine-specific libraries / modules 
(which might, of course, be dynamically selected on load); you can 
only get around writing some of them in another language. But that 
way, you would get a language so insane that it would nevertheless 
drive me to use another language. :-)

I think the most important thing is a seamless integration of the 
programming languages. C and its derivatives can be integrated with 
almost everything, at the cost of lots of work for the programmer (in 
Python, at least on the C programmer's side). Jython, Java, Pizza etc. 
are much better integrated, because they use the same run-time 
structures, plus clever compilers and libraries which hide the 
differences.

markus
-- 
1) Customers cause problems.
2) Marketing is trying to create more customers.
Therefore:
3) Marketing is evil.  (Grand Edwards in comp.lang.python)


