Copy constructors

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Tue Aug 14 13:42:56 EDT 2001


Sun, 12 Aug 2001 01:35:16 GMT, Guido van Rossum <guido at python.org> pisze:

> I guess I have a bit of a hidden agenda: Python is more dynamic than
> the language I *wanted* to design.  Some of the dynamicism was simply
> a implementation trick.  Some of the dynamicism is getting in the way
> of optimizing code, because the optimizer can never prove that certain
> variables won't be changed.  So I'm trying to look for ways that pin
> down things a bit more.  I'm making assumptions about how "typical"
> Python code uses the dynamic features, and I'm slowly trying to
> introduce restrictions in the language that make the optimizer's life
> easier without affecting "typical" code.

This will not be directly relevant to Python, although maybe somebody
will find a use for parts of it. I need some help with making things
less dynamic.

Crossposting to comp.lang.misc. I really wanted to talk with Python
people, so followups are set to comp.lang.python; feel free to change
them to comp.lang.misc. I'm sorry for the off-topic post. Python people
just seem to have good minds for these things.

Many years ago I wanted to design an ideal programming language.

Later I learned many more existing languages, realized how naive my
ideas were, how languages for different purposes should differ, and
what diverse principles languages are based on. I also realized that
in practice the technical details of the language core matter less
than other factors.

I stopped working on my own language, and even discouraged some people
from repeating my mistakes. It is more important in practice, and also
more fun, to help the teams maintaining existing language
implementations, which often drive the design of non-mainstream
languages. I adopted Haskell as the language I call "my language",
and recently I have watched Python development closely and taken part
in discussions about it.

But my old dreams keep crying out, wanting to come true. I couldn't
resist. Now, with different feelings, I quickly designed and
implemented a little language, purely for the enjoyment of working
on it.

[...]

The language is similar in spirit to Lisp and Python, with some of ML,
Haskell and Smalltalk added. Design goal: simple core, still reasonably
convenient and readable (unlike pure lambda calculus or Forth for
example) but more idealistic than pragmatic, with interesting stuff
implemented in the language itself. Here is a description.

The main syntactic entity is an expression. An expression denotes
a function. A function takes a sequence of functions as arguments,
performs some action, and returns a value (or throws an exception
which is also a value). There are no static types but scoping is
completely static.

It follows that all arguments are passed by name (arguments are
functions themselves). Most functions, however, begin by evaluating
their parameters once (applying them to empty argument sequences) and
remembering the results. This effect is produced by the default syntax
of function definition, and it is equivalent to passing arguments by
reference. Since all values are immutable, the default effect is also
equivalent to passing them by value.
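
Here is a rough sketch of that model in Python (the function names are
mine, purely for illustration): every argument arrives as a
zero-argument function, i.e. a thunk, and a typical function forces
each thunk once on entry.

    def add_by_name(x, y):
        # pure call-by-name: x and y are re-evaluated on every use
        return x() + y()

    def add_default(x, y):
        # the default convention: evaluate each parameter once on entry
        # and remember the result; for immutable values this behaves
        # like call-by-value
        xv, yv = x(), y()
        return xv + yv

    print(add_default(lambda: 2, lambda: 3))   # prints 5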

Values are either functions (as described above), or terms, or
boring primitives (integers and strings for now). A term has a symbol
(a unique tag for distinguishing a family of terms) and a sequence
of values.
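
A minimal Python sketch of this value universe (Symbol and Term are my
illustrative names):

    class Symbol:
        """A unique tag; two symbols are the same only if identical."""
        def __init__(self, name):
            self.name = name
        def __repr__(self):
            return self.name

    class Term:
        """A symbol plus an immutable sequence of values."""
        def __init__(self, symbol, values):
            self.symbol = symbol
            self.values = tuple(values)

    Tuple = Symbol("Tuple")
    pair = Term(Tuple, [1, "two"])   # a 2-tuple represented as a term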

That's all of the execution model. More sophisticated objects,
including imperative features, are expressed as functions, perhaps
builtin. Objects don't have intrinsic identity.

[...]

I have a prototype interpreter written in Haskell:
      24 lines of abstract syntax,
     191 lines of lexer,
     114 lines of parser,
      45 lines of the runtime system,
     423 lines of builtins (evolving much),
     226 lines of the interpreter of the core,
      60 lines of the main program (read-eval-print loop),
     379 lines of the standard library (evolving much),
      38 lines of the Makefile,
    ----
    1500 in total.

The standard library introduces the concept of a type. The type of an
object is used to choose implementations of generic functions applied
to it.

A type is represented as a symbol. Here is how the type of an object
is determined:
- If it's a function, it's called with one argument: the symbol Type,
  and is expected to return its type.
- If it's a term, its symbol is looked up in the dictionary of types
  of symbols.
- Types of integers and strings are fixed to Integer_t and String_t.
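
Continuing the Python sketch (with the caveat that in the real model
the Type argument would itself arrive as a function), the three rules
could be rendered as:

    Type      = Symbol("Type")
    Integer_t = Symbol("Integer_t")
    String_t  = Symbol("String_t")

    types_of_symbols = {}    # term symbol -> type, filled in elsewhere

    def type_of(obj):
        if callable(obj):
            return obj(Type)                     # a function reports its own type
        if isinstance(obj, Term):
            return types_of_symbols[obj.symbol]  # terms are typed via their symbol
        if isinstance(obj, int):
            return Integer_t
        if isinstance(obj, str):
            return String_t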

Moreover, there is a dictionary which maps each type to a tuple of its
supertypes (a tuple is a term whose symbol is Tuple). Perhaps these
should be functions returning tuples, so that the supertypes can depend
on the object under consideration (e.g. some proxy would have the type
Proxy itself, but its supertypes would include the type of the wrapped
object).

A generic function is created from a dictionary which maps types to
functions. When such a function is called, it determines the type of
its first argument and looks up the implementation in the dictionary.
If no implementation is found for the given type, its supertypes are
examined recursively (depth-first search). If it's still not found and
the first argument is a function or term, the generic type Function_t
or Term_t is tried. Finally Object_t is tried; an entry at Object_t is
the default implementation for all objects not having their own.
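
Here is the dispatch in the same illustrative Python rendering; method
echoes the library function of that name used below:

    Function_t = Symbol("Function_t")
    Term_t     = Symbol("Term_t")
    Object_t   = Symbol("Object_t")

    supertypes = {}    # type -> tuple of its supertypes

    def find_impl(table, t):
        # depth-first search through the supertype graph
        if t in table:
            return table[t]
        for s in supertypes.get(t, ()):
            impl = find_impl(table, s)
            if impl is not None:
                return impl
        return None

    def method(table):
        # build a generic function dispatching on its first argument's type
        def generic(first, *rest):
            impl = find_impl(table, type_of(first))
            if impl is None and callable(first):
                impl = table.get(Function_t)
            if impl is None and isinstance(first, Term):
                impl = table.get(Term_t)
            if impl is None:
                impl = table.get(Object_t)    # last-resort default
            return impl(first, *rest)
        return generic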

I think this is similar to how CLOS expresses methods. Unfortunately
I have almost no experience in Lisp and have never read a complete Lisp
reference. But I'm sure that my language has much in common with Lisp.

I had a chicken-and-egg problem here. Hashing and equality are good
candidates for generic functions. They are needed for implementing
generic dictionaries. In order to look up types in the dictionary,
it must be determined how types are hashed, so the hashing function
should be extracted from the relevant dictionary at Type_t. How to do
it? The type of Type_t is Type_t, so one has to determine how to hash
Type_t by looking up a dictionary at Type_t, oops! Even to determine
that the type of Integer_t is Type_t one has to look up Integer_t in
a dictionary, which needs its type to hash it...

I solved this by using dictionaries specialized for symbols (they
use no generic functions) and by declaring that types must be symbols
rather than arbitrary objects (unless they are not used by the standard
overloading machinery, or at least a different kind of dictionary is
used for their interfaces).
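
In the Python sketch the fix could look like this: the dictionaries
that the dispatch machinery itself uses compare keys by symbol
identity, so no generic hashing or equality is ever consulted during
dispatch.

    class SymbolDict:
        """A dictionary restricted to Symbol keys, compared by identity,
        so it needs neither a generic hash nor a generic (==)."""
        def __init__(self):
            self._slots = {}
        def set(self, sym, value):
            self._slots[id(sym)] = (sym, value)   # keep sym alive too
        def get(self, sym, default=None):
            entry = self._slots.get(id(sym))
            return entry[1] if entry is not None else default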

Example: how equality is defined:

prim_equal <- :(==)
# Save the builtin definition of (==) for using it below (the builtin
# version compares only symbols, integers and strings).

equal_dict <- const dict_of_symbols
(==) <- method equal_dict

# A generic indexing operator is not defined yet at this place and
# operator (.) works only for terms now, so I'm using an ugly direct
# function call to set the entry in the dictionary (which is
# implemented as a function):
equal_dict! Set Term_t [x y]
    # [x y] starts a lambda abstraction.
    is y Term_t & prim_equal (symbol x) (symbol y) &
        len <- const (length x)
        (len == length y) &
            check i = (i >= len) | (((x.i) == (y.i)) & check (i+1))
            check 0

equal_dict! Set Integer_t: prim_equal
equal_dict! Set String_t: prim_equal

End of example. Further types may insert additional entries.
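
For readers following the Python sketch, the entries above correspond
roughly to the following, with prim_equal approximated by Python's own
== on primitives:

    equal_dict = {}               # type -> implementation
    equal = method(equal_dict)    # the generic (==) of the sketch

    def term_equal(x, y):
        # same symbol, same length, pairwise-equal components
        return (isinstance(y, Term)
                and x.symbol is y.symbol
                and len(x.values) == len(y.values)
                and all(equal(a, b) for a, b in zip(x.values, y.values)))

    equal_dict[Term_t]    = term_equal
    equal_dict[Integer_t] = lambda x, y: x == y
    equal_dict[String_t]  = lambda x, y: x == y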

The overloading machinery is similar to Haskell classes. Not only can
further types insert additional entries; existing types can also be
put into new dictionaries of new overloaded operations. Thus the
interface of a type is extensible.

There is a problem I don't know how to solve: I want the language
to have the potential for an efficient implementation. That works
only to some extent...

Static binding means that identifiers like length and (&) refer to
statically known functions. They are not looked up in a dictionary at
runtime. An optimizing compiler can see their definitions and inline
their calls. The fact that my prototype implementation looks them up
by name at runtime is irrelevant :-)

It's important that a compiler can be sure that
    x <- var 0
refers to the standard function called var which creates a variable, so
also uses of x (getting and setting its value) can be inlined as well.
It's also important to really pass by value what would be evaluated
anyway.

Note that evaluating x as an expression reads the current value of this
variable. Every identifier is bound to a function, and a mutable
variable is implemented as a function (a constant is too). The syntax
for calling a function bound to an identifier doesn't need parentheses:
just the name and parameters separated by spaces. The <- construct
evaluates the rhs, which must return a function, which is then bound
to the identifier.
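
One way to picture this in the Python sketch (illustrative only; in
the real model messages are passed as functions too):

    Assign = Symbol("Assign")

    def var(initial):
        # a mutable variable as a closure: calling it with no message
        # reads the cell; sending it Assign updates the cell
        cell = [initial]
        def variable(*msg):
            if msg and msg[0] is Assign:
                cell[0] = msg[1]
            else:
                return cell[0]
        return variable

    x = var(0)
    x(Assign, 5)
    print(x())    # prints 5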

One can't assign to length for example. This function is not a
variable, won't understand the assignment message, and certainly won't
ever change its internal state (it doesn't have any): it will always do
the same thing. That's why calls to length can in principle be inlined.

Unless the standard library defines a generic length which will be
used instead of the builtin one - and it surely will! The problem
is with overloading. The dispatching code could be statically analysed
and inlined, but not the result of the dispatch.

In practice there is no way a compiler can infer that (==), defined
as above and applied to integers, really uses the definition above. At
any time somebody could replace the equal_dict.Integer_t entry with
something entirely different.

Even if dict_of_symbols disallowed overwriting existing values and
a compiler were taught about that, it doesn't help in cases like
the following: equality on Tuple_t uses the generic definition
for Term_t, but someone later inserts an explicit definition for
Tuple_t. Or adds a supertype of tuples, with an implementation of (==)
for it, which is searched before Term_t. Or whatever.

I don't know how to solve it. It seems a silly problem, but the
prototype implementation is really slow. Even though I don't intend
this language for serious use, I'm worried that I can't design it
well. Requiring several mandatory dictionary lookups for every use of
(==) on integers is not a good option.

The intent of code calling (==) on integers is to use the concrete
standard implementation! The intent is not to let somebody redefine how
integers are compared, and the compiler should in theory take advantage
of that. I have already captured that intent for non-overloaded
functions.

It could be solved in an ad-hoc way by special-casing important
operations, but I want a general solution. And a solution which would
not complicate the core language.

The core and syntax are really simple. No reserved words at all and
only the following symbols have special meaning (besides literals
and comments):

    ;      sequencing, separation of declarations
    =      recursive binding
    <-     side-effecting binding
    :      reference, i.e. a special case of lambda abstraction
    [ ]    lambda abstraction
    ,      insert all values of the tuple as arguments
    !      get the function to which it evaluates
    ( )    grouping

Even the while loop is defined in the library. The syntax feels
a bit Pythonic, thanks to using layout:

    i <- var 0
    while (i < 10):
        print i ", "
        i :+ 1

There are two arguments to while here: a condition (which is evaluated
multiple times) and a lambda abstraction (which is evaluated once,
but the function it evaluates to is the body of the loop and is called
on each iteration).
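
In the Python sketch, while as a plain function could look like this,
reusing var and Assign from the earlier sketch:

    def while_(cond, body):
        # cond is re-evaluated before each pass; body is called each time
        while cond():
            body()

    i = var(0)
    def body():
        print(i(), end=", ")
        i(Assign, i() + 1)

    while_(lambda: i() < 10, body)   # prints 0, 1, ..., 9,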

Of course :+ is also just an operator, defined thus (after + is
defined to do the right thing):
    x! :+ y = x! := (x! + y)
The exclamation mark turns off the evaluation of the argument on entry,
so that the identifier x is bound to the function denoted by the
expression passed as the first argument, instead of to its result.
Passing variables by reference is thus unified with delayed evaluation.
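
Sketched in Python (inc_by is a hypothetical stand-in for :+): the
variable itself arrives unevaluated, so the operator can both read it
and assign to it.

    def inc_by(x, y):
        # x is the variable-function itself, not its current value
        x(Assign, x() + y)

    inc_by(i, 1)    # the i from the while sketch above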

Parentheses are necessary here because there is no operator precedence
and operators associate in the opposite direction. You could save the
parentheses by putting the rhs, indented, on the lines which follow.

Even assignment is defined in the standard library:
    x! := y = x! Assign y
Finally, the Assign symbol is born as a builtin because it's referred
to by another builtin, namely var; and var is a builtin because the
creation of new mutable references can't be defined in terms of
anything more primitive.

The core is so simple that I don't want to introduce a builtin concept
of polymorphic dispatch. The whole fun is bootstrapping something
powerful from something small. But I still want it to be potentially
efficient, so that a compiler can often infer which function is called.
Help!

-- 
 __("<  Marcin Kowalczyk * qrczak at knm.org.pl http://qrczak.ids.net.pl/
 \__/
  ^^                      SUBSTITUTE SIGNATURE
QRCZAK
