RFC: Viper: yet another python implementation

John Max Skaller skaller at maxtal.com.au
Sun Aug 15 13:18:05 EDT 1999


On 11 Aug 1999 18:23:06 +0100, Michael Hudson <mwh21 at cam.ac.uk> wrote:

>skaller at maxtal.com.au (John (Max) Skaller) writes:

>I'm interested (do you have any code yet?

	Not for the compiler: I'm implementing
the interpreter first. [I have done some work on the
type inference algorithm, to make sure it is
feasible]

>) but I have a few questions:
>
>> The interpreter aims to be compatible with 'pure python':
>> python not using external C modules, and not fiddling
>> implementation details too much. Some of the fiddles
>> will work, and some will not. 
>
>I presume fiddling with sys.exc_traceback.f_back.f_locals is out?

	At present sys.exc_info() returns a tuple in which
the third element is a traceback object. However, that
traceback object doesn't match the Python one --
it just has a filename and line number in it, to allow the
traceback to be printed.
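
As a rough illustration of how little a traceback needs in order to be
printed, the sketch below pulls just a filename and a line number out of
the tuple returned by sys.exc_info(). It is written in present-day
Python 3 and relies on CPython's traceback attributes (tb_next,
tb_frame, tb_lineno). The helper name where_raised is invented for this
example and says nothing about how Viper represents tracebacks.

```python
import sys

def where_raised():
    """Return (filename, line number) for the exception being handled."""
    exc_type, exc_value, tb = sys.exc_info()
    if tb is None:
        return None
    # Walk to the innermost entry, where the exception was actually raised.
    while tb.tb_next is not None:
        tb = tb.tb_next
    return tb.tb_frame.f_code.co_filename, tb.tb_lineno

def fails():
    raise ValueError("demo")

try:
    fails()
except ValueError:
    location = where_raised()

print(location)
```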

	However, I will try to make everything
as 'python compatible' as possible, at least in the
interpreter. In the compiler, a few things that
work in the interpreter may fail. Another possible
result, however, is simply that the compiled code
becomes less efficient, rather than failing.
For example, 'exec' must still work.

	[Interscript uses exec, so I have to support it ;-]

>> Compatibility:
>> 	2) rebinding module variables after importing is not permitted
>
>Does this mean the following is illegal:
>
>import foo
>foo.bar = 1

	Technically yes, according to what I said.
But actually, something less stringent will be required.
What will happen is that the compiler will begin
by executing the interpreter on all modules the main
program imports.

	During that phase of the compilation
process, assignment to module variables outside
the module will be OK.

	AFTER all the modules are imported,
it may not be OK, because then I may inline the
value, which means that an assignment wouldn't
be honoured. However, it isn't clear that I cannot
simply detect such direct assignments and force
the system to honour the current value.
Indeed, this may be necessary in some special
cases such as assigning to

	sys.stdin

which is a necessary evil for compatibility.

	It may end up that the restriction is
that you cannot do:

	exec "foo.bar = 1"

because that cannot be detected easily by 
static analysis.
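
To make the difference concrete, here is a minimal sketch of such a
static check, written with the 'ast' module of present-day Python 3
(where exec is a function rather than a statement). The function name
module_attr_assignments is hypothetical, and this is only an
illustration of the idea, not the analysis Viper performs. The direct
assignment is found. The same assignment hidden inside an exec string
is not, because the analyser only sees a string constant.

```python
import ast

def module_attr_assignments(source, modules):
    """Return (module, attribute, line) for each direct 'module.attr = ...'."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            if (isinstance(target, ast.Attribute)
                    and isinstance(target.value, ast.Name)
                    and target.value.id in modules):
                found.append((target.value.id, target.attr, node.lineno))
    return found

direct = "import foo\nfoo.bar = 1\n"
hidden = "import foo\nexec('foo.bar = 1')\n"

print(module_attr_assignments(direct, {"foo"}))   # [('foo', 'bar', 2)]
print(module_attr_assignments(hidden, {"foo"}))   # []
```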

>But can foo then have a method set_bar? I guess that would be easier
>to cope with for the analyser.

	Yes. It is OK to mutate objects, as opposed
to rebinding variables: mutation has to be supported.
You may think these things are equivalent. Indeed,
they are in interpreted python. However, the restriction
leads to a performance gain; without some restrictions,
compiled python will not execute faster than the bytecode
interpreter.
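
The distinction can be sketched as follows, in present-day Python 3.
The module object foo, its 'settings' dictionary and the 'snapshot'
variable are all made up for the example. 'snapshot' stands in for a
reference that a compiler might have captured or inlined after import:
mutation through set_bar stays visible through it, whereas rebinding
foo.settings leaves it pointing at the old object.

```python
import types

# A stand-in for an imported module 'foo' (hypothetical).
foo = types.ModuleType("foo")
foo.settings = {"bar": 0}

def set_bar(value):
    # Mutates the existing dictionary; no module variable is rebound.
    foo.settings["bar"] = value

# What a compiler might hold on to once all imports are done.
snapshot = foo.settings

set_bar(1)
after_mutation = snapshot["bar"]     # the captured object sees the change

foo.settings = {"bar": 2}            # rebinding a module variable
after_rebinding = snapshot["bar"]    # the captured object is now stale

print(after_mutation, after_rebinding, foo.settings["bar"])   # 1 1 2
```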

	Indeed, the existing python bytecode compiler
makes such restrictions itself [on class __getattr__
and __setattr__ methods, which are cached for
efficiency, as I understand it]

>> 	3) None is a keyword.
>> 	4) range is a keyword
>
>What about other builtins? 

	Yes.

>Can't your static analysis find rebindings
>of builtin names (including range) rather easily? - particularly as
>they don't happen very often.

	I hadn't considered that, but the answer is probably
'yes', provided they don't occur in 'exec's.
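
A minimal sketch of what such a scan could look like, using the 'ast'
and 'builtins' modules of present-day Python 3. The function name
rebound_builtins is hypothetical. The scan only covers plain name
bindings plus 'def' and 'class' statements, and it ignores imports,
attribute assignment on the builtins module, and anything hidden in an
exec string, so it is an illustration rather than a complete analysis.

```python
import ast
import builtins

def rebound_builtins(source):
    """Return the sorted names of builtins that the source binds to."""
    builtin_names = set(dir(builtins))
    rebound = set()
    for node in ast.walk(ast.parse(source)):
        # Names in a binding position, e.g. "range = ..." or "for len in ..."
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            if node.id in builtin_names:
                rebound.add(node.id)
        # "def open(...)" and "class list: ..." also bind a name
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            if node.name in builtin_names:
                rebound.add(node.name)
    return sorted(rebound)

sample = "range = None\ndef open(f, m):\n    pass\ntotal = len('abc')\n"
print(rebound_builtins(sample))   # ['open', 'range']
```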

>> 	5) What restrictions can you live with?
>
>One thing I occasionally find very handy is the ability to assign
>methods to classes (or more usually within classes).

	OK. I would like to do what Guido has done in the
optimisation of functions [fast lookup for statically known variables,
plus a dictionary to support others]: provide an optimised structure
where possible, and then provide a dynamic one
for the other cases.  That is, the ideal would be to provide
a 'fallback' to the interpreter.
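
The two-tier arrangement might be sketched like this, in present-day
Python 3. The class Namespace and all of its method names are invented
for the illustration; it is neither CPython's actual mechanism nor
Viper's design. Names known statically get a fixed slot in a list, so
that generated code can index directly, and every other name falls back
to an ordinary dictionary.

```python
class Namespace:
    """Known names get a fixed slot in a list; the rest go in a dict."""

    def __init__(self, known_names):
        # Slot numbers would be assigned once, at compile time.
        self.slots = {name: i for i, name in enumerate(known_names)}
        self.fast = [None] * len(known_names)
        self.dynamic = {}

    def load_fast(self, slot):
        # Optimised path: plain list indexing, no name lookup at run time.
        return self.fast[slot]

    def store_fast(self, slot, value):
        self.fast[slot] = value

    def load_name(self, name):
        # Fallback path: used when the name is only known at run time.
        if name in self.slots:
            return self.fast[self.slots[name]]
        return self.dynamic[name]

    def store_name(self, name, value):
        if name in self.slots:
            self.fast[self.slots[name]] = value
        else:
            self.dynamic[name] = value

ns = Namespace(["x", "y"])
X = ns.slots["x"]          # resolved once, ahead of time
ns.store_fast(X, 10)
ns.store_name("z", 30)     # not statically known, lands in the dict
print(ns.load_fast(X), ns.load_name("x"), ns.load_name("z"))   # 10 10 30
```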

>> Extensions:
>
>> 	2) optional values in dictionaries (Defaults to None)
>
>Now this one I don't get; do you mean dict[] has dict.get like
>semantics?

	No, you can just write:

	d = {1,2,3}

and it is OK and means:

	d = {1:None, 2:None, 3:None}

>> 	3) 'in' applies to dictionaries 'as if a sequence of keys were used'
>
>Seems unnecessary to me. Maybe that's just me.

	The idea is that:

	x in {1,2,3}

means 'x == 1 or x == 2 or x == 3', that is, the construction 

	{1,2,3}

can be used like a set (without introducing a new data structure).
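
For comparison, the proposed meaning can be spelled out in present-day
Python 3, where {1,2,3} is instead a set literal and 'in' already works
on dictionaries. The sketch below builds the dictionary explicitly with
dict.fromkeys and writes the membership test out as a loop over the
keys. The function name contains is made up for the example.

```python
# The proposed literal {1,2,3}, written out with explicit None values.
d = dict.fromkeys([1, 2, 3])

def contains(mapping, x):
    """'x in mapping', read as a test against the sequence of keys."""
    for key in mapping.keys():
        if x == key:
            return True
    return False

print(d)                                  # {1: None, 2: None, 3: None}
print(contains(d, 2), contains(d, 4))     # True False
```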

>> 	5) optionally typed function parameters
>
>Out of curiosity, what syntax do you use for this? 

	My first idea was the obvious:

	def f(a: int): ...

or perhaps

	def f(a: IntType): ...

and the interpreter semantics would be 

	if type(a) is not IntType: raise TypeError, "Int expected for arg1 of f"

However, in the compiled version, the semantics might be to generate a compiler
error: that is, the following might not work:

	try: f(1.0)
	except TypeError: pass # just skip the call
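
The interpreter semantics described above can be approximated in
present-day Python 3, which has since gained annotation syntax of this
shape (without enforcing it at run time). The decorator name checked is
invented for this sketch. It performs the exact-type test on each
annotated parameter and ignores the complications of *args and
**kwargs, so it is an approximation of the idea, not the proposed
implementation.

```python
import functools
import inspect

def checked(func):
    """Raise TypeError when an annotated argument has the wrong exact type."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = func.__annotations__.get(name)
            # Only plain classes are checked; unannotated parameters pass.
            if isinstance(expected, type) and type(value) is not expected:
                raise TypeError("%s expected for argument %r of %s"
                                % (expected.__name__, name, func.__name__))
        return func(*args, **kwargs)
    return wrapper

@checked
def f(a: int):
    return a + 1

print(f(1))            # 2
try:
    f(1.0)
except TypeError as err:
    print(err)         # int expected for argument 'a' of f
```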


>I presume if a
>parameter is typed as `Foo' then a value of type `Bar' where Bar
>derives from Foo is allowable.

	I guess that would make sense (the extension isn't
implemented yet -- not a lot of point appealing for comments
_after_ implementing the extension, only to find someone
has a better idea)
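
The two possible rules differ only in the test applied to the argument.
A small sketch in present-day Python 3, with Foo, Bar and both function
names chosen purely for illustration: an exact-type test rejects a Bar
where a Foo is required, while an isinstance test accepts it.

```python
class Foo:
    pass

class Bar(Foo):
    pass

def accepts_exact(value, expected):
    # The rule 'type(a) is expected': subclasses are rejected.
    return type(value) is expected

def accepts_subclass(value, expected):
    # The rule suggested in the question: subclasses are allowed.
    return isinstance(value, expected)

bar = Bar()
print(accepts_exact(bar, Foo), accepts_subclass(bar, Foo))   # False True
```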

>> 	6) what do you want?
>
>Can you derive from basic types? That might make some stuff harder.

	Not at present. Can you give an example where
it is useful? [It can be made to work I think: is it useful?]

>> 	It is difficult to optimise individual
>> modules. It is a different story to optimise
>> a whole program, where _every_ call
>> to a function can be traced. Of course,
>> this may involve restricting what 'exec' can do,
>
>Like banning it entirely, I suspect.

	No. I cannot do that because the program
that I want to compile, interscript, uses exec,
and depends on it to execute client script.
This is the major feature of interscript, so exec
has to work (at least with restrictions).

	That means the compiled code has to
contain a full run time system (unless it can be
detected that it isn't required).

>> and it may involve other restrictions
>> (such as not  changing any module bindings
>> after loading).
>
>Is that really that much of an issue? I'd have thought that they'd be
>fairly easy to spot. 

	Several people have said that, and they may be right.
I'll examine this issue, since permitting it would provide
extra compatibility.

	As an example of why rebinding should be supported:

	_open = open
	opened_files = []
	def open(f,m): 
		opened_files.append(f)
		return _open(f,m)

This code just hooks 'open' to create a list
of all files opened, which could be useful.
So rebindings are a useful trick.

>Mind you, the whole thing sounds sooo hard, that
>anything that makes it easier is good.

	The things that turn out to be hard are
often surprising. For example, right now I am having
major problems with something that sounds trivial:
getting the interpreter to set the line number
so that a traceback can print out as it does in Python.

	That sounds easy, but the LR grammar I'm
using is not well adapted to it -- NEWLINE tokens
come after statements, instead of before them,
where the line number is needed.
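
One way to picture the problem, and a possible workaround, is to record
the line number at the first token of each statement instead of at the
NEWLINE that ends it. The sketch below does this with the 'tokenize'
module of present-day Python 3. It is not the LR grammar mentioned
above, and the function name statement_start_lines is invented for the
example.

```python
import io
import tokenize

def statement_start_lines(source):
    """Return the line on which each logical line (statement) begins."""
    starts = []
    at_statement_start = True
    skip = (tokenize.NL, tokenize.COMMENT, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in skip:
            continue
        if tok.type == tokenize.NEWLINE:
            # NEWLINE closes the statement; the next real token opens one.
            at_statement_start = True
        elif at_statement_start:
            starts.append(tok.start[0])
            at_statement_start = False
    return starts

source = "x = (1 +\n     2)\n\ny = 3\n"
print(statement_start_lines(source))   # [1, 4]
```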

John Max Skaller                ph:61-2-96600850              
mailto:skaller at maxtal.com.au       10/1 Toxteth Rd 
http://www.maxtal.com.au/~skaller  Glebe 2037 NSW AUSTRALIA



