interpreter vs. compiled

Dan thermostat at gmail.com
Fri Jul 18 15:13:02 EDT 2008


On Jul 18, 2:17 pm, castironpi <castiro... at gmail.com> wrote:
> On Jul 17, 11:39 pm, Kay Schluehr <kay.schlu... at gmx.net> wrote:
>
>
>
> > On 18 Jul., 01:15, castironpi <castiro... at gmail.com> wrote:
>
> > > On Jul 17, 5:37 pm, I V <ivle... at gmail.com> wrote:
>
> > > > On Thu, 17 Jul 2008 15:08:17 -0700, castironpi wrote:
> > > > > The Python disassembly is baffling though.
>
> > > > >>>> y= 3
> > > > >>>> dis.dis('x=y+1')
>
> > > > You can't disassemble strings of python source (well, you can, but, as
> > > > you've seen, the results are not meaningful). You need to compile the
> > > > source first:
>
> > > > >>> code = compile('y=x+1','-', 'single')
> > > > >>> dis.dis(code)
>
> > > >   1           0 LOAD_NAME                0 (x)
> > > >               3 LOAD_CONST               0 (1)
> > > >               6 BINARY_ADD
> > > >               7 STORE_NAME               1 (y)
> > > >              10 LOAD_CONST               1 (None)
> > > >              13 RETURN_VALUE
>
> > > > You may well find these byte codes more meaningful. Note that there is a
> > > > list of opcodes at http://docs.python.org/lib/bytecodes.html
>
> > > Oh.  How is the stack represented?
>
> > As a pointer to a pointer of PyObject structs.
>
> > > Does it keep track of which stack
> > > positions (TOS, TOS1, etc.) are in what registers?  Does stack
> > > manipulation consume processor cycles?
>
> > Python does not store values in registers. It stores locals in arrays
> > and accesses them by position ( you can see the positional index in
> > the disassembly right after the opcode name ) and globals / object
> > attributes in dicts.
>
> > For more information you might just download the source distribution
> > and look for src/Python/ceval.c. This file contains the main
> > interpreter loop.
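
You can see that array-versus-dict distinction right in the
disassembly: locals compile to indexed LOAD_FAST/STORE_FAST opcodes,
while globals go through LOAD_GLOBAL, i.e. a dict lookup by name. A
small sketch (names invented):

```python
import dis

g = 10

def f(a):
    b = a + g     # 'a' and 'b' are locals; 'g' is a global
    return b

# Collect the opcode names for inspection: the locals show up as
# *_FAST (indexed array access), the global as LOAD_GLOBAL.
ops = [ins.opname for ins in dis.get_instructions(f)]
print(ops)
```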
>
> Ah, found it.  The parts that are making sense are:
>
> register PyObject **stack_pointer;
> #define TOP()           (stack_pointer[-1])
> #define BASIC_POP()     (*--stack_pointer)
>
> ...(line 1159)...
> w = POP();
> v = TOP();
> if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
>         /* INLINE: int + int */
>         register long a, b, i;
>         a = PyInt_AS_LONG(v);
>         b = PyInt_AS_LONG(w);
>         i = a + b;
>         if ((i^a) < 0 && (i^b) < 0)
>                 goto slow_add;
>         x = PyInt_FromLong(i);
>
> ... Which is more than I was picturing was involved.  I understand it
> is also specific to CPython.  Thanks for the pointer to the code.
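
The (i^a) < 0 && (i^b) < 0 test, by the way, is a standard
signed-overflow check: the add overflowed exactly when both operands
have the same sign and the result has the opposite one. Here is the
same logic sketched in Python, simulating 64-bit C longs by hand
(Python's own ints never overflow, so the wrapping is simulated; the
width is an assumption, since C's long is platform-dependent):

```python
BITS = 64
MASK = (1 << BITS) - 1

def wrap(n):
    """Reduce n to a signed 64-bit value, as C long arithmetic would."""
    n &= MASK
    return n - (1 << BITS) if n >> (BITS - 1) else n

def add_overflows(a, b):
    i = wrap(a + b)
    # Same test as the ceval.c fast path: overflow iff the result's
    # sign differs from the sign of *both* operands.
    return (i ^ a) < 0 and (i ^ b) < 0

print(add_overflows(2**62, 2**62))  # positive + positive wrapped negative
print(add_overflows(1, 2))          # no overflow
```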
>
> My basic question was, what is the difference between compilers and
> interpreters, and why are interpreters slow?  I'm looking at some of
> the answer right now in "case BINARY_ADD:".

The basic difference between a (traditional) compiler and an
interpreter is that a compiler emits (assembly) code for a specific
machine. It therefore must know the specifics of that machine (how many
registers, memory addressing modes, etc.), whereas an interpreter is
normally defined by its conceptual state, that is, a
virtual machine. The instructions (bytecode) of the virtual machine
are generally higher-level than real machine instructions, and the
semantics of the bytecode are implemented by the interpreter, usually
in a fairly high-level language like C. This means the interpreter
can run without detailed knowledge of the machine, as long as a C
compiler exists. The trade-off, however, is that the interpreter's
semantics are not optimized for that particular machine.
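
To make the dispatch-loop idea concrete, here is a toy stack machine
with the same shape as ceval.c's big switch, running the y = x + 1
bytecode quoted earlier. The opcode encoding is invented for
illustration and has nothing to do with CPython's actual format:

```python
def run(code, env):
    """Minimal stack-based VM: a dispatch loop over (opcode, arg) pairs."""
    stack = []
    for op, arg in code:
        if op == "LOAD_NAME":
            stack.append(env[arg])
        elif op == "LOAD_CONST":
            stack.append(arg)
        elif op == "BINARY_ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "STORE_NAME":
            env[arg] = stack.pop()
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return env

# y = x + 1, mirroring the disassembly quoted above
program = [
    ("LOAD_NAME", "x"),
    ("LOAD_CONST", 1),
    ("BINARY_ADD", None),
    ("STORE_NAME", "y"),
]
env = run(program, {"x": 3})
print(env["y"])
```

Every instruction pays for a loop iteration and the dispatch tests
before any real work happens; that per-instruction overhead is a big
part of why plain interpreters are slower than compiled code.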

This all gets a little more hairy when you start talking about JITs,
runtime optimizations, and the like. For a real in-depth look at the
general topic of interpretation and virtual machines, I'd recommend
Virtual Machines by Smith and Nair (ISBN 1-55860-910-5).

-Dan


