interpreter vs. compiled

castironpi castironpi at gmail.com
Sat Aug 2 19:26:06 EDT 2008


On Aug 2, 2:02 pm, Tim Roberts <t... at probo.com> wrote:
> castironpi <castiro... at gmail.com> wrote:
>
> >And furthermore, I think I'm getting
> >confused about what exactly constitutes an interpreter: it is whether
> >there is a process that runs product instructions, or the product
> >instructions can run standalone.  I would take 'compiler' to mean,
> >something that outputs an .EXE executable binary file, and I don't
> >just mean bundling up the python.exe executable with a file.
>
> OK, let me give MY definition.  I freely grant that my definition might be
> different from anyone else's, but perhaps this will help you understand the
> basis for my arguments.

I understand that we're having a disagreement about terminology.  I
further don't understand exactly what JIT languages are, so I can't
agree on that either.

I will observe that there is a certain amount of corporate hype
behind, and worker-base morale riding on, the notion that JIT
technology compiles code.  I suspect it's an exaggeration, not
outright false, but I can't prove it until I can tell you what
instructions run, one right after another, on a concrete architecture
I've held in my hand, like the x86 die.  Nor can I thoroughly believe
that it's true, though, until its creators have told me what
instructions they are.  So I'll proclaim ignorance and await facts...
or consistent stories about them.

> If I run three different CPython programs, the bytes of machine language
> that get executed come from the same place: python24.dll.  My user
> programs are just data.  That, in my mind, makes the CPython implementation
> an interpreter.
>
> If I compile and run three different C programs, the bytes of machine
> language will come from three different places.  That, in my mind, makes
> my C implementation a compiler.

True.  I agree on the facts and the terms.
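
To make the two quoted definitions concrete, here is a minimal sketch
in a made-up toy language.  Everything in it is an assumption for
illustration: the SET and HALT opcodes and the functions interpret()
and translate() are hypothetical, and Python source plus a code object
only stand in for machine code.  The point is just where the executed
code comes from: one fixed loop for every program, or fresh code
generated per program.

```python
# Toy illustration only: SET/HALT and both function names are made up.

def interpret(program):
    """Run a program that is pure data; the same loop serves every program."""
    history = []
    for op, arg in program:
        if op == "SET":
            history.append(arg)      # "set the cell" and record the value
        elif op == "HALT":
            break
    return history

def translate(program):
    """Emit new code for this particular program and return it as a function.

    Python source and a code object stand in for machine code here.
    """
    lines = ["def generated():", "    history = []"]
    for op, arg in program:
        if op == "SET":
            lines.append("    history.append(%d)" % arg)
        elif op == "HALT":
            break
    lines.append("    return history")
    namespace = {}
    exec(compile("\n".join(lines), "<generated>", "exec"), namespace)
    return namespace["generated"]

program = [("SET", x) for x in range(10)] + [("HALT", None)]
generated = translate(program)
print(interpret(program) == generated())
print(generated.__code__ is not translate([("SET", 1)]).__code__)
```

In the first case the program never becomes code of its own; in the
second, each program gets its own generated body.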

> If I compile and run three different C# programs, the JIT compiler makes
> new machine language for each one.  The bytes of machine language will come
> from three different places.  That, in my mind, makes the C# implementation
> a compiler.
>
> If I compile and run three different IronPython programs, the JIT compiler
> makes new machine language for each one.  The bytes of machine language
> will come from three different places.  That, in my mind, makes the
> IronPython implementation a compiler.

I don't know enough to attest to these for a fact, and you haven't
given enough details to corroborate them as facts.  But when you do,
I'll be able to take and learn your terms for them (not that I will,
of course, but I can).

> All four of those scenarios require run-time library support.  Even the C
> program does not run on its own.

I disagree with this, if the C program is statically linked -- the OS
copies the binary (.EXE) from disk into memory, then jumps to a
specific offset in that block / address space.  It runs all its own
bytes, then jumps back to an OS-specified point of return of control.
For the other three, though, this is true.

> Execution starts in the run-time library,
> which sets up an environment before jumping to "main".  The C# and
> IronPython situations are the same; it's just that there's more processing
> going on before jumping to "main".

I want to give a concrete example of 'generating machine code' per se
(as such).

I run this program: <fiction>

with open( 'abinary.exe', 'wb' ) as out:   # binary mode, file closed on exit
   out.write( b'\x09\x0f\x00\x00' )
   for x in range( 10 ):
      out.write( bytearray( [ 0x04, 0xA0, x, 0x00 ] ) )   # x is the third byte
   out.write( b'\x01\x20\x00\x00' )

It outputs to 'abinary.exe':

\x09\x0f\x00\x00
\x04\xa0\x00\x00
\x04\xa0\x01\x00
\x04\xa0\x02\x00
\x04\xa0\x03\x00
\x04\xa0\x04\x00
\x04\xa0\x05\x00
\x04\xa0\x06\x00
\x04\xa0\x07\x00
\x04\xa0\x08\x00
\x04\xa0\x09\x00
\x01\x20\x00\x00

Which is 48 bytes long (twelve 4-byte words) and runs in a
millisecond.  What it does is set a memory address to successive
integers 0..9, then yields.  Due to the nature of program flow
control, while it runs its first steps on any x86 machine, the yield
only succeeds on Windows 98+; on anything else it crashes the machine
or otherwise loses control.  (That part depends on those OSes.)
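
As a quick check on the listing above, counting it word by word gives
twelve 4-byte words.  A minimal sketch that rebuilds the same
fictional byte sequence in memory (nothing is written to disk or
executed; the names image and words are just for illustration):

```python
# Rebuild the fictional byte sequence in memory and inspect its layout.
image = bytearray(b"\x09\x0f\x00\x00")            # first word
for x in range(10):
    image += bytearray([0x04, 0xA0, x, 0x00])     # x lands in the third byte
image += b"\x01\x20\x00\x00"                      # last word

# Split into 4-byte words, one per line of the listing.
words = [bytes(image[i:i + 4]) for i in range(0, len(image), 4)]
print(len(image), len(words))                     # 48 12
```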

I can try something similar dynamically.

char* mem= alloc( 48 );                  /* twelve 4-byte words */
setpermission( mem, EXECUTE );
memcpy( mem+ 0, "\x09\x0f\x00\x00", 4 );
for( int x= 0; x< 10; ++x ) {
   memcpy( mem+ 4* (x+ 1 ), "\x04\xA0\x00\x00", 4 );
   mem[ 4* (x+ 1 )+ 2 ]= (char) x;       /* third byte, same as in the file */
}
memcpy( mem+ 44, "\x01\x20\x00\x01", 4 );
setjump;
goto mem;

Which with some imagination produces the contents of 'abinary.exe'
above (one difference, last word) in a memory block, at address 'mem',
then jumps to it, which then jumps back, and then exits. </fiction>

I'll compare a C compilation to the first example, 'abinary.exe', and a
JIT compilation to the second example, 'char* mem'.  If the comparison
isn't accurate, say how, because these are places I can start from...
(yes, that is, instead of just repeating the claims).

When does a JIT do this, and what does it do in the meantime?
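
One possible arrangement, offered as a sketch under assumptions and
not as a description of any particular runtime: a stub stands in for
each function, code is generated the first time the function is
called, and later calls go straight to the generated code.  In the
sketch below the built-in compile() only stands in for writing machine
code into an executable block like 'mem' above, and the LazyFunction
class with its attributes is a hypothetical name made up for
illustration.

```python
class LazyFunction:
    """Hypothetical stub: holds source text, generates code on first call."""

    def __init__(self, name, source):
        self.name = name
        self.source = source
        self.compiled = None      # nothing has been generated yet
        self.compile_count = 0

    def __call__(self, *args):
        if self.compiled is None:
            # First call: generate code now.  compile() stands in for
            # emitting machine code into executable memory.
            namespace = {}
            exec(compile(self.source, "<jit:%s>" % self.name, "exec"), namespace)
            self.compiled = namespace[self.name]
            self.compile_count += 1
        # Every call, including the first, ends by running generated code.
        return self.compiled(*args)

fill = LazyFunction("fill", "def fill(n):\n    return list(range(n))\n")
print(fill.compiled is None)      # no code exists before the first call
print(fill(10))                   # triggers generation, then runs the result
print(fill(3))                    # reuses the code generated above
print(fill.compile_count)
```

In this arrangement the answer to "in the meantime" is that nothing
has been generated for a function until it is first called.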


