interpreter vs. compiled

paulo.jpinto at gmail.com
Tue Aug 5 10:21:00 EDT 2008


On Aug 3, 1:26 am, castironpi <castiro... at gmail.com> wrote:
> On Aug 2, 2:02 pm, Tim Roberts <t... at probo.com> wrote:
>
> > castironpi <castiro... at gmail.com> wrote:
>
> > >And furthermore, I think I'm getting
> > >confused about what exactly constitutes an interpreter: it is whether
> > >there is a process that runs product instructions, or the product
> > >instructions can run standalone.  I would take 'compiler' to mean,
> > >something that outputs an .EXE executable binary file, and I don't
> > >just mean bundling up the python.exe executable with a file.
>
> > OK, let me give MY definition.  I freely grant that my definition might be
> > different from anyone elses, but perhaps this will help you understand the
> > basis for my arguments.
>
> I understand that we're having a disagreement about terminology.  I
> further don't understand exactly what JIT languages are, so I can't
> agree on that either.
>
> I will observe a certain amount of corporate hype behind, and worker
> base morale riding on, the notion that JIT technology compiles code.
> I suspect it's an exaggeration, not outright false, but I can't prove
> it until I tell you what instructions run, one right after another, on
> a concrete architecture I've held in my hand, like the x86 die.  Nor
> can I thoroughly believe that it's true either, though, until its
> creators have told me what instructions they are.  So I'll proclaim
> ignorance and await facts... or consistent stories about them.
>
> > If I run three different CPython programs, the bytes of machine language
> > that get executed come from the same place: python24.dll.  My user
> > programs are just data.  That, in my mind, makes the CPython implementation
> > an interpreter.
>
> > If I compile and run three different C programs, the bytes of machine
> > language will come from three different places.  That, in my mind, makes
> > my C implementation a compiler.
>
> True.  I agree on the facts and the terms.
>
> > If I compile and run three different C# programs, the JIT compiler makes
> > new machine language for each one.  The bytes of machine language will come
> > from three different places.  That, in my mind, makes the C# implementation
> > a compiler.
>
> > If I compile and run three different IronPython programs, the JIT compiler
> > makes new machine language for each one.  The bytes of machine language
> > will come from three different places.  That, in my mind, makes the
> > IronPython implementation a compiler.
>
> I don't know enough to attest to these for a fact, and you haven't
> given enough details to corroborate them as facts.  But when you do,
> I'll be able to take and learn your terms for them (not that I will,
> of course, but I can).
>
> > All four of those scenarios require run-time library support.  Even the C
> > program does not run on its own.
>
> I disagree with this, if the C program is statically linked -- the OS
> copies the binary (.EXE) from disk into memory, then jumps to a
> specific offset in that block / address space.  It runs all its own
> bytes, then jumps back to an OS-specified point of return of control.
> For the other three, though, this is true.
>
> > Execution starts in the run-time library,
> > which sets up an environment before jumping to "main".  The C# and
> > IronPython situations are the same; it's just that there's more processing
> > going on before jumping to "main".
>
> I want to give a concrete example of 'generating machine code' per se
> (as such).
>
> I run this program: <fiction>
>
> bin= open( 'abinary.exe', 'wb' )
> bin.write( '\x09\x0f\x00\x00' )
> for x in range( 10 ):
>    bin.write( '\x04\xA0' + chr( x ) + '\x00' )
> bin.write( '\x01\x20\x00\x00' )
>
> It outputs to 'abinary.exe':
>
> \x09\x0f\x00\x00
> \x04\xa0\x00\x00
> \x04\xa0\x01\x00
> \x04\xa0\x02\x00
> \x04\xa0\x03\x00
> \x04\xa0\x04\x00
> \x04\xa0\x05\x00
> \x04\xa0\x06\x00
> \x04\xa0\x07\x00
> \x04\xa0\x08\x00
> \x04\xa0\x09\x00
> \x01\x20\x00\x00
>
> Which is 48 bytes (twelve 4-byte words) long and runs in a millisecond.
> What it does is set a memory address to successive integers 0..9, then
> yield.  Due to the nature of program flow control, it runs its first
> steps on any x86 machine, but the yield only succeeds on Windows 98+,
> and otherwise crashes the machine or loses control.  (That part depends
> on those OSes.)
>
> I can try something similar dynamically.
>
> char* mem= alloc( 48 );
> setpermission( mem, EXECUTE );
> memcpy( mem+ 0, "\x09\x0f\x00\x00", 4 );
> for( int x= 0; x< 10; ++x ) {
>    memcpy( mem+ 4* (x+ 1 ), "\x04\xA0\x00\x00", 4 );
>    mem[ 4* (x+ 1 )+ 2 ]= (char) x;
> }
> memcpy( mem+ 44, "\x01\x20\x00\x01", 4 );
> setjump
> goto mem
>
> Which with some imagination produces the contents of 'abinary.exe'
> above (one difference, last word) in a memory block, at address 'mem',
> then jumps to it, which then jumps back, and then exits. </fiction>
>
> I'll compare a C compilation to the first example, 'abinary.exe', and a
> JIT compilation to the second example, 'char* mem'.  If the comparison
> isn't accurate, say how, because these are places I can start from...
> (yes, that is, instead of just repeating the claims).
>
> When does a JIT do this, and what does it do in the meantime?

The JIT works like an assembler/linker that writes to memory. It loads
the file(s) containing the bytecode and generates the required native
machine instructions in memory.
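
To make that concrete, here is a rough sketch of that step in modern
Python on Linux/x86-64, using only ctypes and mmap from the standard
library. The six code bytes are real x86-64 for "mov eax, 42 ; ret",
but the rest (one writable+executable page, calling into it directly)
is only my simplification of what a JIT back end does, and hardened
systems that forbid W+X pages will refuse the mapping:

import ctypes, mmap

# x86-64 machine code for:  mov eax, 42 ; ret
code = b"\xb8\x2a\x00\x00\x00\xc3"

# ask the OS for one page that is readable, writable and executable
buf = mmap.mmap(-1, mmap.PAGESIZE,
                prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
buf.write(code)

# build a C function pointer to the start of the page and call it
addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
make_42 = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
print(make_42())        # -> 42

That is essentially your second example, minus the fiction.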

If there are dependencies on other modules, those are loaded and
compiled as well. The linker part then makes sure that cross-references
between modules, such as memory addresses and branch targets, are
resolved correctly.
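
A hand-wavy illustration of that fix-up step, again in Python: the
opcode below is a genuine x86 CALL rel32, but the code block, the
addresses and the relocate() helper are invented for the example:

import struct

# a freshly generated block with one unresolved call:  e8 ?? ?? ?? ??  then ret
code = bytearray(b"\xe8\x00\x00\x00\x00\xc3")
call_site = 0                        # offset of the e8 opcode in the block

def relocate(code_base, target_addr):
    # the rel32 operand is relative to the end of the 5-byte call instruction
    rel32 = target_addr - (code_base + call_site + 5)
    struct.pack_into("<i", code, call_site + 1, rel32)

# pretend the loader placed this block at 0x1000 and the callee at 0x2000
relocate(0x1000, 0x2000)
print(code.hex())                    # e8fb0f0000c3  ->  call +0xffb ; ret

A real JIT keeps a table of such unresolved references per module and
patches them once the target module has been compiled and placed.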

A clever JIT might add instrumentation points, so that it can later
rewrite the code using profile-guided optimizations, that is, generate
optimized code using the observed behaviour of the program as input.
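
As a toy model only (no machine code is generated, and every name in it
is made up for the example), the instrumentation idea looks roughly like
this: run an instrumented version first, count how hot it gets, and swap
in a "recompiled" version once a threshold is crossed:

HOT_THRESHOLD = 1000                  # arbitrary figure for the example

def tiered(slow_version, recompile):
    # call slow_version until it proves hot, then switch implementations
    state = {"calls": 0, "impl": slow_version}
    def stub(*args):
        state["calls"] += 1
        if state["calls"] == HOT_THRESHOLD:
            # a real JIT would feed the gathered profile to its optimizer
            # and emit specialized machine code at this point
            state["impl"] = recompile(slow_version, state["calls"])
        return state["impl"](*args)
    return stub

def interpreted_add(a, b):            # stand-in for interpreting bytecode
    return a + b

def recompile(fn, profile):           # stand-in for the optimizing back end
    return fn                         # ...which would return faster code here

add = tiered(interpreted_add, recompile)
for i in range(2000):                 # the 1000th call triggers "recompilation"
    add(i, i)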

This can make JIT code faster than statically compiled code in some
cases. Normal native code is able to start executing sooner, but it
only targets a specific set of processors.

JIT bytecode is independent of the processor, and a good JIT
implementation is able to exploit the actual processor better than an
ahead-of-time native compiler can. There is, however, a time penalty at
program startup.

--
Paulo


