Python CPU

Sun Apr 3 13:15:34 EDT 2011

On 4/3/2011 8:44 AM, Werner Thie wrote:
> You probably heard of the infamous FORTH chips like the Harris RTX2000,
> or ShhBoom, which implemented a stack oriented very low power design
> before there were FPGAs in silicon. To my knowledge the RTX2000 is still
> used for space-hardened applications, and if I search long enough I might
> find the one I had sitting in my cellar.
>
> The chip was at that time so insanely fast that it could produce video
> signals with FORTH programs driving the IO pins. Chuck Moore, father of
> FORTH, developed the chip on silicon in FORTH itself.

     He did version 1, which had a broken integer divide operation.
(Divisors which were odd numbers produced wrong answers. Really.)
I came across one of those in a demo setup at a surplus store in
Silicon Valley, driving the CRT and with Moore's interface that
did everything with chords on three buttons.

> Because the instruction set of a FORTH machine is that of a
> very general stack-based von Neumann system, I believe that starting
> with an RTX2000 (which should be available in VHDL) one could quite fast
> be at a point where things make sense, meaning not going for the
> 'fastest' ever CPU but for the advantage of having a decent CPU
> programmable in Python sitting on a chip with a lot of hardware available.

     Willow Garage has Verilog available for a Forth CPU.  It's only 200
lines.

     The Forth CPUs have three separate memories - RAM, Forth stack,
and return stack. All three are accessed on each cycle.  Back before
microprocessors had caches, this was a win over traditional CPUs,
where memory had to be accessed sequentially for those functions.
Once caches came in, it was a lose.
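
     A rough sketch of that three-memory layout, as a toy interpreter
in Python.  The opcode names and the (op, arg) tuple encoding are made
up for illustration and do not match the RTX2000 or any real chip.
The point is only that RAM, the parameter stack, and the return stack
are separate structures, so a data operation never touches a return
address.

```python
# Toy model of a Forth-style CPU with three separate memories.
# Opcodes and encoding are hypothetical, chosen only for this sketch.

def run(program, ram_size=16):
    ram = [0] * ram_size   # general data memory
    data = []              # parameter (Forth) stack
    ret = []               # return stack: holds only return addresses
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "lit":        # push a literal
            data.append(arg)
        elif op == "add":      # ( a b -- a+b )
            b = data.pop()
            a = data.pop()
            data.append(a + b)
        elif op == "store":    # ( value addr -- )
            addr = data.pop()
            ram[addr] = data.pop()
        elif op == "call":     # save return point on its own stack
            ret.append(pc)
            pc = arg
        elif op == "ret":
            pc = ret.pop()
        elif op == "halt":
            break
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return ram, data

program = [
    ("lit", 2),       # 0
    ("lit", 3),       # 1
    ("call", 6),      # 2: call the subroutine at address 6
    ("lit", 0),       # 3: RAM address 0
    ("store", None),  # 4: ram[0] = result
    ("halt", None),   # 5
    ("add", None),    # 6: subroutine body
    ("ret", None),    # 7
]

ram, data = run(program)
```

     In real hardware the three memories would be read in the same
clock cycle; the Python loop only models the separation, not the
timing.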

     It's interesting that if you wanted to design a CPU for Google's
"nativeclient" approach for executing native code in the browser,
a separate return point stack would be a big help.  Google's
"nativeclient" system protects return points, so that you can tell,
from the source code, all the places control can go.  This is
a protection against redirection via buffer overflows, something
that's possible on x86 because the return points and other data
share the same stack.

     Note that if you run out of return point stack, or parameter
stack, you're stuck.  So there's a hardware limit on call depth.
National Semiconductor once built a CPU with a separate return
point stack with a depth of 20.  Big mistake.
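
     To make the limit concrete, here is a minimal sketch of a
fixed-depth return stack.  The class name, the default depth of 20,
and the choice to raise an exception on overflow are assumptions for
illustration; real hardware might trap, wrap around, or silently
drop the oldest entry.

```python
# Hypothetical model of a hardware return stack with a fixed depth.

class ReturnStack:
    def __init__(self, depth=20):
        self.depth = depth
        self.slots = []

    def push(self, addr):
        # Assumed behavior: overflow is reported rather than ignored.
        if len(self.slots) >= self.depth:
            raise OverflowError("return stack full")
        self.slots.append(addr)

    def pop(self):
        return self.slots.pop()


def max_call_depth(stack):
    """Count how many nested calls fit before the stack overflows."""
    calls = 0
    try:
        while True:
            stack.push(calls)  # the pushed value stands in for a return address
            calls += 1
    except OverflowError:
        return calls
```

     With the default depth, max_call_depth(ReturnStack()) returns 20,
so any recursion deeper than that cannot be expressed without spilling
the stack to ordinary memory.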

     (All of this is irrelevant to Python, though. Most of Python's
speed problems come from spending too much time looking up attributes
and functions in dictionaries.)
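
     A simplified picture of where those lookups go.  The helper
find_attribute below is hypothetical and leaves out descriptors,
__getattr__, __slots__, and the caching CPython does; it only shows
that a plain obj.attr can mean probing the instance dictionary and
then the dictionary of each class along the inheritance chain.

```python
class Base:
    def greet(self):
        return "hello"


class Derived(Base):
    pass


def find_attribute(obj, name):
    """Report which dictionary holds `name` (simplified lookup order)."""
    if name in vars(obj):              # instance __dict__ first
        return "instance"
    for cls in type(obj).__mro__:      # then each class in the MRO
        if name in vars(cls):
            return cls.__name__
    return None


obj = Derived()
obj.name = "x"
```

     Here find_attribute(obj, "name") finds the entry in the instance
dictionary, while find_attribute(obj, "greet") has to miss in the
instance and in Derived before finding it in Base.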

				John Nagle