[pypy-dev] How to translate 300000 lines of C

Christian Tismer tismer at tismer.com
Mon Jan 20 15:16:00 CET 2003


Edward K. Ream wrote:
> On Mon, Jan 20, 2003 at 03:27:31AM +0100, Christian Tismer wrote:
> 
>>I believe that it is possible to automate this translation
>>process!
> 
> Yes!  I think it is a very good idea.  I would certainly be much more happy
> with keeping a reasonable-sized translator up-to-date than having to do so
> with the huge C code base.
> 
> 
> In a private email to Christian I suggested making this whole problem go
> away by changing the name of this project from minimalPython to
> psycoticPython :-)

Oh, I didn't get that until now. :-)

> Whether automated or not, translating tested C code to Python seems
> extremely difficult and risky.  It is risky because it implies one of two
> speculative assumptions:
> 
> 1. The Python library will eventually outperform the C library or:
> 2. Guido will at some point approve supporting _two_ versions of the same
> library.
> 
> I view assumption 2 as having almost zero probability, though of course I
> don't speak for Guido in any way. The reason is plain: it is odious to keep
> two sets of source code in synch.
> 
> That leaves assumption 1. No point in arguing over the probabilities of it
> now: let's assume it will be proved correct.  I would be inclined to pick
> _one_ module to work on as a test bed.  Translation can be done by hand.  We
> can then test assumption 1.

Fine with me.

> The bigger translation problem becomes real only if assumption 1 is proved
> to be true.  Even then, I would imagine a _lengthy_ probationary period for
> each translated module before it becomes accepted into the library. So it
> isn't so important how long translation takes; the translation process is
> much less important than the testing process.

That's very true. The testing process will probably
take longer than one or two new Python versions.
We have to run in parallel for a reasonable period.
That's why we need a semi-automated process that
is easy to use on a changed code base.

> My script c2py.py works only on translating C to Python syntax.  It's
> complex enough.  The history of machine translation of natural languages is
> littered with initial failure, in some cases with limited success after
> decades of work.  Myself, I wouldn't invest any time at all in automatically
> translating C semantics to Python semantics.  YMMV.

Well, it is not C, it is Pythonic C already.
That's much simpler than C.
(Which means it doesn't use each and every possible
trick in C; it has cleanly separated statements,
very little usage of macros, and all ambiguous-looking
constructs are properly bracketed.)

I also don't intend to automatically translate
the whole bunch without looking into the output.
Instead, I think of a C parser which emits a series
of tokens, or maybe AST objects, which is then fed
into a Python code generator.
This generator should only provide some common rules
for how to map certain constructs. It should stop in
any situation it cannot handle. The porting work is
to write configuration scripts for that, which control
what to map how. I think this is quite an interactive
process, but with the benefit that it is most
probably repeatable for a slightly changed new Python
version.
There are also common patterns which should be replaced
by some more abstract Python functions, which describe
*what* is happening, instead of always telling *how*
to do it, in an inlined way.
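
A minimal sketch of such a generator might look like the
following. It assumes a toy AST made of plain tuples; the
node kinds, the rule() decorator and the UnhandledConstruct
exception are all hypothetical names, not part of any
existing tool. The point is only that each construct gets
an explicit rule, and that the generator stops instead of
guessing when no rule exists.

```python
# Hypothetical sketch: a rule-driven generator over a toy C AST.
# Node shapes and rule names are assumptions for illustration only.

class UnhandledConstruct(Exception):
    """Raised when no mapping rule exists, so a human can add one."""


RULES = {}  # maps an AST node kind to the function that emits Python for it


def rule(kind):
    """Register a mapping rule for one AST node kind."""
    def register(func):
        RULES[kind] = func
        return func
    return register


def emit(node):
    """Translate one node, stopping at anything without a rule."""
    kind = node[0]
    try:
        handler = RULES[kind]
    except KeyError:
        raise UnhandledConstruct(kind) from None
    return handler(*node[1:])


@rule("name")
def emit_name(ident):
    return ident


@rule("const")
def emit_const(value):
    return repr(value)


@rule("binop")
def emit_binop(op, left, right):
    return f"({emit(left)} {op} {emit(right)})"


@rule("assign")
def emit_assign(target, value):
    return f"{target} = {emit(value)}"


# Toy tree for the C statement `n = n + 1;`
tree = ("assign", "n", ("binop", "+", ("name", "n"), ("const", 1)))
print(emit(tree))  # n = (n + 1)
```

Feeding it a node kind without a rule, for example
("goto", "error"), raises UnhandledConstruct, which is
where the interactive porting work would start.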

This is what I call "uplifting". This is of course
no quick process. The automated tool will help us to
avoid tedious work, and to avoid errors by systematic
mappings. And we can play with that and configure and
fine tune, until the result looks as we like it.
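
As a rough illustration of one such uplifting step, the
sketch below rewrites an accumulate loop into a call that
says *what* happens. It again assumes a toy tuple-based
AST; the node kinds ("for_range", "aug_add", "index",
"call", "slice") and the function name are hypothetical.

```python
# Hypothetical sketch of "uplifting": recognise an inlined accumulate loop
# and replace it with a node that names the operation. Toy AST, assumed shapes.

def uplift_sum_loop(node):
    """Rewrite `for (i = start; i < stop; i++) target += seq[i];`
    into `target += sum(seq[start:stop])`; return other nodes unchanged."""
    if node[0] != "for_range":
        return node
    _, var, start, stop, body = node
    is_accumulate = (
        body[0] == "aug_add"
        and body[2][0] == "index"
        and body[2][2] == var  # the loop variable is the index
    )
    if not is_accumulate:
        return node
    target, seq = body[1], body[2][1]
    return ("aug_add", target, ("call", "sum", ("slice", seq, start, stop)))


# Toy tree for: for (i = 0; i < n; i++) total += a[i];
loop = ("for_range", "i", 0, "n",
        ("aug_add", "total", ("index", "a", "i")))
uplifted = uplift_sum_loop(loop)
```

A real tool would need many such patterns, and each one
would have to be checked by hand before it is trusted.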

Not to mention all the new ideas which we will
have while we're at it.
Right now, everything is an oracle.

ciao - chris

-- 
Christian Tismer             :^)   <mailto:tismer at tismer.com>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/


