Java to Python translation

Jeremy Bowers newsfroups at jerf.org
Thu Apr 4 00:07:31 EST 2002


vio wrote:
> Greetings,
> 
> I need to translate a (rather voluminous 100MB) java program to python. After a
> few searches on google, apparently the best tools for this task are Lex-Yacc
> (Flex-Bison on linux). While I am working on the Lex-YACC HOWTO, I would be very
> interested in hearing of others' successful experiences/feedback at this. Any
> pointers of interest would similarly be appreciated.

Translating among computer languages is not quite as hard as 
translating among human languages, but it's not exactly trivial either. 
It's the little things that will trip you up: The legality and meaning 
of negative indices, the subtle differences in rules for what can go in 
a dictionary, subtle differences in precision, order of operations, 
facilities that don't quite work in one language or the other that 
require thousands of lines of code to precisely emulate, but that a 
*human* could tell could be fixed with just two carefully chosen lines. 
Features in Java with no direct equivalents in Python (overloading's a 
big one, but it's not the only one). Differences in propagation of 
exceptions.
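
To make a few of those "little things" concrete, here is a minimal sketch of what a translator would have to emit in place of plain Python operators to keep Java's integer semantics. The helper names (java_div, java_rem, java_int_add, java_get) are made up for illustration, and the sketch ignores corner cases such as Integer.MIN_VALUE / -1:

```python
def java_div(a, b):
    """Integer division as Java does it: truncate toward zero.
    Python's // rounds toward negative infinity instead."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def java_rem(a, b):
    """Java's % takes the sign of the dividend; Python's takes the divisor's."""
    return a - b * java_div(a, b)


def java_int_add(a, b):
    """Java int addition wraps around at 32 bits; Python ints never overflow."""
    return (a + b + 2**31) % 2**32 - 2**31


def java_get(items, index):
    """Java rejects negative indices; Python silently counts from the end."""
    if not 0 <= index < len(items):
        raise IndexError(f"index {index} out of bounds for length {len(items)}")
    return items[index]


print(-7 // 2, java_div(-7, 2))      # -4 -3
print(-7 % 2, java_rem(-7, 2))       # 1 -1
print(java_int_add(2**31 - 1, 1))    # -2147483648
```

Every division, remainder, addition and subscript in the translated code would need to go through something like this unless the translator can prove the operands are safe, which is a large part of why the output grows.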

And you say you want this because you have 100MB of code. By the same 
token, that's 200+MB of code you'll still need to verify after this 
hypothetical operation, and 200+MB of code that will be throwing new and 
subtle errors... if you have unit tests. (If you don't have unit tests, 
quit now; you don't stand a chance in hell of testing 200MB of code. 
Even if it's 195MB comments!)

(Doubling in size is not unreasonable. Don't expect it to shrink in 
size, or improve in readability. Entropy requires your code to decrease 
in legibility; the more you automate, the more true this is. Ever looked 
at the assembly output of your C compiler? Or looked at C++ translated 
into C, back when compilers worked that way? I'd actually guess 3-5x, 
what with all the little error checking and behavior fixing code that 
will probably have to be added every time you use a data structure or 
library call that almost, but doesn't quite, exist in the other 
language. Add all this together, and the code you're verifying won't 
exactly be the paragon of readability, either; this will be as close to 
obfuscated Python as you can expect to see in your actual life.)

On top of that, you will experience tens of thousands of instances of 
"the little things", each multiplying and playing off of each other.

Bear in mind that even if you get the grammar right, you essentially 
have to write a complete and total interface in Python to match the 
libraries you use in Java. Use Java streams? You will need to write an 
emulator for that stream in Python. (Even if you think you can convert 
the use of streams to some python equivalent, you need to write that 
too! Syntax analysis will only be the first step of a long journey here!)
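
As a rough illustration of what "an emulator for that stream" means, here is a sketch of a Python class that mimics just two methods of java.io.InputStream on top of a Python binary file object. The class name JavaStyleInputStream and the method name read_into are invented for this example; a real emulation layer would need skip, available, mark/reset, close and the exact exception behavior as well:

```python
import io


class JavaStyleInputStream:
    """Hypothetical shim: Java-like read() semantics over a Python binary file."""

    def __init__(self, raw):
        self._raw = raw  # any object with a read(n) method that returns bytes

    def read(self):
        """Like InputStream.read(): one byte as an int 0..255, or -1 at end."""
        chunk = self._raw.read(1)
        return chunk[0] if chunk else -1

    def read_into(self, buf, off, length):
        """Like InputStream.read(byte[], int, int): fill buf (a bytearray)
        and return the number of bytes read, or -1 at end of stream."""
        if off < 0 or length < 0 or off + length > len(buf):
            raise IndexError("offset/length outside buffer")
        if length == 0:
            return 0
        data = self._raw.read(length)
        if not data:
            return -1
        buf[off:off + len(data)] = data
        return len(data)


stream = JavaStyleInputStream(io.BytesIO(b"\x01\xff"))
print(stream.read(), stream.read(), stream.read())  # 1 255 -1
```

Note that translated code testing for "== -1" would silently break if it were handed a plain Python file object, which returns an empty bytes object at end of file instead.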

God help you if this is a GUI project!

Use the network? More subtle differences, even after you code the 
translation layer. Data structures? You'll need to emulate those. How 
many 3rd party Java libraries do you use? (Are you sure?) You'll have to 
re-implement those too. Even if it's just writing a wrapper around a 
Python library with similar (but not identical!) behavior, that's still 
a non-trivial task, because you need to test it, and test it, and test 
it some more, because *100MB* of Java code may use it in every 
conceivable way! You can't even get away with capability-for-capability 
compatibility; you need bug-for-bug!
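
One small example of "similar but not identical": Python's built-in hash() and Java's String.hashCode() serve the same purpose but return different numbers, so any translated code that stores, prints or compares hash values needs the Java result reproduced exactly. The function below is a sketch of that (the name java_string_hash is my own); it follows the documented formula s[0]*31^(n-1) + ... + s[n-1] over UTF-16 code units with 32-bit wraparound:

```python
def java_string_hash(s):
    """Reproduce Java's String.hashCode() for a Python str."""
    # Java strings are sequences of UTF-16 code units, not code points.
    data = s.encode("utf-16-be", "surrogatepass")
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - 2**32 if h >= 2**31 else h


print(java_string_hash("hello"))                         # 99162322
print(java_string_hash("Aa") == java_string_hash("BB"))  # True
```

Even the collisions have to match: if the Java code accidentally depends on "Aa" and "BB" landing in the same bucket, so must the translation.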

When all's said and done, you essentially need to write Java inside 
Python, at least all of it you use. Somebody wrote C++ in C. Somebody 
wrote C in assembler. Somebody wrote assembler in machine code. Somebody 
wrote Java in C++. (Sure, the projects have since migrated, but that 
doesn't really apply to your case.) Actually, lots of someones.

You are faced with a virtually impossible task; how many people will be 
helping you translate this 100MB of code, and how long do you have? 
Trying to replace it with an actually impossible task is not a step 
forward, even if they both look equally unlikely to be doable now.  If 
there's no other option, perhaps you should simply seek employment 
elsewhere; it's likely to be more rewarding in the long run.

Maybe, just *maybe*, if this is 100MB of **pure** computational code, 
with a small input module, a small output module, and 99MB of 
mathematical-type manipulations that never ever *ever* calls out to any 
sort of library in Java, you *might* be able to get away with this. 
Maybe. Maybe. I'd be several orders of magnitude more optimistic if 
those "M"'s were "K"'s, though.



