[Tutor] perl to python?

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Mon Dec 2 22:24:02 2002


[Note: the post I'm writing below is more about Perl than Python, and more
about understanding Perl's parse trees than about writing a Python
converter, so it's a bit off-topic.]



On Mon, 2 Dec 2002, Lance wrote:

> Is there a Perl to Python conversion program?


Unfortunately, no, not yet.


It should be technically possible to write a program to do this, but the
code might end up looking even more miserable than the original Perl code.


Still, I wonder how hard it would be to cook a toy example up.



What makes such an automatic converter hard is that Perl's grammar doesn't
appear to be formally documented anywhere except in the Perl source code itself.
We'd need a tool that generates a "parse tree" of Perl code; once we had a
parse tree, we might have a good chance of writing a PerlToPython
converter.


In fact, it has been said that "Only Perl can parse Perl":

  http://www.everything2.com/index.pl?node=only%20Perl%20can%20parse%20Perl


So such a converter would probably have to use Perl itself to generate the
parse tree.  Perl does provide a 'backend' module called B for this.

    http://www.perlpod.com/stable/perlcompile.html



What does a "parse tree" look like?  It's a low-level, tree-shaped
representation of a program.  Here's an example of such a "parse" of a
simple 'hello.pl' program:

######
[dyoo@tesuque dyoo]$ cat hello.pl

print "Hello world\n";

###



It's a simple little program.  Here's its "parse tree":

###
[dyoo@tesuque dyoo]$ perl -MO=Terse hello.pl
LISTOP (0x81668b0) leave [1]
    OP (0x81668d8) enter
    COP (0x80fe8d8) nextstate
    LISTOP (0x8166868) print
        OP (0x8166890) pushmark
        SVOP (0x817bec8) const  PV (0x80f6d88) "Hello world\n"
hello.pl syntax OK
######
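Since the nesting is carried entirely by the indentation, we don't have to
type the tree in by hand.  Here's a rough sketch that rebuilds it from
Terse's text output.  It assumes one op per line and four spaces per nesting
level (true of the listings in this post, but I haven't checked it against
every Terse output).  The parse_terse() and show() names and the
[class, name, rest, children] node layout are my own inventions, not part of
B::Terse.  Lines that don't look like ops, like the "syntax OK" message, are
skipped.

```python
import re

# A Terse line looks like:  "    SVOP (0x817bec8) const  PV (0x80f6d88) ..."
# Assumed layout: indentation, op class, hex address, op name, then the rest.
LINE_PATTERN = re.compile(r"^(\s*)([A-Z]+) \((0x[0-9a-f]+)\) (\w+)(.*)$")


def parse_terse(text, indent_width=4):
    """Rebuilds the op tree from Terse output.

    Each node is a list [op_class, op_name, rest_of_line, children].
    Lines that don't look like ops are skipped.
    """
    root = None
    path = []  # path[d] is the most recently seen node at depth d
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match is None:
            continue
        spaces, op_class, _address, name, rest = match.groups()
        depth = len(spaces) // indent_width
        node = [op_class, name, rest.strip(), []]
        if depth == 0:
            root = node
        else:
            path[depth - 1][3].append(node)  # attach to the parent one level up
        del path[depth:]                     # forget deeper, finished branches
        path.append(node)
    return root


# A copy of the hello.pl listing (the "\\n" keeps the backslash-n literal).
TERSE_OUTPUT = '''\
LISTOP (0x81668b0) leave [1]
    OP (0x81668d8) enter
    COP (0x80fe8d8) nextstate
    LISTOP (0x8166868) print
        OP (0x8166890) pushmark
        SVOP (0x817bec8) const  PV (0x80f6d88) "Hello world\\n"
'''


def show(node, depth=0):
    """Prints the rebuilt tree, one op per line, to check the nesting."""
    print("    " * depth + node[0] + " " + node[1])
    for child in node[3]:
        show(child, depth + 1)


if __name__ == "__main__":
    show(parse_terse(TERSE_OUTPUT))
```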


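The capitalized words on the left-hand side (LISTOP, OP, COP, SVOP) are the
classes of "opcode" (operation code) nodes, and the lowercase words after the
addresses (leave, enter, print, ...) name the actual operations.  If we
visit this "tree" in postorder (children before their parent, which matches
the order Perl actually runs these ops), we'll see that: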

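    1.  Perl executes "enter", whatever that does exactly.  I think it sets
        up the scope for the main program.

    2.  It executes "nextstate", which marks the start of a new statement.
        (COP stands for "control op"; these ops carry the file and line
        number information used in error messages.)

    3.  It does a "pushmark", which records where the argument list for
        the upcoming list operator begins on the stack.

    4.  It pushes the constant "Hello world\n" onto the stack.  The 'SV' in
        SVOP stands for "scalar value", and the PV marks it as a string.

    5.  It calls the 'print' list operator, which prints everything pushed
        since the mark.

    6.  Finally, it calls "leave" to exit.  (The "[1]" next to "leave" is
        probably not a return value: B::Terse prints an op's "targ" field
        in brackets.)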
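The order in that list is just a postorder walk of the tree.  Here's a toy
sketch of it that also plays out steps 3 through 5 with an argument stack
and a "mark" stack, which is my understanding of what pushmark is for.  The
tuple layout, HELLO_TREE, postorder() and run() are made up for this
example, and the real interpreter doesn't walk the tree at run time; it
follows a precomputed chain of "next" pointers between ops.

```python
# Each node is (op_class, op_name, value, children); value is None unless
# the op carries a constant.  This layout is made up for the example.
HELLO_TREE = ("LISTOP", "leave", None, [
    ("OP", "enter", None, []),
    ("COP", "nextstate", None, []),
    ("LISTOP", "print", None, [
        ("OP", "pushmark", None, []),
        ("SVOP", "const", "Hello world\n", []),
    ]),
])


def postorder(node):
    """Yields every node in the tree, children before their parent."""
    for child in node[3]:
        for descendant in postorder(child):
            yield descendant
    yield node


def run(tree):
    """Simulates pushmark, const and print with an argument stack and a
    mark stack, and returns the text that print would have written.
    """
    stack = []   # argument stack
    marks = []   # stack heights recorded by pushmark
    output = []
    for _op_class, name, value, _children in postorder(tree):
        if name == "pushmark":
            marks.append(len(stack))       # print's arguments start here
        elif name == "const":
            stack.append(value)            # push the constant
        elif name == "print":
            start = marks.pop()
            output.append("".join(stack[start:]))  # $, is empty by default
            del stack[start:]              # print consumes its arguments
        # enter, nextstate and leave are ignored in this toy version
    return "".join(output)


if __name__ == "__main__":
    print([node[1] for node in postorder(HELLO_TREE)])
    # ['enter', 'nextstate', 'pushmark', 'const', 'print', 'leave']
    print(repr(run(HELLO_TREE)))
    # 'Hello world\n'
```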



Here's a small Python program that's specifically designed to visit this
particular tree.  I know that it's incorrect and incomplete (I don't even
understand all the opcodes yet!  *grin*), but it should give the flavor of
the work a PerlToPython converter might involve:


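###
"""A small program to demonstrate what might be involved in parsing
Perl into Python.

Danny Yoo (dyoo@hkn.eecs.berkeley.edu)
"""

## The Terse output of hello.pl, typed in by hand.  Each node is
## (op class, operands..., list of children), and the nesting follows
## the Terse indentation: enter, nextstate and print are all children
## of leave, and pushmark and const are children of print.
parse_tree = ("LISTOP", "leave", 1,
              [("OP", "enter", []),
               ("COP", "nextstate", []),
               ("LISTOP", "print",
                [("OP", "pushmark", []),
                 ("SVOP", "const PV", "Hello world\n", [])])])

## Some utility functions that we might need...
def opcode(instruction):
    """Returns the op class, like "LISTOP" or "SVOP"."""
    return instruction[0]

def children(instruction):
    """Returns the list of child nodes, which is always the last field."""
    return instruction[-1]

def operands(instruction):
    """Returns the fields between the op class and the children."""
    return instruction[1:-1]


class PerlToPython:
    def __init__(self):
        self.stack = []    # Python expressions waiting to be used
        self.lines = []    # the generated lines of Python source

    def visit(self, instruction):
        """Dispatches to visit_LISTOP, visit_OP, ... by op class."""
        op = opcode(instruction)
        dispatch_function = getattr(self, "visit_" + op)
        dispatch_function(instruction)


    def visit_LISTOP(self, instruction):
        ## Visit the children first, so that any arguments are already
        ## on the stack when we handle the operator itself.
        for child in children(instruction):
            self.visit(child)
        args = operands(instruction)
        if args[0] == 'print':
            self.lines.append("print " + ','.join(self.stack))
            self.stack = []    # print consumes its arguments
        elif args[0] == 'leave':
            self.lines.append("raise SystemExit")


    def visit_OP(self, instruction):
        for child in children(instruction):
            self.visit(child)
        return  ## fixme!  enter and pushmark are ignored for now.


    def visit_COP(self, instruction):
        for child in children(instruction):
            self.visit(child)
        return  ## fixme!  nextstate is ignored for now.

    def visit_SVOP(self, instruction):
        kind, value = operands(instruction)
        if kind == "const PV":
            ## A string constant: push its Python literal form.
            self.stack.append(repr(value))


if __name__ == '__main__':
    converter = PerlToPython()
    converter.visit(parse_tree)
    print('\n'.join(converter.lines))
###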



Here's an example of this in action:

###
[dyoo@tesuque dyoo]$ python perl_into_python.py
print 'Hello world\n'
raise SystemExit
###





Let's look at another Perl parse tree of a slightly more complicated
program:

###
[dyoo@tesuque dyoo]$ cat loops.pl

for ($i = 0; $i < 10; $i++) {
    print "$i\n";
}


[dyoo@tesuque dyoo]$ perl -MO=Terse loops.pl
LISTOP (0x80fa8c8) leave [1]
    OP (0x80fa928) enter
    COP (0x80fa8f0) nextstate
    BINOP (0x8166900) sassign
        SVOP (0x81668e0) const  IV (0x80f6d88) 0
        UNOP (0x81668c0) null [15]
            SVOP (0x817bec8) gvsv  GV (0x81025f0) *i
    BINOP (0x80fa8a0) leaveloop
        LOOP (0x80fa870) enterloop
        UNOP (0x8104820) null
            LOGOP (0x81047f8) and
                BINOP (0x8166988) lt
                    UNOP (0x8166948) null [15]
                        SVOP (0x8166928) gvsv  GV (0x81025f0) *i
                    SVOP (0x8166968) const  IV (0x8102590) 10
                LISTOP (0x8104798) lineseq
                    LISTOP (0x8104750) scope
                        OP (0x8104718) null [174]
                        LISTOP (0x8103fb8) print
                            OP (0x8103fe0) pushmark
                            UNOP (0x8103f90) null [67]
                                OP (0x8184868) null [3]
                                BINOP (0x8184840) concat [2]
                                    UNOP (0x8184720) null [15]
                                        SVOP (0x8184700) gvsv  GV (0x81025f0) *i
                                    SVOP (0x8184740) const  PV (0x8102614) "\n"
                    UNOP (0x8184820) preinc [1]
                        UNOP (0x8104928) null [15]
                            SVOP (0x8104908) gvsv  GV (0x81025f0) *i
                    OP (0x8104778) unstack
                    COP (0x81047c0) nextstate
loops.pl syntax OK
###


This tree is a bit deeper, but the idea is the same: we have to handle
some new opcodes, like LOOP, BINOP, and LOGOP, and transform them into
their Python equivalents.  We'll also need to keep track of things like
scope and nesting, and since Python has no C-style 'for' loop, the loop
has to be rewritten into a different shape, such as a 'while' loop.  Not
a particularly "hard" task, but it might be a little fiddly.
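As for what the LOOP part should turn into: one plausible shape is the
initializer, then a while loop with the increment at the bottom of the body.
This sketch assumes the initializer, condition, increment and body have
already been pulled out of the tree and translated into Python source
strings.  Emitter and emit_c_style_for are hypothetical names, and the depth
counter in Emitter is a stand-in for the scope and nesting bookkeeping.

```python
class Emitter:
    """Collects lines of Python source, tracking the current indentation."""

    def __init__(self):
        self.lines = []
        self.depth = 0

    def emit(self, line):
        self.lines.append("    " * self.depth + line)

    def indent(self):
        self.depth += 1

    def dedent(self):
        self.depth -= 1

    def source(self):
        return "\n".join(self.lines) + "\n"


def emit_c_style_for(out, init, condition, step, body):
    """Emits Python for a C-style loop: for (init; condition; step) { body }.

    Every argument is assumed to be already-translated Python source;
    body is a list of statements.
    """
    out.emit(init)
    out.emit("while " + condition + ":")
    out.indent()
    for statement in body:
        out.emit(statement)
    out.emit(step)        # the increment runs at the end of each pass
    out.dedent()


if __name__ == "__main__":
    # Hand-translated pieces of loops.pl:
    #   for ($i = 0; $i < 10; $i++) { print "$i\n"; }
    out = Emitter()
    emit_c_style_for(out, "i = 0", "i < 10", "i += 1", ["print(i)"])
    print(out.source(), end="")
    # i = 0
    # while i < 10:
    #     print(i)
    #     i += 1
```

One catch with this shape: if the Perl body contains 'next', translating it
to Python's 'continue' would skip the increment, whereas Perl still runs
'$i++' in that case.  A real converter would have to handle that, or
recognize simple counting loops and emit 'for i in range(10):' instead.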



Alternatively, it might also be possible to translate a Perl parse tree
into a Python parse tree with the help of the 'compiler' module:

    http://www.python.org/doc/lib/compiler.html

then have the compiler package turn that tree into bytecode, and finally
use a program called 'decompyle', which translates Python bytecode back
into source, to reproduce human-readable text:

    http://www.crazy-compilers.com/decompyle/
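The compiler module only exists in Python 2.  In Python 3, the nearest
equivalents are the standard ast module, for building the tree, and
ast.unparse() (Python 3.9 and later), which turns a tree back into source
text much as decompyle would.  That's a different route from the one
described above.  Here's a minimal sketch for hello.pl, assuming the
converter has already decided which call to build.  hello_world_module() is
a hypothetical helper.  One detail worth noticing: Perl's print doesn't add
a newline, so a faithful translation needs end=''.

```python
import ast


def hello_world_module():
    """Builds a Python AST for:  print('Hello world\\n', end='')

    Perl's print doesn't add a newline, so end='' keeps the output the same.
    """
    call = ast.Call(
        func=ast.Name(id="print", ctx=ast.Load()),
        args=[ast.Constant(value="Hello world\n")],
        keywords=[ast.keyword(arg="end", value=ast.Constant(value=""))],
    )
    return ast.Module(body=[ast.Expr(value=call)], type_ignores=[])


if __name__ == "__main__":
    # fix_missing_locations fills in the line numbers compile() needs.
    tree = ast.fix_missing_locations(hello_world_module())
    print(ast.unparse(tree))                     # the human-readable source
    exec(compile(tree, "<converted>", "exec"))   # and it really runs
```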


But somehow, I think this might take longer to write than I anticipated.
Still, it is very possible to write this program.  It might make a nice
winter project.  *grin*



Good luck to you!