simplified Python parsing question

Laszlo Nagy gandalf at shopzeus.com
Mon Jul 30 05:25:28 EDT 2012


> I appreciate the help because I believe that once this is working, 
> it'll make a significant difference in the ability of disabled 
> programmers to write code again, as well as to integrate with 
> existing development teams and their naming conventions. 

Did you try to use pygments?

http://pygments.org/docs/api/

It already contains a lexer for Python source code. You can create a 
PythonLexer (pygments.lexers.PythonLexer, a subclass of 
pygments.lexer.Lexer) and then call its get_tokens method, which yields 
(token type, value) pairs.
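
Here is a minimal sketch of that call, assuming Pygments is installed 
(pip install Pygments). The helper name keyword_tokens and the sample 
source are made up for illustration. Note that Pygments may file word 
operators such as "in" under a different token type than Token.Keyword, 
so inspect the token types on your own input before relying on them.

```python
from pygments.lexers import PythonLexer
from pygments.token import Token


def keyword_tokens(source):
    """Return the text of every token that Pygments tags as a keyword.

    Hypothetical helper, not part of Pygments itself.
    """
    lexer = PythonLexer()
    return [
        value
        for token_type, value in lexer.get_tokens(source)
        # "in" on a token type also matches its subtypes,
        # e.g. Token.Keyword.Namespace for "import".
        if token_type in Token.Keyword
    ]


sample = "import os\nfor name in os.listdir('.'):\n    print(name)\n"
print(keyword_tokens(sample))
```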

Then you can use this to identify statements:

http://docs.python.org/reference/simple_stmts.html

Fortunately, almost all simple statements begin with a keyword. There 
are some exceptions:

     expression statement
     assignment statement
     augmented assignment statement

I would first tokenize the code, then divide it by statement keywords. 
Finally, you just need to find expression/assignment statements in the 
remaining sections. (Maybe there is a better way to do it.)
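
A rough sketch of that idea follows. To keep it free of dependencies it 
swaps Pygments for the standard library's tokenize module, so it is not 
the Pygments-based approach itself. The keyword sets assume Python 3 
(Python 2 also had "print" and "exec"), all function names are 
hypothetical, and corner cases such as a lambda with default arguments 
at the top level of an expression are not handled.

```python
import io
import tokenize

# Keywords that open a simple statement (assumption: Python 3 grammar).
SIMPLE_STMT_KEYWORDS = {
    "assert", "pass", "del", "return", "yield", "raise", "break",
    "continue", "import", "from", "global", "nonlocal",
}
# Keywords that open (part of) a compound statement.
COMPOUND_STMT_KEYWORDS = {
    "if", "elif", "else", "while", "for", "try", "except", "finally",
    "with", "def", "class", "async",
}
ASSIGNMENT_OPS = {
    "=", "+=", "-=", "*=", "/=", "//=", "%=", "**=",
    ">>=", "<<=", "&=", "^=", "|=", "@=",
}
OPENERS = {"(", "[", "{"}
CLOSERS = {")", "]", "}"}
SKIPPED = {tokenize.NL, tokenize.COMMENT, tokenize.INDENT,
           tokenize.DEDENT, tokenize.ENDMARKER}


def split_statements(source):
    """Group tokens into statements, splitting at NEWLINE and ';'."""
    statements, current = [], []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        is_semicolon = tok.type == tokenize.OP and tok.string == ";"
        if tok.type == tokenize.NEWLINE or is_semicolon:
            if current:
                statements.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        statements.append(current)
    return statements


def classify(tokens):
    """Label one statement by its leading keyword, else by its operators."""
    first = tokens[0]
    if first.type == tokenize.NAME:
        if first.string in SIMPLE_STMT_KEYWORDS:
            return first.string
        if first.string in COMPOUND_STMT_KEYWORDS:
            return "compound"
    depth = 0  # bracket nesting, so keyword arguments are not assignments
    for tok in tokens:
        if tok.type != tokenize.OP:
            continue
        if tok.string in OPENERS:
            depth += 1
        elif tok.string in CLOSERS:
            depth -= 1
        elif depth == 0 and tok.string in ASSIGNMENT_OPS:
            return "assignment"
    return "expression"


def classify_source(source):
    """Return one label per statement found in `source`."""
    return [classify(tokens) for tokens in split_statements(source)]


print(classify_source("import os\nx = 1; y += 2\nprint(x, sep='')\ndel x\n"))
```

The bracket depth counter is there so that the "=" in a call like 
print(x, sep='') is not mistaken for an assignment.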
