[Edu-sig] counting lexemes...

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Tue, 2 Apr 2002 17:49:52 -0800 (PST)


On 1 Apr 2002, Jeffrey Elkner wrote:

> i got such a great response to my last query that i'm trying another one
> ;-)  is there anything out there already that i can use to parse python,
> c++, and java source files to get a listing and count of the lexemes
> that occur in each?
>
> i spent the better part of an afternoon writing python scripts to remove
> comments and docstrings so that i could compare line numbers, and i'm
> afraid parsing to get at the lexemes is beyond my ability within the
> time i have left to prepare my thesis.

The ANTLR parser generator by Terence Parr,

    http://www.antlr.org/

has an example lexer/parser for Java 1.3, so you might be able to generate
a Java lexer and parser using ANTLR, and then drive it with Jython.  I
also saw a link to a production-quality C lexer and parser.
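
For the Python side, the standard library's tokenize module may already
be enough.  Here is a rough sketch of how the counting might go.  The
function name count_lexemes and the choice of which token types to skip
are my own assumptions, and it leans on collections.Counter, so it needs
a reasonably recent Python:

```python
import io
import tokenize
from collections import Counter

# Token types that describe layout or commentary rather than lexemes.
# Which types to skip is an assumption; adjust to taste.
SKIP = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def count_lexemes(source):
    """Return a Counter mapping each lexeme string to how often it occurs."""
    counts = Counter()
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type in SKIP:
            continue
        counts[tok.string] += 1
    return counts


if __name__ == "__main__":
    sample = "x = 1\ny = x + 1  # comment\n"
    for lexeme, n in count_lexemes(sample).most_common():
        print(n, lexeme)
```

Comments are dropped by the tokenizer filter above, but docstrings come
through as ordinary STRING tokens, so they would still need to be
filtered out separately if you don't want them counted.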

This project looks interesting; if I have time, I'll see if I can cook up
something.  *grin*


Good luck to you!