Using PLY

Fri Sep 17 16:06:41 EDT 2004

>  >>> import tokenize
>  >>> import StringIO
>  >>> src = StringIO.StringIO("""
>  ... 

The tokenize module would definitely be simpler if it's Python code
that he happens to be parsing.  If it's not Python code, then there's
still a reason to use PLY.
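
For the Python-source case, here is a minimal sketch of counting token types with the standard tokenize module. It assumes Python 3 (so io.StringIO rather than the StringIO module quoted above), and the helper name count_token_types is made up for illustration.

```python
import io
import tokenize
from collections import Counter


def count_token_types(source):
    """Return a Counter mapping token type names to occurrence counts.

    count_token_types is a hypothetical helper, not part of tokenize.
    """
    # generate_tokens() wants a readline callable that yields str lines.
    readline = io.StringIO(source).readline
    counts = Counter()
    for tok in tokenize.generate_tokens(readline):
        # tok.type is an integer code; tok_name maps it to a readable name.
        counts[tokenize.tok_name[tok.type]] += 1
    return counts


counts = count_token_types("x = 1 + 2\n")
for name in sorted(counts):
    print(name, counts[name])
```

Note that tokenize also reports bookkeeping tokens such as NEWLINE and ENDMARKER, so you may want to filter those out before reporting.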

------------------------------------------

Here's a kludgy but quick solution: modify the LexToken class in
lex.py to keep track of the number of occurrences of each token type.

class LexToken(object):  # change to new style class
    type_count = {}   # store the count here
    def __setattr__(self, key, value):
        if key == 'type':
            # when type attribute is assigned, increment counter
            if value not in self.type_count:
                self.type_count[value] = 1
            else:
                self.type_count[value] += 1
        object.__setattr__(self, key, value)

    # ... and proceed with the original definition of LexToken

    def __str__(self):
        return "LexToken(%s,%r,%d)" % (self.type, self.value, self.lineno)
    def __repr__(self):
        return str(self)
    def skip(self,n):
        try:
            self._skipn += n
        except AttributeError:
            self._skipn = n
-----------------------------------------

After you've run the lexer, lex.LexToken.type_count will contain the
number of occurrences of each token type.
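
To see the counting trick in isolation, here is a small self-contained sketch that runs without PLY. The Token class is a stand-in invented for illustration, not PLY's LexToken, and the token list is fed in by hand instead of coming from a real lexer. It assumes Python 3.

```python
class Token(object):
    """Stand-in for a lexer token class (hypothetical, not PLY's LexToken)."""

    type_count = {}  # class-level dict, shared by every instance

    def __setattr__(self, key, value):
        if key == 'type':
            # Count each assignment to the type attribute.
            self.type_count[value] = self.type_count.get(value, 0) + 1
        object.__setattr__(self, key, value)


# Simulate what a lexer does: create a token, then set its attributes.
for kind, text in [("NAME", "x"), ("OP", "="), ("NUMBER", "1"),
                   ("OP", "+"), ("NUMBER", "2")]:
    tok = Token()
    tok.type = kind
    tok.value = text

for kind in sorted(Token.type_count):
    print(kind, Token.type_count[kind])
```

Two things to watch with this approach: the dict lives on the class, so counts pile up across lexer runs unless you clear it, and any code that reassigns a token's type bumps the counter a second time.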

-----------------------------------------

(Caveats:  1. I haven't tested this code.   2. I've got PLY 1.3;
syntax may have changed in newer versions.  In fact, I hope it's
changed; while PLY works very well, its usage could be way more
Pythonic.)