Using PLY

Bengt Richter bokr at oz.net
Fri Sep 17 04:03:39 EDT 2004


On Fri, 17 Sep 2004 04:48:36 GMT, Maurice LING <mauriceling at acm.org> wrote:

>Hi,
>
>I know that PLY lex is able to do line counting. I am wondering if there
>is a way to count the occurrences of each keyword (token) in a given file?
>For example, how many IF tokens there are?
>

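If the file is Python source, the standard library's tokenize module can
count them directly, without PLY:

 >>> import tokenize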
 >>> import StringIO
 >>> src = StringIO.StringIO("""
 ... if a: foo()
 ... elif b: bar()
 ... if c: baz()
 ... """)
 >>> sum([1 for t in tokenize.generate_tokens(src.readline) if t[1]=='if'])
 2

The list comprehension builds an intermediate list holding a 1 for each 'if'
token before summing it, but that's not a big price to pay IMO.
If you have a file in the current working directory, e.g., foo.py, substitute

    src = file('foo.py')

or do it in one line, like (untested):

    sum([1 for t in tokenize.generate_tokens(file('foo.py').readline) if t[1]=='if'])
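To count every keyword at once rather than just 'if', something along these
lines might work. It is only a sketch, written for a current Python 3
interpreter (io.StringIO and named token fields instead of the Python 2
spellings above). The helper name count_keywords is made up for illustration,
and a generator expression is used so no intermediate list is built:

```python
import io
import keyword
import tokenize
from collections import Counter


def count_keywords(readline):
    """Return a Counter mapping each Python keyword to its occurrence count.

    count_keywords is a hypothetical helper name, not part of tokenize.
    """
    return Counter(
        tok.string
        for tok in tokenize.generate_tokens(readline)
        # Keywords are reported as NAME tokens, so filter with keyword.iskeyword.
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string)
    )


src = io.StringIO("""
if a: foo()
elif b: bar()
if c: baz()
""")
counts = count_keywords(src.readline)
print(counts["if"], counts["elif"])
```

For a real file, passing open('foo.py').readline instead of src.readline
should give the same kind of result.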

generate_tokens returns a generator that yields one 5-tuple per token:
(type, string, (start row, start col), (end row, end col), source line).
For the example above:

Rewind src:
 >>> src.seek(0)

Get the generator:
 >>> tg = tokenize.generate_tokens(src.readline)

Manually get a couple of examples:
 >>> tg.next()
 (53, '\n', (1, 0), (1, 1), '\n')
 >>> tg.next()
 (1, 'if', (2, 0), (2, 2), 'if a: foo()\n')

Rewind the StringIO object to start again:
 >>> src.seek(0)

Show all the token tuples:
 >>> for t in tokenize.generate_tokens(src.readline): print t
 ...
 (53, '\n', (1, 0), (1, 1), '\n')
 (1, 'if', (2, 0), (2, 2), 'if a: foo()\n')
 (1, 'a', (2, 3), (2, 4), 'if a: foo()\n')
 (50, ':', (2, 4), (2, 5), 'if a: foo()\n')
 (1, 'foo', (2, 6), (2, 9), 'if a: foo()\n')
 (50, '(', (2, 9), (2, 10), 'if a: foo()\n')
 (50, ')', (2, 10), (2, 11), 'if a: foo()\n')
 (4, '\n', (2, 11), (2, 12), 'if a: foo()\n')
 (1, 'elif', (3, 0), (3, 4), 'elif b: bar()\n')
 (1, 'b', (3, 5), (3, 6), 'elif b: bar()\n')
 (50, ':', (3, 6), (3, 7), 'elif b: bar()\n')
 (1, 'bar', (3, 8), (3, 11), 'elif b: bar()\n')
 (50, '(', (3, 11), (3, 12), 'elif b: bar()\n')
 (50, ')', (3, 12), (3, 13), 'elif b: bar()\n')
 (4, '\n', (3, 13), (3, 14), 'elif b: bar()\n')
 (1, 'if', (4, 0), (4, 2), 'if c: baz()\n')
 (1, 'c', (4, 3), (4, 4), 'if c: baz()\n')
 (50, ':', (4, 4), (4, 5), 'if c: baz()\n')
 (1, 'baz', (4, 6), (4, 9), 'if c: baz()\n')
 (50, '(', (4, 9), (4, 10), 'if c: baz()\n')
 (50, ')', (4, 10), (4, 11), 'if c: baz()\n')
 (4, '\n', (4, 11), (4, 12), 'if c: baz()\n')
 (0, '', (5, 0), (5, 0), '')
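The first item of each tuple is a numeric token type, and the numeric values
are not guaranteed to be the same across Python versions. A small sketch of
how the codes could be turned into readable names via token.tok_name, again
assuming a current Python 3 interpreter (the variable name rows is just for
illustration):

```python
import io
import token
import tokenize

src = io.StringIO("if a: foo()\n")

# Replace the numeric type code with its symbolic name from token.tok_name
# and keep the token text plus its start and end positions.
rows = [
    (token.tok_name[tok.type], tok.string, tok.start, tok.end)
    for tok in tokenize.generate_tokens(src.readline)
]
for row in rows:
    print(row)
```

Comparing against the names (e.g. tokenize.NAME, tokenize.OP) rather than
the raw numbers keeps the code independent of the interpreter version.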

HTH

Regards,
Bengt Richter


