[Python-3000] some stats on identifiers (PEP 3131)

Steve Howell showell30 at yahoo.com
Sun May 27 01:42:46 CEST 2007


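Here is a survey of some Python code to see how often
identifiers typically get used in Python 2.  The counts
include keywords, since the tokenizer reports them as
NAME tokens just like identifiers.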

Here is the program I used to count the tokens, if you
want to try it out on your own in-house codebase:

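import sys
import tokenize

# Count every NAME token (identifiers and keywords) in the file
# given on the command line.
fn = sys.argv[1]
g = tokenize.generate_tokens(open(fn).readline)
dct = {}
for tup in g:
    # tup[0] is the token type, tup[1] is the token text
    if tup[0] == tokenize.NAME:
        identifier = tup[1]
        dct[identifier] = dct.get(identifier, 0) + 1
# Print "count identifier", ordered by identifier; pipe through
# sort -rn to rank by count.
for identifier in sorted(dct):
    print '%4d' % dct[identifier], identifier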
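
For anyone who wants to try this on Python 3, a rough
equivalent might look like the sketch below.  It is not the
script that produced the numbers in this post; the function
name count_names and the sample string are made up for
illustration, and Counter.most_common stands in for the
"sort -rn | head" pipeline.

```python
import io
import tokenize
from collections import Counter


def count_names(source):
    """Count NAME tokens (identifiers and keywords) in a source string."""
    counts = Counter()
    # generate_tokens wants a readline callable that returns str lines
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            counts[tok.string] += 1
    return counts


# Hypothetical sample input, only here to show the call
sample = (
    "def f(self, n):\n"
    "    if n:\n"
    "        return self.g(n)\n"
    "    return None\n"
)

# Print the most frequent names first, in the same "count name" layout
for name, n in count_names(sample).most_common(3):
    print("%4d %s" % (n, name))
```

To run it on a real file, read the file's text and pass it to
count_names in place of the sample string.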

The top 15 in gettext.py:

ssslily> python2.5 count.py /usr/local/lib/python2.5/gettext.py | sort -rn | head -15
  98 self
  73 if
  69 return
  39 def
  35 msgid1
  34 tmsg
  33 n
  33 None
  32 domain
  31 message
  29 msgid2
  28 _fallback
  21 else
  20 locale
  20 in

The top 15 in an in-house program that deals with an
American-based format for sending financial
transactions (closest thing I could find to Dutch tax
law):

  23 trackData
  19 ErrorMessages
  18 rest
  16 cuts
  12 encryptedPin
  11 return
  10 request
  10 p2
  10 p1
  10 maskedMessage
  10 j
  10 in
  10 i
   9 len
   9 ccNum
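
Both lists mix keywords (if, return, in) with real
identifiers, because the tokenizer labels both as NAME.  If
only the identifiers are of interest, one possible way to
filter is the standard keyword module, as in this Python 3
sketch.  The function name count_identifiers and the sample
string are assumptions for illustration.  Note that Python 3
treats None as a keyword, so it would drop out of the counts
here, while Python 2 does not.

```python
import io
import keyword
import tokenize
from collections import Counter


def count_identifiers(source):
    """Count NAME tokens that are not Python keywords."""
    counts = Counter()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Skip def, if, return, None, etc.; keep everything else
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            counts[tok.string] += 1
    return counts


# Hypothetical sample input, only here to show the call
sample = (
    "def f(self, n):\n"
    "    if n:\n"
    "        return self.g(n)\n"
    "    return None\n"
)

for name, n in count_identifiers(sample).most_common():
    print("%4d %s" % (n, name))
```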



       