Faster 'if char in string' test

Kamilche klachemin at home.com
Thu Jun 24 02:45:43 EDT 2004


I was looking for a way to speed up detecting invalid characters in my
TCP string, and thought of yet another use for the translate function!
If you 'translate out' the bad characters and compare string lengths
afterwards, you know whether or not the line contained invalid
characters: any deleted character makes the result shorter. In the
timings below, this method is more than 10x faster than the standard
'if char in string' test! Here's the code plus sample timings:

'''
Translate Speed Test

This code looks for invalid characters in a string,
and raises an exception when it finds one.
I'm testing 2 methods: the 'if char in string' method,
and one based on using the 'translate' function and
comparing string lengths afterwards.
Wow, what a difference! Translate is over 10x faster!

     Function       Loops       Seconds   Loops/sec   
     ***********************************************
     In                10000      0.171       58479   
     Translate         10000      0.016      624998       

'''

import mytime  # not in the standard library; a separate timing helper module
import string


_allchars = None
_deletechars = None
_validchars = string.ascii_letters + string.digits + \
              "!@#$%^&*()`~-_=+[{]}\\|;:\'\",<.>/?\t "


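def init():
    # Build the 256-character identity table and the string of characters
    # to delete, as needed by the two-argument form of str.translate.
    global _allchars, _deletechars
    invalid = []
    allchars = []
    for i in range(256):
        c = chr(i)
        allchars.append(c)
        if c not in _validchars:
            invalid.append(c)
    _deletechars = ''.join(invalid)
    _allchars = ''.join(allchars)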


def test():
    loops = 10000
    tmr = mytime.Timer()
    r = range(loops)
    s = "This is a string to test for invalid characters."
    print tmr.heading

    # Method 1: test each character for membership in the invalid set.
    tmr.startit()
    for i in r:
        for c in s:
            if c in _deletechars:
                raise Exception("Invalid character found!")
    tmr.stopit(loops)
    print tmr.results('In')

    # Method 2: delete the invalid characters with translate; if the
    # result is shorter, the string contained at least one of them.
    tmr.startit()
    for i in r:
        s2 = s.translate(_allchars, _deletechars)
        if len(s2) != len(s):
            raise Exception("Invalid character found!")
    tmr.stopit(loops)
    print tmr.results('Translate')


init()

if __name__ == "__main__":
    test()
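
For anyone on Python 3, where str.translate takes a single table and no
longer accepts a separate deletechars argument, the same idea might be
sketched roughly as below. This is only an illustrative port, not the code
that produced the timings above: the names DELETE_TABLE, has_invalid_in and
has_invalid_translate are made up for the example, and the standard timeit
module stands in for mytime. No particular speedup is claimed for it.

```python
import string
import timeit

# Same set of allowed characters as in the script above.
VALID_CHARS = string.ascii_letters + string.digits + \
    "!@#$%^&*()`~-_=+[{]}\\|;:'\",<.>/?\t "

# Characters in the 0-255 range that are NOT allowed.
DELETE_CHARS = ''.join(chr(i) for i in range(256) if chr(i) not in VALID_CHARS)

# In Python 3 the third argument of str.maketrans lists characters to delete.
DELETE_TABLE = str.maketrans('', '', DELETE_CHARS)


def has_invalid_in(s):
    """Per-character membership test (the 'In' method)."""
    for c in s:
        if c in DELETE_CHARS:
            return True
    return False


def has_invalid_translate(s):
    """Delete invalid characters and compare lengths (the 'Translate' method)."""
    return len(s.translate(DELETE_TABLE)) != len(s)


if __name__ == "__main__":
    s = "This is a string to test for invalid characters."
    loops = 10000
    for name, func in (("In", has_invalid_in),
                       ("Translate", has_invalid_translate)):
        # timeit calls the lambda 'loops' times and returns total seconds.
        seconds = timeit.timeit(lambda: func(s), number=loops)
        print("%-10s %8d loops %8.3f s" % (name, loops, seconds))
```

Note that, like the original, this only knows about code points 0-255; a
Python 3 str can hold characters above that range, and neither function
here would flag them.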


