Faster 'if char in string' test

Elbert Lev elbertlev at hotmail.com
Thu Jun 24 10:42:03 EDT 2004


Very good!

One remark:

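The translate method is much slower than the "c in string" method when
the test string is very long (say 5 MB) and the invalid character is
located close to the beginning of the string (10000 times slower!).
The loop stops at the first invalid character it meets, while translate
always has to process the whole string.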
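
A rough sketch of this case, for anyone who wants to try it. It is written
for Python 3 and uses bytes, so it is not the code from the quoted post:
the function names, the 5 MB size, and the use of timeit instead of the
mytime module are all my own assumptions. No timings are shown because
they depend on the machine.

```python
import string
import timeit

# Same set of valid characters as in the quoted post, as bytes.
VALID = (string.ascii_letters + string.digits +
         "!@#$%^&*()`~-_=+[{]}\\|;:'\",<.>/?\t ").encode("ascii")
# Every byte value that is not in the valid set.
DELETE = bytes(b for b in range(256) if b not in VALID)


def has_invalid_scan(data):
    """Byte-by-byte test; returns at the first invalid byte."""
    for b in data:
        if b in DELETE:
            return True
    return False


def has_invalid_translate(data):
    """Translate-based test; always processes the whole input."""
    return len(data.translate(None, DELETE)) != len(data)


# Hypothetical worst case for translate: the invalid byte comes first.
LONG_BAD = b"\x00" + b"a" * (5 * 1024 * 1024)

if __name__ == "__main__":
    for fn in (has_invalid_scan, has_invalid_translate):
        seconds = timeit.timeit(lambda: fn(LONG_BAD), number=10)
        print("%-22s %.6f s for 10 calls" % (fn.__name__, seconds))
```

With a short, valid string the ranking should go the other way, as in the
timings quoted below, so which method wins depends on the input you expect.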


klachemin at home.com (Kamilche) wrote in message news:<889cbba0.0406232245.53b9025e at posting.google.com>...
> I was looking for a way to speed up detecting invalid characters in my
> TCP string, and thought of yet another use for the translate function!
> If you were to 'translate out' the bad characters, and compare string
> lengths afterwards, you would know whether or not the line contained
> invalid characters. The new method is more than 10x faster than the
> standard 'if char in string' test! So - here's the code plus sample
> timings:
> 
> '''
> Translate Speed Test
> 
> This code looks for invalid characters in a string,
> and raises an exception when it finds one.
> I'm testing 2 methods: the 'if char in string' method,
> and one based on using the 'translate' function and
> comparing string lengths afterwards.
> Wow, what a difference! Translate is over 10x faster!
> 
>      Function       Loops       Seconds   Loops/sec   
>      ***********************************************
>      In                10000      0.171       58479   
>      Translate         10000      0.016      624998       
> 
> '''
> 
> import mytime
> import string
> 
> 
> _allchars = None
> _deletechars = None
> _validchars = string.ascii_letters + string.digits + \
>               "!@#$%^&*()`~-_=+[{]}\\|;:\'\",<.>/?\t "
> 
> 
> def init():
>     global _allchars, _deletechars
>     l = []
>     a = []
>     for i in range(256):
>         a.append(chr(i))
>         if not chr(i) in _validchars:
>             l.append(chr(i))
>     _deletechars = ''.join(l)
>     _allchars = ''.join(a)
> 
> 
> def test():
>     max = 10000
>     tmr = mytime.Timer()
>     r = range(max)
>     s = "This is a string to test for invalid characters."
>     print tmr.heading
>     
>     tmr.startit()
>     for i in r:
>         for c in s:
>             if c in _deletechars:
>                 raise Exception("Invalid character found!")
>     tmr.stopit(max)
>     print tmr.results('In')
>     
>     tmr.startit()
>     for i in r:
>         s2 = s.translate(_allchars, _deletechars)
>         if len(s2) != len(s):
>             raise Exception("Invalid character found!")
>     tmr.stopit(max)
>     print tmr.results('Translate')
> 
> 
> init()
> 
> if __name__ == "__main__":
>     test()


