Faster 'if char in string' test
Elbert Lev
elbertlev at hotmail.com
Thu Jun 24 10:42:03 EDT 2004
Very good!
One remark:
The translate method is much slower than the "c in string" method when
the test string is very long (say 5 MB) and the invalid character is
located close to the beginning of the string (10000 times slower!).
The "in" loop can stop at the first invalid character, while translate
always processes the whole string.
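
Here is a minimal sketch of the two checks, to make the remark concrete. It is a Python 3 adaptation, not the code quoted below: the function names, the DELETE_TABLE constant, and the use of timeit in place of the mytime module are my own assumptions. In Python 3, str.translate takes a single table, so the deletion set is built with str.maketrans. No timings are claimed here; run it to see the difference on your machine.

```python
import string
import timeit

# Same set of valid characters as in the quoted post.
VALID_CHARS = string.ascii_letters + string.digits + \
    "!@#$%^&*()`~-_=+[{]}\\|;:\'\",<.>/?\t "

# Every character in range(256) that is not valid.
DELETE_CHARS = "".join(chr(i) for i in range(256) if chr(i) not in VALID_CHARS)

# Python 3 form of the deletion table (hypothetical name).
DELETE_TABLE = str.maketrans("", "", DELETE_CHARS)


def has_invalid_scan(s):
    """Per-character 'in' test; returns at the first invalid character."""
    for c in s:
        if c in DELETE_CHARS:
            return True
    return False


def has_invalid_translate(s):
    """Translate-based test; always builds a copy of the whole string."""
    return len(s.translate(DELETE_TABLE)) != len(s)


if __name__ == "__main__":
    # Long string with the invalid character at the very beginning.
    long_bad = "\x00" + "a" * 5000000
    for fn in (has_invalid_scan, has_invalid_translate):
        secs = timeit.timeit(lambda: fn(long_bad), number=3)
        print("%s: %.6f s for 3 calls" % (fn.__name__, secs))
```

On the short, all-valid test string from the quoted post, both functions have to look at every character, which is the case where translate wins.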
klachemin at home.com (Kamilche) wrote in message news:<889cbba0.0406232245.53b9025e at posting.google.com>...
> I was looking for a way to speed up detecting invalid characters in my
> TCP string, and thought of yet another use for the translate function!
> If you were to 'translate out' the bad characters, and compare string
> lengths afterwards, you would know whether or not the line contained
> invalid characters. The new method is more than 10x faster than the
> standard 'if char in string' test! So - here's the code plus sample
> timings:
>
> '''
> Translate Speed Test
>
> This code looks for invalid characters in a string,
> and raises an exception when it finds one.
> I'm testing 2 methods: the 'if char in string' method,
> and one based on using the 'translate' function and
> comparing string lengths afterwards.
> Wow, what a difference! Translate is over 10x faster!
>
> Function     Loops    Seconds    Loops/sec
> ***********************************************
> In           10000      0.171        58479
> Translate    10000      0.016       624998
>
> '''
>
> import mytime
> import string
>
>
> _allchars = None
> _deletechars = None
> _validchars = string.ascii_letters + string.digits + \
>     "!@#$%^&*()`~-_=+[{]}\\|;:\'\",<.>/?\t "
>
>
> def init():
>     global _allchars, _deletechars
>     l = []
>     a = []
>     for i in range(256):
>         a.append(chr(i))
>         if not chr(i) in _validchars:
>             l.append(chr(i))
>     _deletechars = ''.join(l)
>     _allchars = ''.join(a)
>
>
> def test():
>     max = 10000
>     tmr = mytime.Timer()
>     r = range(max)
>     s = "This is a string to test for invalid characters."
>     print tmr.heading
>
>     tmr.startit()
>     for i in r:
>         for c in s:
>             if c in _deletechars:
>                 raise Exception("Invalid character found!")
>     tmr.stopit(max)
>     print tmr.results('In')
>
>     tmr.startit()
>     for i in r:
>         s2 = s.translate(_allchars, _deletechars)
>         if len(s2) != len(s):
>             raise Exception("Invalid character found!")
>     tmr.stopit(max)
>     print tmr.results('Translate')
>
>
> init()
>
> if __name__ == "__main__":
>     test()