Faster 'if char in string' test

Peter Otten __peter__ at web.de
Thu Jun 24 12:12:22 EDT 2004


Kamilche wrote:

> he 'dict' solution proposed wouldn't work because the data I'm
> testing is in string form, and the overhead of translating the string
> to a dict before testing would swamp the results. So would using a
> set, because timings show a set is 4x slower than a dict.
> 
> Unless I'm misunderstanding Peter's suggestion. Did you mean
> translating the string into a dict, then using a 'if boguschar in
> dict' test?

Look for validate_dict() to see what I meant. I've since found a faster
alternative using regular expressions.

<code>
import itertools, re, string

_validchars = string.ascii_letters + string.digits + \
              "!@#$%^&*()`~-_=+[{]}\\|;:\'\",<.>/?\t "
_allchars = string.maketrans("", "")
_invalidchars = _allchars.translate(_allchars, _validchars)

_invaliddict = dict(itertools.izip(_invalidchars, itertools.repeat(None)))

valid = "This is a string to test for invalid characters."
invalid = valid + _invalidchars + valid
massaged = (_invalidchars + valid) * 100

rInvalid = re.compile("[%s]" % re.escape(_invalidchars))

def validate_list(s, invalid=_invalidchars):
    for c in s:
        if c in invalid:
            return False
    return True

def validate_dict(s, invalid=_invaliddict):
    for c in s:
        if c in invalid:
            return False
    return True

def validate_translate(s):
    return len(s.translate(_allchars, _invalidchars)) == len(s)

def validate_regex(s, rInvalid=rInvalid):
    return not rInvalid.search(s)

def validate_noop(s, valid=valid):
    return s is valid


if __name__ == "__main__":
    import timeit
    tests = [f for k, f in globals().items() if k.startswith("validate_")]
    for f in tests:
        assert f(valid)
        assert not f(invalid)

    for f in tests:
        name = f.__name__
        print name
        for data in ["valid", "invalid", "massaged"]:
            print " ", data,
            timeit.main([
            "-sfrom findcharspeed import %s as f, %s as d" % (name, data),
            "f(d)"])


</code>

Here are my timings:

validate_regex
  valid 100000 loops, best of 3: 2.24 usec per loop
  invalid 100000 loops, best of 3: 2.37 usec per loop
  massaged 1000000 loops, best of 3: 1.61 usec per loop
validate_dict
  valid 100000 loops, best of 3: 10.8 usec per loop
  invalid 100000 loops, best of 3: 10.7 usec per loop
  massaged 1000000 loops, best of 3: 0.99 usec per loop
validate_translate
  valid 100000 loops, best of 3: 2.62 usec per loop
  invalid 100000 loops, best of 3: 3.68 usec per loop
  massaged 10000 loops, best of 3: 57.6 usec per loop
validate_list
  valid 100000 loops, best of 3: 14.2 usec per loop
  invalid 100000 loops, best of 3: 14 usec per loop
  massaged 1000000 loops, best of 3: 1.02 usec per loop
validate_noop
  valid 1000000 loops, best of 3: 0.582 usec per loop
  invalid 1000000 loops, best of 3: 0.622 usec per loop
  massaged 1000000 loops, best of 3: 0.634 usec per loop

Note how badly your solution using str.translate() may perform for certain
data due to its failure to shortcut.

Peter





More information about the Python-list mailing list