An isalpha() that accepts underscores as well

Wed Mar 1 05:49:43 EST 2006

In the discussion about isalpha() mutants that also accept
underscores, we did not talk about regular expressions.

Afterwards I ran some timings.
My first observation was that the whole experiment is rather futile,
because a million tests take only about a second.
If you take the trouble to collect a million words,
you might as well spend an extra second analyzing them.

Apart from that, a simple regular expression is often faster
than a test with replace. The replace test does better
with short tokens that contain no underscores, since there is
nothing to replace. Regular expressions are less sensitive
to the length of the tokens.
In short, regular expressions are not monsters of inefficiency.
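To check the claims about token length on your own machine, here is
a minimal Python 3 sketch (3.6 or later, for f-strings and the globals
argument of timeit.timeit) that times the same four expressions as the
script below on short and long tokens, with and without underscores.
The helper name time_variants and the sample tokens are hypothetical,
and the numbers it prints depend on the machine and Python version;
they are not the timings quoted in this post.

```python
import re
import timeit

# Same pattern as in the script below.
PAT = re.compile(r'^[a-zA-Z_]+$')

# The expressions from the script; timeit evaluates each string.
VARIANTS = {
    'split/join': "''.join(token.split('_')).isalpha()",
    'replace':    "token.replace('_', 'X').isalpha()",
    'regex':      "pat.search(token)",
    'isalpha':    "token.isalpha()",  # baseline; rejects underscores
}

def time_variants(token, number=100_000):
    """Return {variant name: seconds for `number` runs} for one token."""
    env = {'token': token, 'pat': PAT}
    return {name: timeit.timeit(stmt, globals=env, number=number)
            for name, stmt in VARIANTS.items()}

if __name__ == '__main__':
    # Hypothetical sample tokens: short and long, with and without underscores.
    tokens = ['abc', 'a_b', 'x' * 200, '_'.join(['word'] * 40)]
    for tok in tokens:
        results = time_variants(tok)
        cells = '  '.join(f'{name}={secs:.3f}' for name, secs in results.items())
        print(f'len={len(tok):4d}  {cells}')
```

Each output line covers one token; comparing the replace and regex
columns between the short and the long tokens is the comparison made
in the paragraph above.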

This is my script:

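#!/usr/bin/env python
import re
import sys
from timeit import Timer

# Letters and underscores only, anchored at both ends.
pat = re.compile(r'^[a-zA-Z_]+$')

# Token to test: first command-line argument, or a default.
if len(sys.argv) > 1:
    token = sys.argv[1]
else:
    token = "contains_underscore"

# Timer.timeit() runs each statement 1,000,000 times by default;
# the trailing comments are the measured times in seconds.
# print(...) with one argument works in Python 2 and Python 3.

# Strip the underscores, then test what is left.
t = Timer("''.join(token.split('_')).isalpha()", "from __main__ import token")
print(t.timeit())  # 1.94

# Replace each underscore by a letter, then test.
t = Timer("token.replace('_','X').isalpha()", "from __main__ import token")
print(t.timeit())  # 1.36

# Precompiled regular expression.
t = Timer("pat.search(token)", "from __main__ import token, pat")
print(t.timeit())  # 1.18

# Plain isalpha() as a baseline (it rejects underscores).
t = Timer("token.isalpha()", "from __main__ import token")
print(t.timeit())  # 0.28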
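The three tests are not quite equivalent, which matters if you pick
one for speed. A minimal Python 3 sketch (3.4 or later, for
re.fullmatch) wrapping each expression from the script in a function;
the function names are hypothetical. It shows that the split/join
test rejects a token made only of underscores, that '$' used with
search() also accepts a trailing newline, and that in Python 3
isalpha() accepts non-ASCII letters while [a-zA-Z_] does not.

```python
import re

# Hypothetical helper names; each mirrors one expression from the script.
PAT = re.compile(r'^[a-zA-Z_]+$')   # the script's pattern, used with search()
WORD = re.compile(r'[a-zA-Z_]+')    # same character class, used with fullmatch()

def via_split(token):
    # Drop underscores, then test; '___' becomes '' and ''.isalpha() is False.
    return ''.join(token.split('_')).isalpha()

def via_replace(token):
    # Turn underscores into a letter, so '___' becomes 'XXX' and passes.
    return token.replace('_', 'X').isalpha()

def via_search(token):
    # '$' also matches just before a trailing newline, so 'abc\n' passes.
    return PAT.search(token) is not None

def via_fullmatch(token):
    # fullmatch() must consume the whole string, so 'abc\n' fails.
    return WORD.fullmatch(token) is not None

if __name__ == '__main__':
    tests = (via_split, via_replace, via_search, via_fullmatch)
    for tok in ['contains_underscore', '___', 'abc\n', 'caf\u00e9_au_lait', '']:
        print(repr(tok), [f(tok) for f in tests])
```

On '___' only the split/join test says False; on 'abc\n' only the
search() version says True; on 'caf\u00e9_au_lait' both regular
expressions say False while the isalpha() variants say True.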

#egbert

-- 
Egbert Bouwman - Keizersgracht 197 II - 1016 DS  Amsterdam - 020 6257991
========================================================================