re.compile for names

brad byte8bits at gmail.com
Mon May 21 09:46:33 EDT 2007


I am developing a list of 3-character strings like this:

and
bra
cam
dom
emi
mar
smi
...

The goal of the list is to have enough strings to identify files that 
may contain the names of people. Missing a name in a file is unacceptable.

For example, the string 'mar' would match marc, mark, mary, maria... 'smi' 
would match smith, smiley, smit, etc. False positives are OK (matching 
common words instead of people's names is fine).
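
To make the matching concrete, here is a rough sketch of what I have in 
mind. The names find_candidates and pattern are just placeholders, and 
anchoring at a word boundary plus ignoring case are assumptions on my 
part, not settled requirements:

```python
import re

# Sample of the 3-character strings from the list above (the real list is longer).
prefixes = ["and", "bra", "cam", "dom", "emi", "mar", "smi"]

# One alternation of all prefixes. Assumption: a name starts at a word
# boundary and case does not matter. \w* grabs the rest of the word.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in prefixes) + r")\w*",
    re.IGNORECASE,
)

def find_candidates(text):
    """Return every word in text that starts with one of the prefixes."""
    return pattern.findall(text)

print(find_candidates("Mary met Mr. Smith at the market."))
```

Here 'market' is picked up along with 'Mary' and 'Smith', which is the 
kind of false positive I can live with.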

I may end up with a thousand or so of these 3-character strings. Is that 
too many alternatives for a single re.compile pattern to handle? Also, is 
this a bad way to approach the problem? Any ideas for improvement are welcome!

I can provide more info off-list to anyone who would like it.

Thank you for your time,
Brad


