re.compile for names

brad byte8bits at gmail.com
Mon May 21 10:09:57 EDT 2007


Marc 'BlackJack' Rintsch wrote:

> What about names with letters not in the ASCII range?

Like Asian names? The names we encounter are spelled out in English
letters... like Xu, Zu, Li-Cheng, Matsumoto, Wantanabee, etc. So the
ASCII approach should still work, I guess.
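
A rough sketch of what an ASCII-only check might look like for names
like the ones above. The pattern and the helper name is_ascii_name are
assumptions made up for illustration, not something from the thread;
the only twist is allowing a hyphen between letter runs so that
Li-Cheng passes.

```python
import re

# Hypothetical pattern: runs of ASCII letters, optionally joined by single hyphens.
NAME_RE = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)*")

def is_ascii_name(token):
    """Return True if the whole token is ASCII letters with optional inner hyphens."""
    return NAME_RE.fullmatch(token) is not None

for candidate in ["Xu", "Li-Cheng", "Matsumoto", "Müller"]:
    print(candidate, is_ascii_name(candidate))
```

A name containing a non-ASCII letter, such as Müller, is rejected by
this pattern, which is the case the question was pointing at.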

My first thought was to spell out names entirely, but that quickly
seemed a bad idea. Running a regex on the full name smith with
whitespace boundaries is more accurate than matching the fragment smi
without them, but the sheer volume of names makes listing every one
impractical. And the volume of false positives from matching only smi
makes that approach close to worthless too.
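
To make the difference concrete, here is a minimal sketch comparing the
two approaches on a made-up sentence. The sample text is invented, and
\b (word boundary) stands in for the whitespace boundaries mentioned
above, which is an assumption; \b also treats punctuation as a
boundary.

```python
import re

# Invented sample text, for illustration only.
text = "Mr. Smith met a blacksmith and a Smithsonian guide named Smiley."

# Full name anchored on word boundaries: only the standalone word matches.
full = re.compile(r"\bsmith\b", re.IGNORECASE)

# Bare fragment with no boundaries: matches inside any word containing it.
prefix = re.compile(r"smi", re.IGNORECASE)

full_hits = full.findall(text)
prefix_hits = prefix.findall(text)

print(full_hits)         # hits for the bounded full name
print(len(prefix_hits))  # number of hits for the bare fragment
```

The bounded pattern finds only the standalone Smith, while the bare
fragment also hits blacksmith, Smithsonian and Smiley.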

It's tough when a problem needs a solution that is both accurate and
broad. Too broad and the results are irrelevant because they include so
many false positives; too narrow and the results will miss some names.
It's a no-win :(

Thanks for the advice.

Brad


