re.compile for names
brad
byte8bits at gmail.com
Mon May 21 10:09:57 EDT 2007
Marc 'BlackJack' Rintsch wrote:
> What about names with letters not in the ASCII range?
Like Asian names? The names we encounter are spelled out in English,
like Xu, Zu, Li-Cheng, Matsumoto, Watanabe, etc., so the ASCII approach
would still work, I guess.
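A minimal sketch of the ASCII approach next to a Unicode-aware one, written for Python 3 `str` patterns. It assumes a name is a run of letters optionally joined by a hyphen or apostrophe. Both patterns (`ASCII_NAME`, `UNICODE_NAME`) are hypothetical illustrations, not anything posted in the thread.

```python
import re

# Hypothetical pattern: ASCII letters, optionally joined by "-" or "'"
# (e.g. "Li-Cheng", "O'Brien").
ASCII_NAME = re.compile(r"\b[A-Za-z]+(?:[-'][A-Za-z]+)*\b")

# Unicode-aware variant: [^\W\d_] means "a word character that is not a
# digit or underscore", i.e. any letter, for str patterns in Python 3.
UNICODE_NAME = re.compile(r"\b[^\W\d_]+(?:[-'][^\W\d_]+)*\b")

text = "Xu, Li-Cheng, Matsumoto, Müller"
ascii_hits = ASCII_NAME.findall(text)
unicode_hits = UNICODE_NAME.findall(text)
print(ascii_hits)    # "Müller" is skipped: "ü" is not in [A-Za-z]
print(unicode_hits)
```

Romanized names such as Xu or Li-Cheng match either way. The ASCII pattern only loses names that keep their accented or non-Latin letters.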
My first thought was to spell out names entirely, but that quickly
seemed a bad idea. Running an re on "smith" with whitespace boundaries is
more accurate than "smi" without them, but the sheer volume of names makes
that impossible. And the volume of false positives from matching only "smi"
makes that approach somewhat worthless too.
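A rough sketch of the trade-off described above. It builds one compiled pattern from a list of names, either bounded or as bare substrings. The helper `compile_surnames` is hypothetical. It also uses `\b` (a word boundary, which treats punctuation as well as whitespace as a boundary) in place of the strict whitespace boundaries mentioned above.

```python
import re

def compile_surnames(names, whole_word=True):
    """Hypothetical helper: compile a list of names into one regex.

    whole_word=True wraps the alternation in \\b...\\b, so "smith" matches
    "Smith" but not "Smithson". whole_word=False matches bare substrings,
    which is what searching for a fragment like "smi" amounts to.
    """
    # Longest names first, so a short alternative cannot shadow a longer one.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(n) for n in ordered)
    if whole_word:
        alternation = r"\b(?:" + alternation + r")\b"
    return re.compile(alternation, re.IGNORECASE)

text = "Smith met Smithson and Mrs. Smitty at the Smithsonian."

whole = compile_surnames(["smith"])
prefix = compile_surnames(["smi"], whole_word=False)

whole_hits = whole.findall(text)    # only the exact surname
prefix_hits = prefix.findall(text)  # one hit inside every "Smi..." word
print(whole_hits, len(prefix_hits))
```

A single alternation keeps a long name list to one compiled pattern. Whether it stays fast enough for the "volume of names" in question is untested here.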
It's tough when a problem needs an accurate yet broad solution. Too
broad, and the results are irrelevant because they include so many false
positives; too accurate, and the results will miss a few names.
It's a no-win :(
Thanks for the advice.
Brad