Finding Peoples' Names in Files

brad byte8bits at gmail.com
Thu Oct 11 15:50:00 EDT 2007


Chris Mellon wrote:

> In case you're doing this for PCI validation, be aware that just the
> CC number is considered sensitive and you'd get some false negatives
> if you filter on anything except that.
> 
> Random strings that match CC checksums are really quite rare and false
> positives from that alone are unlikely to be a problem. Unless I
> deployed this and there was a significant false positive rate I
> wouldn't risk the false negatives, personally.
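
For anyone following along, the card checksum mentioned above is the Luhn check. Here is a minimal sketch of how a scan might apply it; the function names and the candidate regex are illustrative assumptions, not the code actually deployed:

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by single
# spaces or hyphens. Real scanners may use stricter, issuer-specific patterns.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def find_card_numbers(text):
    """Return candidate strings in `text` that also pass the Luhn check."""
    return [m.group(0) for m in CARD_RE.finditer(text) if luhn_valid(m.group(0))]
```

Calling find_card_numbers() on the contents of a file returns only the digit runs that pass the checksum, which is why filtering on the number alone tends to produce few false positives.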

Yes, it is for PCI. Our rate of false positives is low, very low. I 
wasn't aware that a number alone was a PCI violation. Thank you! On 
another note, we're a university (Virginia Tech) and we're subject to 
FERPA, HIPAA, GLBA, etc. in addition to PCI. So we do these checks for 
U.S. Social Security Numbers too, in an effort to prevent or lessen the 
chance of ID theft. Unfortunately, there is no Luhn check for SSNs. We 
follow the Social Security Administration verification guideline 
religiously... here's a web front-end to my logic:

http://black.cirt.vt.edu/public/valid_ssn/index.html

but we still have many false positives on SSNs, so being able to identify 
*names and numbers* in files would still be a benefit to us.
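
To give a rough idea of the kind of structural check involved, here is a simplified sketch. It is not the logic behind the page above; it only applies the well-known rules that area 000, 666 and 900-999, group 00 and serial 0000 are never issued, and the names are made up for illustration:

```python
import re

# Hypothetical pattern: nine digits, with or without hyphens, not embedded
# in a longer run of digits.
SSN_RE = re.compile(r"(?<!\d)(\d{3})-?(\d{2})-?(\d{4})(?!\d)")


def ssn_plausible(area, group, serial):
    """Reject area/group/serial values that are never assigned."""
    if area in ("000", "666") or area >= "900":
        return False
    if group == "00":
        return False
    if serial == "0000":
        return False
    return True


def find_ssn_candidates(text):
    """Return nine-digit strings in `text` that pass the structural rules."""
    return [
        m.group(0)
        for m in SSN_RE.finditer(text)
        if ssn_plausible(*m.groups())
    ]
```

Since there is no checksum, most arbitrary nine-digit numbers pass rules like these, which is where the false positives come from.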

Brad


