Finding People's Names in Files

Matimus mccredie at gmail.com
Thu Oct 11 13:26:21 EDT 2007


On Oct 11, 10:02 am, byte8b... at gmail.com wrote:
> On Oct 11, 12:49 pm, Matimus <mccre... at gmail.com> wrote:
>
>
>
> > On Oct 11, 9:11 am, brad <byte8b... at gmail.com> wrote:
>
> > > cokofree... at gmail.com wrote:
> > > > However...how can you know it is a name...
>
> > > OK, I admitted in my first post that it was a crazy question, but if one
> > > could find an answer, one would be onto something. Maybe it's not a 100%
> > > answerable question, but I would guess that it is an 80% answerable
> > > question... I just don't know how... yet :)
>
> > > Besides admitting that it's a crazy question, I should stop and explain
> > > how it would be useful to me at least. Is a credit card number itself
> > > valuable? I would think not. One can easily re and luhn check for credit
> > > card numbers located in files with a great degree of accuracy, but a
> > > number without a name is not very useful to me. So, if one could
> > > associate names to luhn checked numbers automatically, then one would be
> > > onto something. Or at least say, "hey, this file has luhn validated CCs
> > > *AND* it seems to have people's names in it as well." Now then, I'd have
> > > less to review or perhaps as much as I have now, but I could push the
> > > files with numbers and names to the top of the list so that they would
> > > be reviewed first.
>
> > > Brad
>
> > What the hell are you doing? Your post sounds to me like you have a
> > huge amount of stolen, or at the very least misappropriated, data. Now
> > you want to search it for credit card numbers and names so that you
> > can use them.
>
> > I am not cool with this! This is a public forum about a programming
> > language. What makes you think that anybody in this forum will be cool
> > with that? Perhaps you aren't doing anything illegal, but it sure is
> > coming off that way. If you are doing something illegal I hope you get
> > caught.
>
> > At the very least, you might want to clarify why you are looking for
> > such capability so that you don't get effectively black-listed (well,
> > by me at least).
>
> > Matt
>
> Go have a beer and calm down a bit :) It's a legitimate purpose,
> although it could be (and probably is being) used by bad guys right now.
> My intent, as you can see from the links below, is to catch it before
> the bad guys do.
>
> http://filebox.vt.edu/users/rtilley/public/find_ccns/
> http://filebox.vt.edu/users/rtilley/public/find_ssns/
>
> Brad
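The regex-plus-Luhn scan Brad describes above can be sketched roughly as follows. This is a minimal illustration, not the code behind his find_ccns tool: the pattern (13 to 16 digits, optionally separated by single spaces or hyphens) and the helper names `luhn_valid` and `find_card_numbers` are assumptions.

```python
import re

# Assumed pattern: 13-16 digits, optionally separated by single spaces or hyphens.
# The \b anchors keep it from matching a slice of a longer digit run.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return candidate card numbers in text that pass the Luhn check."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())  # strip separators
        if luhn_valid(digits):
            found.append(digits)
    return found
```

For instance, `find_card_numbers("card 4111 1111 1111 1111")` should return `['4111111111111111']`, since that well-known test number passes the Luhn check, while a candidate with a wrong final digit is dropped.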

It's just past 10:00 am where I am... I know customs vary, but
generally beer before lunch is frowned upon :). I know the tone of
posts does not carry well over the web, but I was really just trying
to point out that your previous post sounded very shady, and at the
very least some clarification was in order. I wasn't standing on my
desk frothing at the mouth or anything.

On to my suggestion. I think you are going to have to use statistical
analysis. That is, you won't get something that reliably returns a
boolean, but maybe something that says there is a 75% chance that
there are names in a given file. You can't know that a given string is
or isn't a name, you can only know that it is probably a name based
upon how often it is used in that context. Either way this isn't a
simple problem to solve, and it probably involves creating a database
of words that shows what percentage of the time they are used as
names. How such a database is created... that is the hard part. There
may be tools out there for such analysis, but that isn't an area I
have any experience in.
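A rough sketch of that statistical approach, assuming the word database already exists as a dictionary (called `name_probability` here, a hypothetical name) mapping each lowercase word to the fraction of the time it is used as a name. It treats words as independent, which real text is not, and reports the chance that at least one word in a file is a name, i.e. 1 minus the product of (1 - p) over the words. `name_likelihood` and `rank_files` are illustrative names, not an existing tool.

```python
import math
import re

WORD = re.compile(r"[A-Za-z]+")


def name_likelihood(text: str, name_probability: dict[str, float]) -> float:
    """Estimate the chance that text contains at least one name.

    name_probability maps a lowercase word to the fraction of the time that
    word is used as a name (a hypothetical, externally built table).
    Words are treated as independent, a simplifying assumption.
    """
    log_no_name = 0.0  # log of P(no word in the text is a name)
    for word in WORD.findall(text):
        p = name_probability.get(word.lower(), 0.0)
        if p >= 1.0:
            return 1.0  # a word that is always a name settles it
        log_no_name += math.log1p(-p)  # summing logs avoids underflow on long files
    return 1.0 - math.exp(log_no_name)


def rank_files(texts: dict[str, str],
               name_probability: dict[str, float]) -> list[tuple[str, float]]:
    """Sort files (path -> contents) by estimated name likelihood, highest first."""
    scores = {path: name_likelihood(body, name_probability)
              for path, body in texts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With a toy table such as `{"john": 0.8, "smith": 0.9}` (invented numbers, not real statistics), "John Smith" scores about 1 - 0.2 × 0.1 = 0.98. `rank_files` then gives the review ordering Brad asked for, pushing the most name-like files to the top of the list.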

Matt




More information about the Python-list mailing list