[Tutor] Organizing 15500 records, how?

Thomas tavspam at gmail.com
Wed Dec 13 02:39:48 CET 2006


I'm writing a program to analyse the profiles of the 15500 users of my
forum. I have the profiles as html files stored locally and I'm using
ClientForm to extract the various details from the html form in each
file.

My goal is to identify lurking spammers but also to learn how to
better spot spammers by calculating statistical correlations in the
data against known spammers.

I need advice on how to organise my data. There are 50 fields in
each profile, and some fields will be much more useful than others, so I
thought about creating, say, 10 files to start off with, each containing
a dictionary of userid to field value. That way I'm dealing with 10 to
50 files instead of 15500.
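
To make the idea concrete, here is a rough sketch of the one-file-per-field
layout I have in mind. The field names, the sample values, the helper names
and the choice of json for storage are all placeholders of mine, not
anything settled.

```python
import json
import os
import tempfile

# Hypothetical sample data: userid -> {field name -> value}
profiles = {
    101: {"email": "a@example.com", "homepage": ""},
    102: {"email": "b@example.com", "homepage": "http://example.com"},
}


def split_by_field(profiles, fields):
    """Return {field: {userid: value}} for the chosen fields."""
    by_field = {field: {} for field in fields}
    for userid, profile in profiles.items():
        for field in fields:
            # Missing fields are stored as an empty string.
            by_field[field][userid] = profile.get(field, "")
    return by_field


def save_fields(by_field, directory):
    """Write one JSON file per field into the given directory."""
    for field, mapping in by_field.items():
        path = os.path.join(directory, field + ".json")
        with open(path, "w") as f:
            json.dump(mapping, f)


def load_field(directory, field):
    """Read one field's file back as {userid: value}."""
    path = os.path.join(directory, field + ".json")
    with open(path) as f:
        raw = json.load(f)
    # JSON turns the integer keys into strings, so convert them back.
    return {int(userid): value for userid, value in raw.items()}


by_field = split_by_field(profiles, ["email", "homepage"])
directory = tempfile.mkdtemp()
save_fields(by_field, directory)
emails = load_field(directory, "email")
```

With this layout, looking at one field across all users means loading a
single small file rather than opening every profile.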

Also, I am inexperienced with using classes but eager to learn and
wonder if they would be any help in this case.
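
For example, I imagine a class might look something like the sketch
below. The class name, its attributes and the sample values are my own
guesses at a design, so please treat it as an assumption rather than
working code from my program.

```python
class Profile:
    """One forum user's profile (hypothetical layout)."""

    def __init__(self, userid, fields, known_spammer=False):
        self.userid = userid
        self.fields = dict(fields)          # field name -> value
        self.known_spammer = known_spammer  # set by hand for confirmed spammers

    def get(self, field, default=""):
        """Return a field's value, or the default if it is missing."""
        return self.fields.get(field, default)

    def filled_in(self):
        """Count the fields that have a non-empty value."""
        return sum(1 for value in self.fields.values() if value)


profiles = [
    Profile(101, {"email": "a@example.com", "homepage": ""}),
    Profile(102, {"email": "b@example.com", "homepage": "http://example.com"},
            known_spammer=True),
]

# Pick out the userids already marked as spammers.
spammer_ids = [p.userid for p in profiles if p.known_spammer]
```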

Any advice is much appreciated, and thanks in advance,
Thomas

