[Tutor] Organizing 15500 records, how?

Peter Jessop pjlists at gmail.com
Wed Dec 13 08:59:17 CET 2006


With more than 15000 records you would be better off using a relational
database.
Although it will create more work to start with (you'll have to learn it),
it will save you a lot of work in the medium and long term.

Almost any relational database can be accessed from Python. As it is just for
your own use, SQLite might be the most appropriate (it has a very small
footprint), but MySQL is excellent and so are many others.
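
As a rough sketch of what this could look like with SQLite, using the
sqlite3 module from the standard library (Python 2.5 and later): the table
name, the column names and the sample rows are made up for illustration, and
your real table would have columns for whichever of the 50 fields you keep.

```python
import sqlite3

# ":memory:" keeps the example self-contained; pass a filename such as
# "profiles.db" instead to keep the data on disk between runs.
conn = sqlite3.connect(":memory:")

# Hypothetical subset of the profile fields; the real names will differ.
conn.execute(
    """
    CREATE TABLE profiles (
        userid     INTEGER PRIMARY KEY,
        username   TEXT NOT NULL,
        homepage   TEXT,
        post_count INTEGER
    )
    """
)

# Invented sample rows. In practice each tuple would come from parsing
# one of the saved HTML profile files.
rows = [
    (1, "alice", "", 120),
    (2, "bob", "http://example.com/pills", 0),
    (3, "carol", "http://example.org", 15),
]
conn.executemany("INSERT INTO profiles VALUES (?, ?, ?, ?)", rows)
conn.commit()

# One table now holds every profile, keyed by userid.
count = conn.execute("SELECT COUNT(*) FROM profiles").fetchone()[0]
print(count)
```

With this layout all 15500 profiles live in a single file, and adding
another field later is one more column rather than one more dictionary file.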

To use a relational database you might think about learning SQL. It is very
easy (especially if you know any Boolean algebra) and is a language that
has been used almost unchanged for decades and shows every sign of staying
here for a long time. In computing it is one of the most useful things you
can learn. There is a good introductory, interactive tutorial
at http://sqlcourse.com/
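
To give a flavour of how SQL conditions read like Boolean algebra, here is a
small sketch run through Python's sqlite3 module. The table, the column
names (homepage, post_count, known_spammer) and the data are all invented
for the example; they are not taken from your forum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles ("
    " userid INTEGER PRIMARY KEY,"
    " homepage TEXT,"
    " post_count INTEGER,"
    " known_spammer INTEGER)"  # 1 = already identified as a spammer
)
conn.executemany(
    "INSERT INTO profiles VALUES (?, ?, ?, ?)",
    [
        (1, "", 120, 0),
        (2, "http://example.com/pills", 0, 1),
        (3, "http://example.org", 15, 0),
        (4, "http://example.net/casino", 0, 0),
    ],
)

# WHERE combines conditions with AND / OR / NOT:
# users who list a homepage, have never posted, and are not yet flagged.
suspects = [
    row[0]
    for row in conn.execute(
        "SELECT userid FROM profiles"
        " WHERE homepage <> '' AND post_count = 0 AND known_spammer = 0"
        " ORDER BY userid"
    )
]

# GROUP BY gives simple statistics: how many zero-post users there are
# among the known spammers and among everyone else.
zero_posters = conn.execute(
    "SELECT known_spammer, COUNT(*) FROM profiles"
    " WHERE post_count = 0"
    " GROUP BY known_spammer ORDER BY known_spammer"
).fetchall()

print(suspects)
print(zero_posters)
```

Queries like the second one are a starting point for comparing a field's
values between known spammers and the rest of the users.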

If you feel you need another abstraction layer on top of this you could look
at SQLObject <http://www.sqlobject.org/>.

Personally I would recommend that you start with MySQL <http://www.mysql.com>.
It is open source, easy to install and use, stable and fast.  But with SQL
engines you have lots of good choices.

Peter Jessop


On 12/13/06, Thomas <tavspam at gmail.com> wrote:
> I'm writing a program to analyse the profiles of the 15500 users of my
> forum. I have the profiles as html files stored locally and I'm using
> ClientForm to extract the various details from the html form in each
> file.
>
> My goal is to identify lurking spammers but also to learn how to
> better spot spammers by calculating statistical correlations in the
> data against known spammers.
>
> I need advice on how to organise my data. There are 50 fields in
> each profile, some fields will be much more useful than others so I
> thought about creating say 10 files to start off with that contained
> dictionaries of userid to field value. That way I'm dealing with 10 to
> 50 files instead of 15500.
>
> Also, I am inexperienced with using classes but eager to learn and
> wonder if they would be any help in this case.
>
> Any advice much appreciated and thanks in advance,
> Thomas