[Spambayes] Results of playing with CDB

Guido van Rossum guido@python.org
Sun, 15 Sep 2002 02:16:56 -0400


> [Neale Pickett]
> > ...
> > I'm not sure a server would work for multiple users.  I have 80 users;
> > at 20MB per server, that's 1600MB of RAM, which would require a new hard
> > drive just for the swap :).  I'm sure it wouldn't be quite that bad if
> > it was just one server for everyone, but in that case I suspect the dict
> > would quickly become so large that it would no longer be practical to
> > have it all in RAM, and then I'm back to dbm files.
> >
> > Of course, I could be missing some obvious implementation detail.  The
> > spam database could be shared, for instance, if it was sufficiently
> > attack-resistant.  In any case, speeding up the current implementation
> > seems like a sufficiently low-hanging fruit for the time being, and
> > hopefully will be useful work for a larger project.

[Tim]
> Hmm.  When you say "a server", you seem to have in mind a shared machine
> sitting in another room.  When I say "a server", I just have in mind a
> process that usually sits idle, waiting for a client to ask it to do
> something.  There's no rule against a server and client process running on
> the same box, nor against each user running their own server process for
> their own email.
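To make Tim's sense of "a server" concrete, here is a minimal sketch, assuming a plain line-based protocol over localhost TCP. It is not SpamBayes code. The names `WORD_COUNTS`, `TokenCountHandler`, `start_server` and `lookup` and the toy counts are all hypothetical. The server process loads the token table once, keeps it resident, and sits idle until a client on the same box connects and asks for a token's spam/ham counts.

```python
import socket
import socketserver
import threading

# Hypothetical in-memory database: token -> (spam count, ham count).
# Loaded once when the server starts, then kept resident in RAM.
WORD_COUNTS = {
    "viagra": (120, 1),
    "python": (2, 340),
}


class TokenCountHandler(socketserver.StreamRequestHandler):
    """Answer one lookup per line: the client sends a token, we reply with counts."""

    def handle(self):
        for raw in self.rfile:
            token = raw.decode("utf-8").strip()
            spam, ham = WORD_COUNTS.get(token, (0, 0))
            self.wfile.write(f"{spam} {ham}\n".encode("utf-8"))


def start_server(port=0):
    """Start the server on localhost in a background thread; return (server, port)."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", port), TokenCountHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server, server.server_address[1]


def lookup(port, token):
    """Client side: connect to the local server and ask for one token's counts."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(token.encode("utf-8") + b"\n")
        sock.shutdown(socket.SHUT_WR)  # tell the server we are done sending
        with sock.makefile("rb") as reply_file:
            reply = reply_file.readline().decode("utf-8").split()
    return int(reply[0]), int(reply[1])
```

Each user could run a private instance of such a process, or one instance could serve everyone. The protocol does not care which.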

That's not how I read Neale's post.  He's considering one server
*process* per user, though, and thinks that 80 server processes, each
with a virtual memory address space (what he calls "RAM") of 20 MB,
would take up too much swap space.  Is 20 MB a reasonable estimate for
the in-memory dict?

But I don't buy his assumption that if you share the server between 80
users the dict becomes too large.  Why would it?  Because their ham
collections would be disjoint?  Not completely, I expect.  But that is
indeed a question to ponder: can 80 users effectively share the
classifier database?
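One way to see why a shared dict needn't grow 80-fold: it needs one entry per distinct token across all users, so its size is the size of the *union* of their vocabularies, not the sum. Below is a toy sketch with made-up data. The user names, tokens and the `merged_counts` helper are hypothetical. It says nothing about whether the merged counts would still classify well for each individual user, which is the open question above.

```python
from collections import Counter


def merged_counts(per_user_counts):
    """Merge per-user token counts into one shared table (hypothetical layout)."""
    shared = Counter()
    for counts in per_user_counts:
        shared.update(counts)  # adds counts for tokens the users have in common
    return shared


# Hypothetical toy data: three users' ham tokens with counts.
alice = Counter({"python": 5, "meeting": 2, "lunch": 1})
bob = Counter({"python": 3, "meeting": 4, "budget": 2})
carol = Counter({"meeting": 1, "lunch": 2, "garden": 6})

users = [alice, bob, carol]
separate_entries = sum(len(c) for c in users)  # one dict per user
shared = merged_counts(users)
shared_entries = len(shared)                   # one dict for everyone
```

Here three separate dicts hold 9 entries, while the shared one holds 5, because "python", "meeting" and "lunch" overlap.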

--Guido van Rossum (home page: http://www.python.org/~guido/)