[Spambayes] Results of playing with CDB
Guido van Rossum
guido@python.org
Sun, 15 Sep 2002 02:16:56 -0400
> [Neale Pickett]
> > ...
> > I'm not sure a server would work for multiple users. I have 80 users;
> > at 20MB per server, that's 1600MB of RAM, which would require a new hard
> > drive just for the swap :). I'm sure it wouldn't be quite that bad if
> > it was just one server for everyone, but in that case I suspect the dict
> > would quickly become so large that it would no longer be practical to
> > have it all in RAM, and then I'm back to dbm files.
> >
> > Of course, I could be missing some obvious implementation detail. The
> > spam database could be shared, for instance, if it was sufficiently
> > attack-resistant. In any case, speeding up the current implementation
> > seems like a sufficiently low-hanging fruit for the time being, and
> > hopefully will be useful work for a larger project.
[Tim]
> Hmm. When you say "a server", you seem to have in mind a shared machine
> sitting in another room. When I say "a server", I just have in mind a
> process that usually sits idle, waiting for a client to ask it to do
> something. There's no rule against a server and client process running on
> the same box, neither against each user running their own server process for
> their own email.

That's not how I read Neale's post. He is considering one server
*process* per user, and thinks that 80 such processes, each with a
virtual memory address space (what he calls "RAM") of 20 MB, would
take up too much swap space. Is 20 MB a reasonable estimate for the
in-memory dict?
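
One rough way to sanity-check a figure like 20 MB is to build a dict of
about the right shape and add up the object sizes. This is only a sketch:
the token-to-counts layout and the 10000-entry sample are hypothetical
stand-ins, not the classifier's actual data structure, and the estimate
is shallow (it does not follow references inside the values), so it
undercounts.

```python
import sys

def estimate_dict_bytes(d):
    """Shallow estimate: the dict object plus each key and value object.

    Objects referenced from inside a value (e.g. the ints in a tuple)
    are not counted, so the true footprint is somewhat larger.
    """
    total = sys.getsizeof(d)
    for key, value in d.items():
        total += sys.getsizeof(key) + sys.getsizeof(value)
    return total

# Hypothetical stand-in for a token -> (spam count, ham count) mapping.
sample = {"token%d" % i: (i, i + 1) for i in range(10000)}
sample_mb = estimate_dict_bytes(sample) / (1024.0 * 1024.0)
```

Scaling sample_mb up to a realistic token count gives a ballpark to
compare against the 20 MB guess; measuring the process size of a real
server would be more trustworthy.
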

But I don't buy his assumption that if you share the server between 80
users the dict becomes too large. Why would it? Because their ham
collections would be disjoint? Not completely, I expect. But that is
indeed a question to ponder: can 80 users effectively share the
classifier database?
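
To put numbers on the two extremes, here is a back-of-the-envelope
sketch. The 80 users and 20 MB per server come from Neale's post; the
"overlap" parameter (the fraction of each user's tokens already present
in everyone else's data) is purely an assumption for illustration, and
the linear model is a simplification.

```python
def per_user_total_mb(users, mb_per_server):
    """Total memory if every user runs a private server process."""
    return users * mb_per_server

def shared_total_mb(users, mb_per_server, overlap):
    """Rough size of a single shared dict.

    overlap is a hypothetical fraction in [0, 1]: the share of each
    user's tokens that the other users' data already contains. The
    first user contributes a full dict; each further user adds only
    the non-overlapping remainder.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be between 0 and 1")
    unique_per_user = mb_per_server * (1.0 - overlap)
    return mb_per_server + (users - 1) * unique_per_user

separate = per_user_total_mb(80, 20)        # Neale's 1600 MB figure
disjoint = shared_total_mb(80, 20, 0.0)     # no shared tokens at all
identical = shared_total_mb(80, 20, 1.0)    # everyone sees the same tokens
```

With fully disjoint collections the shared dict is no smaller than 80
separate ones; with fully overlapping collections it stays at 20 MB.
The real answer lies somewhere in between, and where depends on how
much the users' mail has in common.
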
--Guido van Rossum (home page: http://www.python.org/~guido/)