[Spambayes] sharing split database
Brad Clements
bkc at murkworks.com
Tue May 20 15:47:34 EDT 2003
On 20 May 2003 at 11:04, bill parducci wrote:
> that said, such design constraints make the shared db idea questionable WRT
> size savings. (as pointed out earlier, but it took me a while to fully
> grasp :o)
There are, I guess, two privacy issues at play here:
1. collecting word statistics from members of this group to see just how much
overlap there is among users.
2. privacy after deployment.
In the first case, since I'm only looking to determine the amount of overlap and
disjointness, I don't need the actual words; I could use a hash of each word. Sure,
the upload will be huge, but maybe not too bad after running through gzip.
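A rough sketch of what that hashed comparison could look like. This is only an
illustration: the choice of SHA-1 and the names hash_words and overlap_stats are
assumptions made for the example, not anything that exists in Spambayes.

```python
import hashlib


def hash_words(words):
    """Return the set of SHA-1 hex digests for an iterable of words.

    Hypothetical helper: each user would run this locally and upload
    only the digests, never the words themselves.
    """
    return {hashlib.sha1(w.encode("utf-8")).hexdigest() for w in words}


def overlap_stats(a, b):
    """Compare two sets of digests and report overlap and disjointness."""
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),    # digests present for both users
        "only_a": len(a - b),     # digests seen only by the first user
        "only_b": len(b - a),     # digests seen only by the second user
        # Fraction of all distinct digests that both users have in common.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Made-up word lists, purely for demonstration.
    user_a = hash_words(["viagra", "python", "meeting"])
    user_b = hash_words(["viagra", "python", "invoice", "lunch"])
    print(overlap_stats(user_a, user_b))
```

Everyone contributing would have to use the same hash function and the same
tokenization for the digests to be comparable.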
In the second case... we might not get that far if there's not enough overlap to
make it pay off. ;-)
--
Brad Clements, bkc at murkworks.com (315)268-1000
http://www.murkworks.com (315)268-9812 Fax
http://www.wecanstopspam.org/ AOL-IM: BKClements