[Spambayes] sharing split database

Brad Clements bkc at murkworks.com
Tue May 20 15:47:34 EDT 2003


On 20 May 2003 at 11:04, bill parducci wrote:

> that said, such design constraints make the shared db idea questionable WRT
> size savings. (as pointed out earlier, but it took me a while to fully
> grasp :o)

There are, I guess, two privacy issues at play here:

1. collecting word statistics from members of this group to see just how much overlap 
there is among users. 

2. privacy after deployment.

In the first case, since I'm only looking to determine the amount of overlap and 
disjointness, I don't need the actual words; I could use a hash of each word. Sure, 
the upload will be huge, but maybe not too bad after running it through gzip.
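
To make the hashing idea concrete, here is a rough sketch of how two users' word lists could be compared without exchanging the words themselves. Nothing here is existing Spambayes code: the function names, the choice of SHA-1, and the sample word lists are all assumptions for illustration. The only hard requirement is that everyone uses the same hash function, so that identical words produce identical digests.

```python
import hashlib


def hash_tokens(tokens):
    """Return the set of SHA-1 hex digests for an iterable of words.

    Hypothetical helper: SHA-1 is an assumed choice; any hash that all
    participants agree on would do for measuring overlap.
    """
    return {hashlib.sha1(t.encode("utf-8")).hexdigest() for t in tokens}


def overlap_stats(a, b):
    """Compare two sets of digests and report overlap and disjointness."""
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),          # words both users have
        "only_a": len(a - b),           # words only the first user has
        "only_b": len(b - a),           # words only the second user has
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Made-up sample vocabularies standing in for two users' databases.
alice = hash_tokens(["viagra", "python", "meeting", "free"])
bob = hash_tokens(["viagra", "free", "mortgage"])
stats = overlap_stats(alice, bob)
print(stats)
```

Only the digests would need to be uploaded; the counts above are enough to judge whether a shared database would pay off.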

In the second case... we might not get that far if there's not enough overlap to make 
it pay off. ;-)




-- 
Brad Clements,                bkc at murkworks.com   (315)268-1000
http://www.murkworks.com                          (315)268-9812 Fax
http://www.wecanstopspam.org/                   AOL-IM: BKClements



