[spambayes-dev] spoof detector

David Abrahams dave at boost-consulting.com
Sun Jul 8 17:58:58 CEST 2007


on Fri Jul 06 2007, David Abrahams <dave-UB3wUj7V41K5azolltMz9laTQe2KTcn/-AT-public.gmane.org> wrote:

> on Fri Jul 06 2007, David Abrahams <dave-UB3wUj7V41K5azolltMz9laTQe2KTcn/-AT-public.gmane.org> wrote:
>
>> on Fri Jul 06 2007, skip-AT-pobox.com wrote:
>>
>>> Try these two settings
>>>
>>>     x-pick_apart_urls:True
>>>     x-lookup_ip:True
>>>
>>> and see if they help.
>
> Oh, and these go in the [Tokenizer] section, right?
>
>> Well, they sure make training slow to a crawl!
>> Is there any effective way of caching those DNS lookups?
>
> I did eventually find the lookup_ip_cache option, but frankly the
> results are disappointing.  I would have expected one slow round in my
> train-to-exhaustion regime and then all following rounds to go very
> quickly, but that doesn't appear to be the case.  The first round took
> 18.5 minutes and it doesn't look like the 2nd round is going to be
> much faster.  Oh, and right now the dnscache file is 414 bytes long
> and is full of stuff that mostly doesn't look like it has any
> relevance to DNS lookups.  I realize I shouldn't expect to be able to
> read a pickle by eye, but there is one string in there that looks like
> a domain name so I expect to see the others.
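For anyone who wants to check what that cache file really holds rather
than reading it by eye, here is a minimal sketch.  It assumes the file
is an ordinary Python pickle; the helper name summarize_pickle is
hypothetical and not part of SpamBayes.  If the pickle stores instances
of SpamBayes classes, that package probably has to be importable for
the load to succeed.  Only unpickle files you created yourself.

```python
import pickle


def summarize_pickle(path):
    """Load a pickle and return (type name, sorted reprs of its entries).

    Hypothetical helper for illustration; assumes `path` is a plain
    pickle file that you trust (unpickling can run arbitrary code).
    """
    with open(path, "rb") as f:
        obj = pickle.load(f)

    if isinstance(obj, dict):
        # For a mapping, the keys are the interesting part
        # (e.g. hostnames, if that is what the cache is keyed on).
        entries = sorted(repr(key) for key in obj)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        entries = sorted(repr(item) for item in obj)
    else:
        # Anything else: just show the object itself.
        entries = [repr(obj)]
    return type(obj).__name__, entries
```

Calling summarize_pickle() with the path given in lookup_ip_cache and
printing the result should show whether more than one hostname was
actually saved.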

Well, I eventually got training to finish, but I don't notice any
improvement in accuracy.  It may even have gotten worse; I've had a
few false negatives since enabling those options, and in general I
*never* see those.

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

The Astoria Seminar ==> http://www.astoriaseminar.com


