[Spambayes] Updated test results
skip at pobox.com
Tue Aug 8 06:10:15 CEST 2006
I picked through my new training database, found one or two outright
mistakes, deleted a few other administrative mails, fixed a few bugs in my
recent checkins and rebalanced my database. I then made a baseline run with
the following settings:
[globals]
verbose: True
[Headers]
include_evidence: True
[Tokenizer]
record_header_absence: True
summarize_email_prefixes: True
summarize_email_suffixes: True
mine_received_headers: True
x-pick_apart_urls: True
x-fancy_url_recognition: False
x-lookup_ip: False
lookup_ip_cache: ~/src/spambayes/ip.pickle
x-short_runs: False
x-image_size: False
x-crack_images: False
x-max_image_size: 100000
[Categorization]
ham_cutoff: 0.15
spam_cutoff: 0.50
[Storage]
persistent_storage_file: ~/src/spambayes/test.pickle
persistent_use_database: pickle
followed by a series of test runs, each one with one of the following
options set to True:
x-lookup_ip
x-short_runs
x-image_size
x-crack_images
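As a rough illustration of the "one option at a time" setup, the sketch below builds the [Tokenizer] overrides for the baseline and for each of the four runs. The helper names (variants, render) are made up for this example and are not part of SpamBayes; it only generates the text of the overrides and does not drive the test harness.

```python
# Hypothetical helpers, not SpamBayes code: build one set of [Tokenizer]
# overrides per test run, with exactly one experimental option set to True.
EXPERIMENTAL = ["x-lookup_ip", "x-short_runs", "x-image_size", "x-crack_images"]


def variants(options):
    """Return {label: {option: bool}} for the baseline plus one run per option."""
    baseline = {name: False for name in options}
    runs = {"baseline": baseline}
    for name in options:
        run = dict(baseline)   # copy so the baseline stays all-False
        run[name] = True       # flip only the option under test
        runs[name] = run
    return runs


def render(run):
    """Render one run's overrides as an ini-style [Tokenizer] section."""
    lines = ["[Tokenizer]"]
    lines += ["%s: %s" % (key, value) for key, value in run.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    for label, run in variants(EXPERIMENTAL).items():
        print("# " + label)
        print(render(run))
        print()
```

Each rendered section differs from the baseline in a single line, which is what makes the comparisons below attributable to one option.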
All tests were run against the same combination of ham and spam:
-> <stat> tested 459 hams & 359 spams against 1836 hams & 1436 spams
-> <stat> tested 459 hams & 359 spams against 1836 hams & 1436 spams
-> <stat> tested 459 hams & 359 spams against 1836 hams & 1436 spams
-> <stat> tested 459 hams & 359 spams against 1836 hams & 1436 spams
-> <stat> tested 459 hams & 359 spams against 1836 hams & 1436 spams
-> <stat> tested 459 hams & 359 spams against 1836 hams & 1436 spams
-> <stat> tested 459 hams & 359 spams against 1836 hams & 1436 spams
-> <stat> tested 459 hams & 359 spams against 1836 hams & 1436 spams
-> <stat> tested 459 hams & 359 spams against 1836 hams & 1436 spams
-> <stat> tested 459 hams & 359 spams against 1836 hams & 1436 spams
baseline vs. x-lookup_ip:
false positive percentages
0.000 0.000 tied
0.000 0.000 tied
0.218 0.218 tied
0.000 0.000 tied
0.000 0.000 tied
won 0 times
tied 5 times
lost 0 times
false negative percentages
2.228 1.671 won -25.00%
3.343 3.064 won -8.35%
5.292 4.735 won -10.53%
4.735 4.457 won -5.87%
2.786 2.507 won -10.01%
won 5 times
tied 0 times
lost 0 times
baseline vs. x-short_runs:
false positive percentages
0.000 0.000 tied
0.000 0.000 tied
0.218 0.218 tied
0.000 0.000 tied
0.000 0.000 tied
won 0 times
tied 5 times
lost 0 times
false negative percentages
2.228 2.228 tied
3.343 3.343 tied
5.292 5.292 tied
4.735 4.735 tied
2.786 2.786 tied
won 0 times
tied 5 times
lost 0 times
baseline vs. x-image_size:
false positive percentages
0.000 0.000 tied
0.000 0.000 tied
0.218 0.218 tied
0.000 0.000 tied
0.000 0.000 tied
won 0 times
tied 5 times
lost 0 times
false negative percentages
2.228 1.950 won -12.48%
3.343 3.343 tied
5.292 5.014 won -5.25%
4.735 4.457 won -5.87%
2.786 2.786 tied
won 3 times
tied 2 times
lost 0 times
baseline vs. x-crack_images:
false positive percentages
0.000 0.000 tied
0.000 0.000 tied
0.218 0.218 tied
0.000 0.000 tied
0.000 0.000 tied
won 0 times
tied 5 times
lost 0 times
false negative percentages
2.228 1.671 won -25.00%
3.343 3.064 won -8.35%
5.292 4.457 won -15.78%
4.735 4.457 won -5.87%
2.786 2.786 tied
won 4 times
tied 1 times
lost 0 times
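To put the tables above in terms of messages: the percentage in the third column matches the relative change from the baseline rate to the new rate, and with 359 spams and 459 hams per test set each rate maps back to a small whole number of messages. The sketch below redoes that arithmetic; the function names are my own, and the rounding to whole messages is an assumption based on the printed three-decimal rates.

```python
# Illustrative arithmetic only; these helpers are not part of SpamBayes.
SPAMS_PER_RUN = 359   # from the "<stat> tested" lines
HAMS_PER_RUN = 459


def relative_change(before, after):
    """Percent change from before to after; negative means fewer errors."""
    if before == after:
        return 0.0
    if before == 0:
        return float("inf")   # no meaningful ratio against a zero baseline
    return (after - before) / before * 100.0


def message_count(percent, total):
    """Convert an error percentage back to an approximate message count."""
    return round(percent / 100.0 * total)


# (baseline, x-lookup_ip) false negative percentages from the first table.
fn_lookup_ip = [(2.228, 1.671), (3.343, 3.064), (5.292, 4.735),
                (4.735, 4.457), (2.786, 2.507)]

if __name__ == "__main__":
    for before, after in fn_lookup_ip:
        print("%6.3f -> %6.3f  %+7.2f%%  (%d -> %d spams missed)" % (
            before, after, relative_change(before, after),
            message_count(before, SPAMS_PER_RUN),
            message_count(after, SPAMS_PER_RUN)))
```

Seen this way, most of the "won" rows correspond to one or two additional spams caught per run, and the single nonzero false positive rate (0.218) corresponds to one ham out of 459.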
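Based on my mixture of ham and spam, it appears that x-short_runs is the
only one of these options that doesn't help discriminate ham from spam: it
tied the baseline on every run, while the other three reduced false
negatives without changing the false positive rate.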
Skip