[Spambayes] Training problem
kudret
kudret at hotpop.com
Sat Sep 1 23:09:16 CEST 2007
I think I found something.
Below is part of my friend's "Spam Clues" report. The top part of the
tokens section is fine; it shows no sign of spam. The bottom part shows
some spam clues, and the total is 45%.
Combined Score: 45% (0.4548)
Internal ham score (*H*): 1
Internal spam score (*S*): 0.909599
# ham trained on: 2559
# spam trained on: 48
40 Significant Tokens
token spamprob #ham #spam
.....
'header:MIME-Version:1' 0.622183 1522 48
'this' 0.695978 1049 46
'date:' 0.741391 801 44
'free' 0.851694 399 44
'found' 0.876964 321 44
'release' 0.887758 289 44
'checked' 0.895329 267 44
'virus' 0.899162 256 44
'message.' 0.899511 255 44
'edition.' 0.899862 254 44
'database:' 0.905145 239 44
'incoming' 0.9055 238 44
'avg' 0.905854 237 44
'version:' 0.905854 237 44
'7.5.484' 0.973684 57 44
'269.13.1/982' 0.993601 2 39
'31-aug-07' 0.993601 2 39
'5:21' 0.993601 2 39
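Assuming the report comes from SpamBayes' default chi-squared combining, the Combined Score is not a sum of token scores. It is derived from the two internal scores as (S - H + 1) / 2: S near 1 means "looks like spam", and H near 1 means "looks like ham". A minimal Python check against the numbers in the reports (the name `combined_score` is illustrative, not a SpamBayes API):

```python
def combined_score(spam_score: float, ham_score: float) -> float:
    """Combine the internal spam (S) and ham (H) scores into one value.

    0.0 means ham and 1.0 means spam. When S and H are both near 1, the
    message has strong evidence both ways and lands near 0.5 ("unsure").
    """
    return (spam_score - ham_score + 1.0) / 2.0


# First report: H = 1, S = 0.909599
print(round(combined_score(0.909599, 1.0), 4))  # 0.4548
# Second report: H = 1, S = 0
print(combined_score(0.0, 1.0))  # 0.0
```

Both messages are equally hammy (H = 1), so the whole difference lies in S: about 0.91 in the first report versus 0 in the second.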
And this is the report for an email from SpamBayes:
Combined Score: 0% (0)
Internal ham score (*H*): 1
Internal spam score (*S*): 0
# ham trained on: 2559
# spam trained on: 50
150 Significant Tokens
token spamprob #ham #spam
'skip:- 10' 0.000314048 716 0
"i'm" 0.000324465 693 0
'url:html' 0.000605734 371 0
'online:' 0.000989228 227 0
'mailing' 0.00102064 220 0
'data' 0.0011169 201 0
'looking' 0.0011169 201 0
'doing' 0.00131234 171 0
'try' 0.00135993 165 0
'having' 0.0014111 159 0
'copy' 0.00161348 139 0
'files' 0.00161348 139 0
'file' 0.00173812 129 0
...
'message' 0.151436 288 1
'bug,' 0.155172 1 0
'bug.' 0.155172 1 0
'environment?' 0.155172 1 0
'outgrowth' 0.155172 1 0
'similar,' 0.155172 1 0
'subscribing' 0.155172 1 0
'this!' 0.155172 1 0
'from:no real name:2**0' 0.832851 112 11
'to:no real name:2**0' 0.847284 331 36
'free' 0.849132 399 44
'outlook' 0.871205 6 1
'found' 0.874777 321 44
'release' 0.88574 289 44
'subject:list' 0.892519 1 1
'checked' 0.893433 267 44
'subject:your' 0.895863 34 6
'skip:- 20' 0.896145 10 2
'virus' 0.897327 256 44
'message.' 0.897683 255 44
'edition.' 0.898039 254 44
'database:' 0.90341 239 44
'incoming' 0.90377 238 44
'avg' 0.904131 237 44
'version:' 0.904131 237 44
'7.5.484' 0.973205 57 44
'269.13.1/982' 0.993582 2 39
'31-aug-07' 0.993582 2 39
'5:21' 0.993582 2 39
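The per-token spamprob values can be reproduced, assuming SpamBayes' default Robinson-style smoothing with unknown_word_strength = 0.45 and unknown_word_prob = 0.5. The token's spam ratio (#spam / spam trained on) is compared with its ham ratio (#ham / ham trained on), and the result is pulled toward 0.5 when the token has been seen only a few times. The sketch below (`token_spamprob` is a hypothetical helper, not the SpamBayes API) matches rows of the second report, where 2559 ham and 50 spam were trained. It also shows why the AVG footer tokens score as spammy: 'virus' appears in 256 hams, but 256/2559 is about 0.10, far below its spam ratio of 44/50 = 0.88.

```python
def token_spamprob(ham_count: int, spam_count: int,
                   nham: int, nspam: int,
                   s: float = 0.45, x: float = 0.5) -> float:
    """Smoothed spam probability of a single token (Robinson's f(w)).

    s: strength given to the prior x (assumed default 0.45)
    x: prior probability for a never-seen token (assumed default 0.5)
    """
    n = ham_count + spam_count
    if n == 0:
        return x  # unseen token: fall back to the prior
    hamratio = ham_count / nham
    spamratio = spam_count / nspam
    raw = spamratio / (hamratio + spamratio)
    # Blend the raw ratio with the prior, weighted by how often the token was seen.
    return (s * x + n * raw) / (s + n)


NHAM, NSPAM = 2559, 50  # training counts from the second report
for token, h, sp in [("5:21", 2, 39), ("7.5.484", 57, 44),
                     ("subject:list", 1, 1), ("bug,", 1, 0)]:
    print(token, round(token_spamprob(h, sp, NHAM, NSPAM), 6))
```

With only 50 spam trained against 2559 ham, a token that shows up in most of those spams gets a high spamprob even if it also appears in hundreds of hams.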
How is it possible that two similar token lists get such different
scores: 45% for one and 0% for the other?
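A plausible mechanism, assuming the default chi-squared combining: S and H each come from a chi-squared test over all n significant tokens with 2n degrees of freedom, so the same spammy clues count for less when they are outnumbered. The first message has 40 significant tokens; the second has 150 (which appears to be the default max_discriminators cap), most of them strongly hammy. The sketch below follows the chi-squared scheme but uses plain log sums instead of SpamBayes' underflow-safe products. The helper names are illustrative, and it only mixes the visible spamprob values (spammy ones from the first report, hammy ones from the second); the elided rows are not reconstructed.

```python
import math


def chi2q(x2: float, v: int) -> float:
    """P(chi-squared with v degrees of freedom >= x2); v must be even."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_score(probs: list[float]) -> tuple[float, float, float]:
    """Return (combined, S, H) for a list of token spamprobs."""
    n = len(probs)
    # S: evidence for spam, built from ln(1 - p); H: evidence for ham, from ln(p).
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0, s, h


# Visible spammy clues (first report) and hammy clues (second report).
spammy = [0.622183, 0.695978, 0.741391, 0.851694, 0.876964, 0.887758,
          0.895329, 0.899162, 0.899511, 0.899862, 0.905145, 0.9055,
          0.905854, 0.905854, 0.973684, 0.993601, 0.993601, 0.993601]
hammy = [0.000314048, 0.000324465, 0.000605734, 0.000989228, 0.00102064,
         0.0011169, 0.0011169, 0.00131234, 0.00135993, 0.0014111,
         0.00161348, 0.00161348, 0.00173812]

for label, probs in [("spammy only", spammy), ("spammy + hammy", spammy + hammy)]:
    combined, s, h = chi2_score(probs)
    print(f"{label}: n={len(probs)} S={s:.3f} H={h:.3f} score={combined:.3f}")
```

A hammy token barely changes the spam statistic (ln(1 - p) is close to 0 when p is tiny) but adds two degrees of freedom, so S can only go down while H climbs toward 1. With enough hammy clues among the 150, S can reach 0 even though the same AVG tokens are present.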
--
View this message in context: http://www.nabble.com/Training-problem-tf4365445.html#a12444168
Sent from the Spambayes - General mailing list archive at Nabble.com.