[Spambayes-checkins] spambayes/spambayes tokenizer.py,1.43,1.44
Skip Montanaro
montanaro at users.sourceforge.net
Mon Aug 7 04:47:13 CEST 2006
Update of /cvsroot/spambayes/spambayes/spambayes
In directory sc8-pr-cvs8.sourceforge.net:/tmp/cvs-serv10981
Modified Files:
tokenizer.py
Log Message:
In splicing back several changes one-by-one I completely left out the code
to handle x-lookup_ip... That would explain why my testing today didn't
show any improvement!
Also, tweak image-size to only yield a single token, and only if there is at
least one decodable image.
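For readers skimming the diff below, here is a minimal standalone sketch of the clues the restored x-lookup_ip block pushes for each resolved address. The helper name `url_ip_clues` is hypothetical and is not part of tokenizer.py. It takes a plain list of dotted-quad strings in place of the result of `cache.lookup(netloc)`, and it returns a list rather than calling `pushclue`.

```python
def url_ip_clues(ips):
    """Return url-ip clues for a list of dotted-quad address strings.

    Hypothetical helper: `ips` stands in for cache.lookup(netloc).
    As in the diff, an empty result is reported as a timeout.
    """
    if not ips:
        return ["url-ip:timeout"]
    clues = []
    for ip in ips:
        quads = ip.split(".")
        # Full address first, then progressively wider prefixes,
        # in the same order as the pushclue() calls in the diff.
        clues.append("url-ip:%s/32" % ip)
        clues.append("url-ip:%s/8" % quads[0])
        clues.append("url-ip:%s.%s/16" % (quads[0], quads[1]))
        clues.append("url-ip:%s.%s.%s/24" % (quads[0], quads[1], quads[2]))
    return clues


print(url_ip_clues(["192.0.2.17"]))
# ['url-ip:192.0.2.17/32', 'url-ip:192/8', 'url-ip:192.0/16', 'url-ip:192.0.2/24']
```

Emitting the /8, /16 and /24 prefixes alongside the full address presumably lets the classifier learn from hosts that share a network even when the exact address has not been seen before.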
Index: tokenizer.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/tokenizer.py,v
retrieving revision 1.43
retrieving revision 1.44
diff -C2 -d -r1.43 -r1.44
*** tokenizer.py 6 Aug 2006 20:55:10 -0000 1.43
--- tokenizer.py 7 Aug 2006 02:47:10 -0000 1.44
***************
*** 1085,1088 ****
--- 1085,1103 ----
  scheme, netloc, path, params, query, frag = urlparse.urlparse(url)
+ if cache is not None and options["Tokenizer", "x-lookup_ip"]:
+     ips=cache.lookup(netloc)
+     if len(ips)==0:
+         pushclue("url-ip:timeout")
+     else:
+         for ip in ips: # Should we limit to one A record?
+             pushclue("url-ip:%s/32" % ip)
+             dottedQuadList=ip.split(".")
+             pushclue("url-ip:%s/8" % dottedQuadList[0])
+             pushclue("url-ip:%s.%s/16" % (dottedQuadList[0],
+                                           dottedQuadList[1]))
+             pushclue("url-ip:%s.%s.%s/24" % (dottedQuadList[0],
+                                              dottedQuadList[1],
+                                              dottedQuadList[2]))
+
  # one common technique in bogus "please (re-)authorize yourself"
  # scams is to make it appear as if you're visiting a valid
***************
*** 1605,1608 ****
--- 1620,1624 ----
# each image.
+ total_len = 0
for part in parts:
try:
***************
*** 1612,1621 ****
text = part.get_payload(decode=False)
if text is None:
yield "control: image payload is None"
- continue
! if text:
! yield "image-size:2**%d" % round(log2(len(text)))
if options["Tokenizer", "x-crack_images"]:
--- 1628,1637 ----
text = part.get_payload(decode=False)
+ total_len += len(text or "")
if text is None:
yield "control: image payload is None"
! if total_len:
! yield "image-size:2**%d" % round(log2(total_len))
if options["Tokenizer", "x-crack_images"]:
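The image-size change in the last hunk can be sketched in isolation as follows. This is only an illustration under stated assumptions: `image_size_token` is a hypothetical name, the payloads are passed in as a plain list of strings (or None) rather than read from message parts with `get_payload(decode=False)`, and `math.log2` from Python 3 stands in for the `log2` helper that the diff calls.

```python
from math import log2


def image_size_token(payloads):
    """Return one image-size token for all payloads, or None if all are empty.

    Hypothetical helper: each entry in `payloads` stands in for the
    still-encoded payload of one image part, and may be None.
    """
    total_len = 0
    for text in payloads:
        # A None payload contributes nothing, mirroring len(text or "").
        total_len += len(text or "")
    if total_len:
        # Bucket the combined size to the nearest power of two.
        return "image-size:2**%d" % round(log2(total_len))
    return None


print(image_size_token(["a" * 1000, None, "b" * 3000]))
# image-size:2**12
```

Summing before bucketing means a message with several images produces one token for their combined size, rather than one token per image as in revision 1.43.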