[Spambayes] Re: There Can Be Only One
Tim Peters
tim.one@comcast.net
Wed, 25 Sep 2002 23:22:16 -0400
[Tim]
>> Does enabling mine_received_headers alone also hurt you?
[Anthony Baxter]
> Yep. I'm thinking about finding my older received processing code from
> where-ever it ended up - all it pulled out were hostnames and IP
> addresses.
Be sure to read the existing code before you bother: unless I'm horribly
mistaken, that's all Neil's mine_received_headers pulls out:

    received_host_re = re.compile(r'from (\S+)\s')
    received_ip_re = re.compile(r'\s[[(]((\d{1,3}\.?){4})[\])]')

    def breakdown_host(host):
        parts = host.split('.')
        for i in range(1, len(parts) + 1):
            yield '.'.join(parts[-i:])

    def breakdown_ipaddr(ipaddr):
        parts = ipaddr.split('.')
        for i in range(1, 5):
            yield '.'.join(parts[:i])

    if options.mine_received_headers:
        for header in msg.get_all("received", ()):
            for pat, breakdown in [(received_host_re, breakdown_host),
                                   (received_ip_re, breakdown_ipaddr)]:
                m = pat.search(header)
                if m:
                    for tok in breakdown(m.group(1).lower()):
                        yield 'received:' + tok

So it's tokenizing hosts and IPs (the first of each that it finds in each
Received header), plus the dot-delimited suffixes of hosts and prefixes of
IPs, and that's it.
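
To make that concrete, here is a small standalone sketch that runs the same
two patterns and breakdown functions over a single header string. The
received_tokens() wrapper and the sample header are made up for illustration
(they are not part of the tokenizer), and the leading '[' inside the first
character class is escaped here only because newer Pythons warn about an
unescaped one; the pattern matches the same text either way.

```python
import re

# Same patterns as in the excerpt, except that the leading '[' in the
# character class is escaped to avoid a FutureWarning on newer Pythons.
received_host_re = re.compile(r'from (\S+)\s')
received_ip_re = re.compile(r'\s[\[(]((\d{1,3}\.?){4})[\])]')

def breakdown_host(host):
    # Yield ever-longer suffixes: 'com', 'example.com', 'mail.example.com'.
    parts = host.split('.')
    for i in range(1, len(parts) + 1):
        yield '.'.join(parts[-i:])

def breakdown_ipaddr(ipaddr):
    # Yield ever-longer prefixes: '192', '192.0', '192.0.2', '192.0.2.10'.
    parts = ipaddr.split('.')
    for i in range(1, 5):
        yield '.'.join(parts[:i])

def received_tokens(header):
    """Hypothetical wrapper: tokens for one Received header value."""
    for pat, breakdown in [(received_host_re, breakdown_host),
                           (received_ip_re, breakdown_ipaddr)]:
        m = pat.search(header)  # only the first match per pattern is used
        if m:
            for tok in breakdown(m.group(1).lower()):
                yield 'received:' + tok

# Made-up header built from reserved documentation names and addresses.
sample = ("from Mail.Example.COM (mail.example.com [192.0.2.10]) "
          "by mx.example.org with ESMTP")
tokens = list(received_tokens(sample))
for tok in tokens:
    print(tok)
```

For that sample it prints received:com, received:example.com,
received:mail.example.com, then received:192, received:192.0,
received:192.0.2 and received:192.0.2.10: seven tokens, and nothing from the
"by" or "with" clauses.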