[Spambayes] Re: There Can Be Only One

Tim Peters tim.one@comcast.net
Wed, 25 Sep 2002 23:22:16 -0400


[Tim]
>> Does enabling mine_received_headers alone also hurt you?

[Anthony Baxter]
> Yep. I'm thinking about finding my older received processing code from
> wherever it ended up - all it pulled out were hostnames and IP
> addresses.

Be sure to read the existing code before you bother:  unless I'm horribly
mistaken, that's all Neil's mine_received_headers pulls out:

import re

# Host name following "from" in a Received header.
received_host_re = re.compile(r'from (\S+)\s')
# Dotted-quad IP address enclosed in [] or ().
received_ip_re = re.compile(r'\s[[(]((\d{1,3}\.?){4})[\])]')

def breakdown_host(host):
    # Yield successively longer suffixes: c, b.c, a.b.c.
    parts = host.split('.')
    for i in range(1, len(parts) + 1):
        yield '.'.join(parts[-i:])

def breakdown_ipaddr(ipaddr):
    # Yield successively longer prefixes: 1, 1.2, 1.2.3, 1.2.3.4.
    parts = ipaddr.split('.')
    for i in range(1, 5):
        yield '.'.join(parts[:i])

# Fragment from inside a generator function, hence the bare yield.
if options.mine_received_headers:
    for header in msg.get_all("received", ()):
        for pat, breakdown in [(received_host_re, breakdown_host),
                               (received_ip_re, breakdown_ipaddr)]:
            m = pat.search(header)
            if m:
                for tok in breakdown(m.group(1).lower()):
                    yield 'received:' + tok

So it's tokenizing hosts and IPs, and suffixes of hosts, and prefixes of
IPs, and that's it.
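
To make that concrete, here is a rough standalone sketch that runs the same two patterns over a single made-up Received header. The tokenize_received wrapper and the sample header are my own assumptions for illustration, not part of the project's code, and the '[' in the IP pattern is escaped only to keep newer Pythons from warning about a possible nested set; it matches the same strings.

```python
import re

# Same patterns as in the excerpt; the '[' inside the character class is
# escaped here only to avoid a FutureWarning on newer Python versions.
received_host_re = re.compile(r'from (\S+)\s')
received_ip_re = re.compile(r'\s[\[(]((\d{1,3}\.?){4})[\])]')

def breakdown_host(host):
    # Successively longer suffixes of the host name.
    parts = host.split('.')
    for i in range(1, len(parts) + 1):
        yield '.'.join(parts[-i:])

def breakdown_ipaddr(ipaddr):
    # Successively longer prefixes of the IP address.
    parts = ipaddr.split('.')
    for i in range(1, 5):
        yield '.'.join(parts[:i])

def tokenize_received(header):
    # Hypothetical wrapper (not in the original code): applies both
    # patterns to one header string and yields the resulting tokens.
    for pat, breakdown in [(received_host_re, breakdown_host),
                           (received_ip_re, breakdown_ipaddr)]:
        m = pat.search(header)
        if m:
            for tok in breakdown(m.group(1).lower()):
                yield 'received:' + tok

# Made-up header using reserved example names and addresses.
sample = ("from Mail.Example.COM (mail.example.com [192.0.2.17]) "
          "by mx.example.org with ESMTP")
tokens = list(tokenize_received(sample))
for tok in tokens:
    print(tok)
```

For this sample the host yields three suffix tokens (com, example.com, mail.example.com) and the IP yields four prefix tokens (192 through 192.0.2.17), each with the 'received:' prefix. Only the first "from" host and the first bracketed IP in a header are picked up, since each pattern is applied with a single search().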