Help beautify ugly heuristic code
Lonnie Princehouse
finite.automaton at gmail.com
Thu Dec 9 03:01:36 EST 2004
Doh! I misread "a" as the host instead of the IP in your first post. I'm
sorry about that; I really must slow down. Anyhow,
I believe you can still do this by compiling a regex only once and
then performing a few substitutions on the hostname.
Substitutions:
1st byte of IP => (0)
2nd byte of IP => (1)
3rd byte of IP => (2)
4th byte of IP => (3)
and likewise for hex => (x0) (x1) (x2) (x3)
Each host string may map to multiple expansions, especially if a
number repeats itself in the IP, or if an IP byte is less than 10 (such
that the decimal and hex representations are the same). Both the
zero-padded and unpadded forms will have to be substituted, and it's
probably best not to alter the last two fields in the host name, since
ISPs can't change those.
With this scheme, here are a few expansions of (ip,host) tuples:
172.182.240.186 ACB6F0BA.ipt.aol.com
becomes
(x0)(x1)(x2)(x3).ipt.aol.com
67.119.55.77 adsl-67-119-55-77.dsl.lsan03.pacbell.net
becomes
adsl-(0)-(1)-(2)-(3).dsl.lsan03.pacbell.net
adsl-(0)-(1)-(2)-(x1).dsl.lsan03.pacbell.net
81.220.220.143 ip-143.net-81-220-220.henin.rev.numericable.fr
becomes
ip-(3).net-(0)-(1)-(1).henin.rev.numericable.fr
ip-(3).net-(0)-(1)-(2).henin.rev.numericable.fr
ip-(3).net-(0)-(2)-(1).henin.rev.numericable.fr
ip-(3).net-(0)-(2)-(2).henin.rev.numericable.fr
etcetera.
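Here is a rough sketch of how those expansions could be generated. The
function name expand() and the keep_fields parameter are made up for
illustration, and the left-to-right scan (substitute wherever a byte
representation matches, otherwise copy the character) is just one
possible reading of the scheme, so treat it as a starting point only.

```python
from functools import lru_cache


def expand(ip, host, keep_fields=2):
    """Return the set of placeholder expansions of host for the given ip.

    Hypothetical helper: the last keep_fields labels of the hostname are
    left untouched, everything before them is scanned for IP bytes.
    """
    octets = [int(part) for part in ip.split(".")]

    # Map each textual form of a byte to the placeholders it may stand for.
    tokens = {}
    for index, value in enumerate(octets):
        for text in {str(value), "%03d" % value}:      # unpadded and zero-padded decimal
            tokens.setdefault(text, set()).add("(%d)" % index)
        for text in {"%x" % value, "%02x" % value}:    # unpadded and zero-padded hex
            tokens.setdefault(text, set()).add("(x%d)" % index)

    labels = host.split(".")
    prefix = ".".join(labels[:-keep_fields])
    suffix = ".".join(labels[-keep_fields:])
    lowered = prefix.lower()  # hostnames are case-insensitive

    @lru_cache(maxsize=None)
    def scan(pos):
        # All expansions of prefix[pos:], as a frozenset of strings.
        if pos == len(prefix):
            return frozenset([""])
        matches = [
            (text, placeholder)
            for text, placeholders in tokens.items()
            if lowered.startswith(text, pos)
            for placeholder in placeholders
        ]
        if not matches:
            # No byte representation starts here: copy the character as-is.
            return frozenset(prefix[pos] + rest for rest in scan(pos + 1))
        # Branch over every byte representation that matches at this position.
        return frozenset(
            placeholder + rest
            for text, placeholder in matches
            for rest in scan(pos + len(text))
        )

    if not prefix:
        return {suffix}
    return {head + "." + suffix for head in scan(0)}


for line in sorted(expand("67.119.55.77",
                          "adsl-67-119-55-77.dsl.lsan03.pacbell.net")):
    print(line)
# adsl-(0)-(1)-(2)-(3).dsl.lsan03.pacbell.net
# adsl-(0)-(1)-(2)-(x1).dsl.lsan03.pacbell.net
```

Because the scan branches on every match, short hex strings can also hit
ordinary letters in a hostname (a byte of 173 would match the "ad" in
"adsl"), so expect some junk expansions alongside the useful ones.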
Now you can run a precompiled regular expression against these hostname
expansions, e.g. ".*\(0\).*\(1\).*\(2\).*\(3\).*" would match any
host in which the IP address numbers appear in the correct order.
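A minimal sketch of that matching step, using expansion strings copied
from the examples in this post. The decimal pattern is the one quoted
above; the hex pattern and the function name matches_ip_pattern() are
assumptions added for illustration.

```python
import re

# Pattern quoted in the post: decimal placeholders in their original order.
DECIMAL_IN_ORDER = re.compile(r".*\(0\).*\(1\).*\(2\).*\(3\).*")
# Hypothetical hex counterpart, not taken from the post.
HEX_IN_ORDER = re.compile(r".*\(x0\).*\(x1\).*\(x2\).*\(x3\).*")


def matches_ip_pattern(expansions):
    """Return True if any expansion matches one of the precompiled patterns."""
    return any(
        pattern.match(text)
        for text in expansions
        for pattern in (DECIMAL_IN_ORDER, HEX_IN_ORDER)
    )


print(matches_ip_pattern([
    "adsl-(0)-(1)-(2)-(3).dsl.lsan03.pacbell.net",
    "adsl-(0)-(1)-(2)-(x1).dsl.lsan03.pacbell.net",
]))  # True
print(matches_ip_pattern(["(x0)(x1)(x2)(x3).ipt.aol.com"]))  # True
print(matches_ip_pattern([
    "ip-(3).net-(0)-(1)-(2).henin.rev.numericable.fr",
]))  # False
```

The numericable.fr host puts the fourth byte first, so the in-order
pattern does not catch it; a layout like that would need its own
pattern, compiled once alongside the others.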
There are only a handful of dynamic addresses in your sample data that
don't match a decimal or hexadecimal IP-based pattern, e.g.
68.53.109.99 pcp03902856pcs.nash01.tn.comcast.net
68.147.136.167 s01060050bf91c1e4.cg.shawcable.net