Help beautify ugly heuristic code

Lonnie Princehouse finite.automaton at gmail.com
Thu Dec 9 03:01:36 EST 2004


Doh!  I misread "a" as host instead of ip in your first post.   I'm
sorry about that; I really must slow down.  Anyhow,

I believe you can still do this by compiling a regex only once and
then performing a few substitutions on the hostname.

Substitutions:

1st byte of IP => (0)
2nd byte of IP => (1)
3rd byte of IP => (2)
4th byte of IP => (3)
and likewise for hex => (x0)  (x1)  (x2)  (x3)

Each host string may map into multiple expansions, esp. if a number
repeats itself in the IP, or if an IP byte is less than 10 (such that
the decimal and hex representations are the same).  Both zero-padded
and unpadded forms will have to be substituted, and it's probably best
not to alter the last two fields in the host name since ISPs can't
change those.

With this scheme, here are a few expansions of (ip,host) tuples:

172.182.240.186  ACB6F0BA.ipt.aol.com
becomes
(x0)(x1)(x2)(x3).ipt.aol.com

67.119.55.77     adsl-67-119-55-77.dsl.lsan03.pacbell.net
becomes
adsl-(0)-(1)-(2)-(3).dsl.lsan03.pacbell.net
adsl-(0)-(1)-(2)-(x1).dsl.lsan03.pacbell.net


81.220.220.143   ip-143.net-81-220-220.henin.rev.numericable.fr
becomes
ip-(3).net-(0)-(1)-(1).henin.rev.numericable.fr
ip-(3).net-(0)-(1)-(2).henin.rev.numericable.fr
ip-(3).net-(0)-(2)-(1).henin.rev.numericable.fr
ip-(3).net-(0)-(2)-(2).henin.rev.numericable.fr

etcetera.
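
Here is a rough sketch of how those expansions could be generated.  It
is only one way to do it, and several details are my own assumptions
rather than anything fixed by the scheme: the names (TOKEN,
placeholders, alternatives, expand) are made up, a "number" in the host
is taken to be either a run of decimal digits or a standalone run of
exactly eight hex digits (four bytes run together), zero-padding means
three digits for decimal and two for hex, and the host is assumed to
have at least three dot-separated fields.

```python
import itertools
import re

# Assumption: a candidate is a standalone run of exactly 8 hex digits
# (four concatenated bytes) or any run of decimal digits.
TOKEN = re.compile(r"((?<![0-9A-Za-z])[0-9A-Fa-f]{8}(?![0-9A-Za-z])|\d+)")


def placeholders(text, octets, decimal=True):
    """Return every placeholder that `text` could stand for."""
    found = []
    if decimal:
        for i, value in enumerate(octets):
            # Decimal form, unpadded or zero-padded to three digits.
            if text in (str(value), "%03d" % value):
                found.append("(%d)" % i)
    for i, value in enumerate(octets):
        # Hex form, unpadded or zero-padded to two digits, any case.
        if text.lower() in ("%x" % value, "%02x" % value):
            found.append("(x%d)" % i)
    return found


def alternatives(token, octets):
    """Return the possible replacements for one token (or the token itself)."""
    if len(token) == 8:
        # Try reading the run as four hex bytes.
        pairs = [token[i:i + 2] for i in range(0, 8, 2)]
        choices = [placeholders(p, octets, decimal=False) for p in pairs]
        if all(choices):
            return ["".join(c) for c in itertools.product(*choices)]
    return placeholders(token, octets) or [token]


def expand(ip, host):
    """Return all placeholder expansions of `host` for the given `ip`."""
    octets = [int(part) for part in ip.split(".")]
    labels = host.split(".")
    # Leave the last two fields alone (assumes at least three fields).
    prefix, suffix = ".".join(labels[:-2]), ".".join(labels[-2:])
    pieces = TOKEN.split(prefix)  # odd indexes hold the captured tokens
    choices = [
        alternatives(piece, octets) if i % 2 else [piece]
        for i, piece in enumerate(pieces)
    ]
    return ["".join(c) + "." + suffix for c in itertools.product(*choices)]


for ip, host in [
    ("172.182.240.186", "ACB6F0BA.ipt.aol.com"),
    ("67.119.55.77", "adsl-67-119-55-77.dsl.lsan03.pacbell.net"),
    ("81.220.220.143", "ip-143.net-81-220-220.henin.rev.numericable.fr"),
]:
    for expansion in expand(ip, host):
        print(ip, expansion)
```

The number of expansions is the product of the choices per token, so a
host with many repeated bytes can blow up a little, but for four bytes
it stays small.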

Now you can run a precompiled regular expression against these hostname
permutations, e.g.  ".*\(0\).*\(1\).*\(2\).*\(3\).*" would match any
host in which the decimal IP address numbers appeared in the correct
order.
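
To make the matching step concrete, here is a small sketch.  The first
pattern is the one quoted above; the hex pattern is my own guess by
analogy, and the names PATTERNS and matching_patterns are made up for
the example.  The expansions are typed in by hand from the examples
earlier in this message.

```python
import re

# Compiled once, reused for every host.
PATTERNS = {
    # The pattern from this message: decimal bytes in address order.
    "decimal-in-order": re.compile(r".*\(0\).*\(1\).*\(2\).*\(3\).*"),
    # Assumed analogue for hex bytes in address order.
    "hex-in-order": re.compile(r".*\(x0\).*\(x1\).*\(x2\).*\(x3\).*"),
}


def matching_patterns(expansions):
    """Return the names of the patterns matched by any expansion of a host."""
    return sorted(
        name
        for name, pattern in PATTERNS.items()
        if any(pattern.match(e) for e in expansions)
    )


pacbell = [
    "adsl-(0)-(1)-(2)-(3).dsl.lsan03.pacbell.net",
    "adsl-(0)-(1)-(2)-(x1).dsl.lsan03.pacbell.net",
]
aol = ["(x0)(x1)(x2)(x3).ipt.aol.com"]
numericable = ["ip-(3).net-(0)-(1)-(2).henin.rev.numericable.fr"]

print(matching_patterns(pacbell))
print(matching_patterns(aol))
print(matching_patterns(numericable))
```

Note that the numericable host puts the last byte first, so neither
in-order pattern catches it; a layout like that would need its own
pattern (or a looser test that only checks all four placeholders are
present).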

There are only a handful of dynamic addresses in your sample data that
don't match a decimal or hexadecimal IP-based pattern, e.g.

68.53.109.99     pcp03902856pcs.nash01.tn.comcast.net
68.147.136.167   s01060050bf91c1e4.cg.shawcable.net



