Help beautify ugly heuristic code

Stuart D. Gathman stuart at bmsi.com
Wed Dec 8 19:11:24 EST 2004


On Wed, 08 Dec 2004 18:39:15 -0500, Lonnie Princehouse wrote:

> Regular expressions.
> 
> It takes a while to craft the expressions, but this will be more
> elegant, more extensible, and considerably faster to compute (matching
> compiled re's is fast).

I'm already doing that with the rehmac regex.  I like your idea for making
it more readable, though.  Looking for permutations of the IP address
gives much more bang per line of code than most host-only regexes,
since it is ISP independent.  At least one ISP uses Roman numerals to
encode the IP in their dynamic hostnames!  I tried matching a custom regex
computed from the IP, but compiling a new regex for each test was too slow.
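One way to avoid compiling a regex per IP is to compile a single generic
pattern once and compare the numeric fields it extracts against the IP's
octets.  The sketch below is only an illustration, not the rehmac regex:
the function name ip_in_hostname is hypothetical, and it assumes the IP
shows up as decimal fields separated by non-digits, in forward or reverse
order, possibly zero-padded.

```python
import re

# Compiled once at import time, so no per-IP regex compilation is needed.
_DIGIT_RUNS = re.compile(r"\d+")

def ip_in_hostname(ip: str, hostname: str) -> bool:
    """Hypothetical helper: True if the IP's four octets appear as
    consecutive numeric fields of hostname, forward or reversed."""
    octets = [str(int(o)) for o in ip.split(".")]
    # Normalize zero padding, e.g. "010" -> "10".
    fields = [str(int(d)) for d in _DIGIT_RUNS.findall(hostname)]
    for pattern in (octets, octets[::-1]):
        n = len(pattern)
        # Slide a window of n fields across the hostname's numbers.
        for i in range(len(fields) - n + 1):
            if fields[i:i + n] == pattern:
                return True
    return False
```

For example, ip_in_hostname("1.2.3.4", "4.3.2.1.dyn.example.net") is
True.  Encodings such as hex, concatenated digits without separators, or
the Roman numerals mentioned above would each need their own decoder.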

I could keep adding more patterns, but I was hoping for a tool that
"learns" how to recognize the pattern from a database of preclassified
examples, and whose resulting data would be reasonably compact.  I don't
ask for much, do I?  A Bayesian classifier would need too big a database,
I think.  I've seen neural nets do amazing things with only 100 or so
neurons (a small weight database), but they are slow in software.
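The smallest "neural net" that learns from labeled examples is a single
perceptron.  The sketch below swaps in that technique, which was not
proposed in this thread: each hostname's character trigrams are hashed
into a fixed table of N_BUCKETS weights plus a bias, so the learned data
stays at 257 numbers no matter how many examples are used.  The names
features, train and is_dynamic, and the bucket count, are assumptions for
illustration; accuracy on real hostnames is untested.

```python
import zlib
from typing import Iterable, List, Tuple

N_BUCKETS = 256  # assumed size of the weight table (the whole "database")

def features(hostname: str) -> List[int]:
    """Hash each character trigram (with ^/$ end markers) into a bucket."""
    h = "^" + hostname.lower() + "$"
    return [zlib.crc32(h[i:i + 3].encode()) % N_BUCKETS
            for i in range(len(h) - 2)]

def score(w: List[float], feats: List[int]) -> float:
    # Last slot of w is the bias term.
    return w[-1] + sum(w[f] for f in feats)

def train(samples: Iterable[Tuple[str, bool]], epochs: int = 10) -> List[float]:
    """Classic perceptron: adjust weights only on misclassified examples."""
    data = [(features(h), 1 if dyn else -1) for h, dyn in samples]
    w = [0.0] * (N_BUCKETS + 1)
    for _ in range(epochs):
        for feats, y in data:
            pred = 1 if score(w, feats) > 0 else -1
            if pred != y:
                for f in feats:
                    w[f] += y
                w[-1] += y
    return w

def is_dynamic(w: List[float], hostname: str) -> bool:
    return score(w, features(hostname)) > 0
```

zlib.crc32 is used instead of the built-in hash() because str hashing is
randomized per process, which would make saved weights useless in a later
run.  Feeding it the posted samples would require parsing that file into
(hostname, is_dynamic) pairs; no parser is shown since the format isn't
described here.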

I have posted 10K examples, preclassified by the current algorithm, here:
http://bmsi.com/python/dynip.samp


