Sequence and/or pattern matching

Ben Sizer kylotan at gmail.com
Wed Oct 19 04:58:50 EDT 2005


Séb wrote:
> 1) I have a list of connexion between some computers. This list has
> this format :

It looks like you want graph theory.

> Ip A               Date             Ip B
> ...                ...              ...
> 192.168.0.1        19.10.2005       192.168.0.2
> 192.168.0.3        19.10.2005       192.168.0.1
> 192.168.0.4        19.10.2005       192.168.0.6

That looks like a list of edges between graph nodes, you see. Each node
corresponds to an address and each edge is a connection between two
nodes - ip addresses, in your case.

> 2) I want to find if there are unknown sequences of connexions in my
> data and if these sequences are repeated along the file :
>
> For example :
>
> Computer A connects to Computer B then
> Computer B connects to Computer C then
> Computer C connects to Computer A

That looks like finding a path between Node A and Node C. This is a
common application of graph theory, and especially when finding routes
(eg. for train journeys, or for AI characters in computer games).

> 3) Then, the software gives the sequences it has found and how many
> times they appear...

You can find all the routes between 1 node and others by using
depth-first search (or indeed, any other simple graph search
algorithm). Basically, this says that, for any node, examine all the
nodes it leads to. Then examine those nodes... repeat until you run out
of nodes or find where you're looking for. The only complication is
remembering the route you took.

> I hope this is clear, point 2) is where I have my main problem. Has
> someone an idea where to start and if there's already something coded ?

I found a couple of Python graph libraries on Google but I can't vouch
for their quality. However, writing your own simple graph traversal
algorithm should be quite trivial. I would start by populating a
dictionary where the keys are instances of address A and the values are
lists of address B values for address A. Add each link again with
address B as the key and address A as the value if you need to
represent bidirectional connections. Then you can perform search on
that as required. The only slight complication is avoiding infinite
loops - I might use a dictionary of address->boolean values here and
check off each address as the algorithm progresses.

-- 
Ben Sizer




More information about the Python-list mailing list