Bounding box on clusters in a 2D list

bearophileHUGS at lycos.com bearophileHUGS at lycos.com
Sat Apr 23 21:15:34 EDT 2005


I hope this is what you need, sometimes understanding the question is
one of the hardest parts :-)

If you can use a graph data structure, you can create a graph, and then
you can find the lenght of all its connected components (indentations
reduced to 2 spaces):

. def mat2graph(g, m, good=None, w=1):
. 	"Function docs..."
.   g.clear()
.   if m and isinstance(m, (list, tuple)):
.     if good != None:
.       m = [[good(e) for e in row] for row in m]
.     maxy = len(m)-1
.     maxx = len(m[0])-1
.     add = g.addBiArc
.     for y, row in enumerate(m):
.       for x, e in enumerate(row):
.         if e:
.           g.addNode((x,y)) # Necessary for isolated nodes.
.           if x>0 and m[y][x-1]: add( (x,y), (x-1,y), w=w)
.           if x<maxx and m[y][x+1]: add( (x,y), (x+1,y), w=w)
.           if y>0 and m[y-1][x]: add( (x,y), (x,y-1), w=w)
.           if y<maxy and m[y+1][x]: add( (x,y), (x,y+1), w=w)
.
. from graph import Graph
. g = Graph()
. m = [[1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0],
.      [1,1,1,0,1,1,0,0,0,0,0,1,1,1,1,0,0,0],
.      [1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,0],
.      [0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0]]
. mat2graph(g, m)
. print map(len, g.connectedComponents())

Something similar to mat2graph will probably become a method of Graph.
A quite similar program can be used for other python graph libraries.

If you cannot use a graph data structure, of if you need to go faster,
you can use a flood fill algorithm tuned for this problem:


. def connected(m, foreground=1, background=0):
.   def floodFill4(m, r,c, newCol, oldCol):
.     """Simplest 4-connected flood fill algorithm, there are much
.     faster non-recursive ones. I've modified this from:
.     http://www.student.kuleuven.ac.be/~m0216922/CG/floodfill.html"""
.     if c>=0 and c<maxc and r>=0 and r<maxr and m[r][c]==oldCol:
.       m[r][c] = newCol
.       floodFill4(m, r+1, c, newCol, oldCol)
.       floodFill4(m, r-1, c, newCol, oldCol)
.       floodFill4(m, r, c+1, newCol, oldCol)
.       floodFill4(m, r, c-1, newCol, oldCol)
.
.   maxr = len(m)
.   maxc = len(m[0])
.   newCol = 2
.   oldCol = foreground
.   for r in xrange(maxr):
.     for c in xrange(maxc):
.       if m[r][c] == oldCol:
.         floodFill4(m, r,c, newCol=newCol, oldCol=oldCol)
.         newCol += 1
.   # Frequences, this can be become a standard python command
.   freq = {}
.   for row in m:
.     for e in row:
.       if e in freq:
.         freq[e] += 1
.       else:
.         freq[e] = 1
.   del freq[background]
.   return freq.values()
.
. m = [[1,1,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0],
.      [1,1,1,0,1,1,0,0,0,0,0,1,1,1,1,0,0,0],
.      [1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,0],
.      [0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0]]
. print connected(m)


There are much faster ways to do flood fills:
http://www.codeproject.com/gdi/QuickFill.asp
I think Python PIL doesn't support flood fills, and I think doing it
with Numarray is slower, and it's useful only to reduce the used
memory.
This connected program is slow and raw, and it uses recursivity a lot,
so it can crash the interpreter. You can use a data structure to
implement the stack yourself, to speed this program up. But for small
matrices I think this connected is enough ^_^

Bear hugs,
Bearophile




More information about the Python-list mailing list