[Tutor] Question about style

Wolfgang Maier wolfgang.maier at biologie.uni-freiburg.de
Wed Jul 16 23:14:50 CEST 2014


On 16.07.2014 13:49, Jose Amoreira wrote:
> Hello
> I wrote a function that, given a list of numbers, finds clusters of
> values by proximity and returns a reduced list containing the centers of
> these clusters. However, I find it rather unclear. I would appreciate
> any comments on how pythonic my function is and suggestions to improve
> its readability.
> The function is:
>
> def aglomerate(x_lst, delta=1.e-5):
>      clusters = [] #list of pairs [center, number of clustered values]
>      for x in x_lst:
>          close_to = [abs(x - y) < delta for y,_ in clusters]
>          if any(close_to):
>              # x is close to a cluster
>              index = close_to.index(True)
>              center, n = clusters[index]
>              #update the cluster center including the new value,
>              #and increment dimension of cluster
>              clusters[index] = (n * center + x)/(n+1), n+1

careful here: you just stored a tuple instead of a list; doesn't matter 
for your current implementation, but may bite you at some point.

>          else:
>              # x does not belong to any cluster, create a new one
>              clusters.append([x,1])
>      # return list with centers
>      return [center for center, _ in clusters]
>

Your building of the close_to list with Trues and Falses, then using it 
to find the original element again makes me think that you're a regular 
R user ? Using such a logical vector (I guess that's what you'd call it 
in R) should (almost) never be required in Python.
Possible solutions for your case depend somewhat on what you want to be 
able to do. Currently, you are identifying all possible clusters that a 
new value may belong to, but then you are simply adding it to the first 
cluster. If that's all your code needs to do, then you could do (this 
avoids indexing altogether by exploiting the fact that lists are mutable):

def aglomerate(x_lst, delta=1.e-5):
     clusters = [] #list of pairs [center, number of clustered values]
     for x in x_lst:
         for cluster in clusters:
	    # cluster is now a list of two elements
	    # lists are mutable objects
             center, n = cluster # to keep things readable
             if abs(x - center) < delta:
                 # x is close to a cluster
                 #update the cluster center including the new value,
                 #and increment dimension of cluster
                 # since list are mutable objects you can do this as an
                 # in-place operation
                 cluster[0] = (n * center + x)/(n+1)
                 cluster[1] = n+1
                 break
         else:
             # this block is executed if the break in the preceeding
             # block wasn't reached =>
             # x does not belong to any cluster, create a new one
             clusters.append([x,1])
     # return list with centers
     return [center for center, _ in clusters]

Given your return value on the other hand, it looks like your clusters 
data structure is somewhat suboptimal. In fact, you would be better off 
with two separate lists for centers and sizes, in which case you could 
simply
return centers.

Here's an implementation of this idea demonstrating how, in Python, you 
typically use the builtin enumerate function to avoid "logical vectors" 
or other kinds of index juggling:

def aglomerate(x_lst, delta=1.e-5):
     centers = []
     sizes = []
     for x in x_lst:
         for i, center in enumerate(centers):
             if abs(x - center) < delta:
                 # x is close to a cluster
                 #update the cluster center including the new value,
                 #and increment dimension of cluster
                 n = sizes[i]
                 centers[i] = (n * center + x)/(n+1)
                 sizes[i] = n+1
                 break
         else:
             # this block is executed only when the break in the preceeding
             # block wasn't reached =>
             # x does not belong to any cluster, create a new one
             centers.append(x)
             sizes.append(1)
     # return list with centers
     return centers

In summary, indices in Python should be handled using enumerate and 
sometimes can be avoided completely by in-place manipulations of mutable 
objects.

Best,
Wolfgang


More information about the Tutor mailing list