[Numpy-discussion] For loop tips

Tim Hochberg tim.hochberg at ieee.org
Tue Aug 29 14:40:11 EDT 2006


Keith Goodman wrote:
> I have a very long list that contains many repeated elements. The
> elements of the list can be either all numbers, or all strings, or all
> dates [datetime.date].
>
> I want to convert the list into a matrix where each unique element of
> the list is assigned a consecutive integer starting from zero.
>   
If what you want is that the first unique element gets zero, the second
gets one, and so on, I don't think the code below will work in general,
since the dict does not preserve order. You might want to look at the
results for the character case to see what I mean. If you're looking for
something else, you'll need to elaborate a bit. Since list2index doesn't
return anything, it's not entirely clear what the answer consists of.
Just idx? Idx plus uL?
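The numbering depends entirely on the order in which the unique values are enumerated. In the Python 2 used in this thread, dict key order is arbitrary. Since Python 3.7, dicts are guaranteed to preserve insertion order, so `dict.fromkeys(values)` lists values in order of first appearance. The following is a small sketch contrasting first-appearance numbering with sorted numbering. `codes_from_order` is a hypothetical helper, not part of the original code.

```python
def codes_from_order(values, uniques):
    """Map each value to its position in `uniques`."""
    position = {u: i for i, u in enumerate(uniques)}
    return [position[v] for v in values]

values = ['c', 'a', 'c', 'b', 'a']

# Python 3.7+: dict keys keep insertion order, i.e. first-appearance order.
first_seen = list(dict.fromkeys(values))   # ['c', 'a', 'b']
in_sorted = sorted(set(values))            # ['a', 'b', 'c']

print(codes_from_order(values, first_seen))  # [0, 1, 0, 2, 1]
print(codes_from_order(values, in_sorted))   # [2, 0, 2, 1, 0]
```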

> I've done it by brute force below. Any tips for making it faster? (5x
> would make it useful; 10x would be a dream.)
>   
Assuming I understand what you're trying to do, this might help:

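    import numpy

    def list2index2(L):
        # One integer code per element; codes follow first appearance
        idx = numpy.empty(len(L), dtype=int)
        codes = {}              # maps each value seen so far to its code
        for i, x in enumerate(L):
            index = codes.get(x)
            if index is None:
                # New value: give it the next unused code
                codes[x] = index = len(codes)
            idx[i] = index
        return idx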


It's almost 10x faster for numbers and about 40x faster for characters
and dates. However, it produces different results from list2index in the
last two cases, because it numbers values in order of first appearance
rather than in whatever order the dict happens to list its keys. That may
or may not be a good thing, depending on what you're really trying to do.
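A vectorized alternative, not from this thread: `numpy.unique` with `return_index` and `return_inverse` can produce the same first-appearance codes as list2index2 without a Python-level loop. This is a sketch that assumes a reasonably recent NumPy, and `first_appearance_codes` is a hypothetical name. It relies on `np.unique` sorting the values, so all elements must be mutually comparable, which holds for an all-numbers, all-strings, or all-dates list. A list of `datetime.date` objects becomes an object array, so its sort still uses Python-level comparisons. No timings are claimed here.

```python
import datetime

import numpy as np


def first_appearance_codes(values):
    """Return an int array where the k-th distinct value (by first appearance) maps to k.

    Sketch only: relies on np.unique sorting, so all values must be
    mutually comparable (all numbers, all strings, or all dates).
    """
    arr = np.asarray(values)
    # uniq is sorted; first_idx[j] is where uniq[j] first occurs in arr;
    # inverse[i] is the position of arr[i] within uniq.
    uniq, first_idx, inverse = np.unique(arr, return_index=True,
                                         return_inverse=True)
    # Positions in uniq, listed in order of first appearance.
    order = np.argsort(first_idx)
    # rank[j] is the first-appearance code of uniq[j].
    rank = np.empty(len(order), dtype=np.intp)
    rank[order] = np.arange(len(order))
    return rank[inverse]


if __name__ == "__main__":
    d = datetime.date
    dates = [d(2006, 1, 3), d(2006, 1, 1), d(2006, 1, 3), d(2006, 1, 2)]
    print(first_appearance_codes(dates))  # [0 1 0 2]
```

The sort-based route trades the dict's hashing for an O(n log n) sort done in C. Whether that wins over the dict loop depends on the data type and list size, so it is worth timing both on real data.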

-tim

>   
>>> list2index.test()
>>>       
> Numbers: 5.84955787659 seconds
> Characters: 24.3192870617 seconds
> Dates: 39.288228035 seconds
>
>
> import datetime, time
> from numpy import nan, asmatrix, ones
>
> def list2index(L):
>
>   # Find unique elements in list
>   uL = dict.fromkeys(L).keys()
>
>   # Convert list to matrix
>   L = asmatrix(L).T
>
>   # Initialize return matrix
>   idx = nan * ones((L.size, 1))
>
>   # Assign numbers to unique L values
>   for i, uLi in enumerate(uL):
>     idx[L == uLi,:] = i
>
> def test():
>
>     L = 5000*range(255)
>     t1 = time.time()
>     idx = list2index(L)
>     t2 = time.time()
>     print 'Numbers:', t2-t1, 'seconds'
>
>     L = 5000*[chr(z) for z in range(255)]
>     t1 = time.time()
>     idx = list2index(L)
>     t2 = time.time()
>     print 'Characters:', t2-t1, 'seconds'
>
>     d = datetime.date
>     step = datetime.timedelta
>     L = 5000*[d(2006,1,1)+step(z) for z in range(255)]
>     t1 = time.time()
>     idx = list2index(L)
>     t2 = time.time()
>     print 'Dates:', t2-t1, 'seconds'
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>
>
>   
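The quoted test() is Python 2 code: `print` is a statement and `5000*range(255)` relies on `range` returning a list. The following is a sketch of the same three benchmark inputs in Python 3. It restates the dict-based approach as a hypothetical `encode_first_seen` so the block runs on its own. `make_inputs` and `benchmark` are also hypothetical names, and `time.perf_counter` replaces `time.time` for interval timing.

```python
import datetime
import time

import numpy as np


def encode_first_seen(values):
    """Dict-based encoder: each new value gets the next integer code."""
    codes = np.empty(len(values), dtype=np.intp)
    seen = {}
    for i, x in enumerate(values):
        code = seen.get(x)
        if code is None:
            code = seen[x] = len(seen)
        codes[i] = code
    return codes


def make_inputs(repeats=5000, n=255):
    """Python 3 versions of the three lists built in test()."""
    start = datetime.date(2006, 1, 1)
    return {
        "Numbers": repeats * list(range(n)),
        "Characters": repeats * [chr(z) for z in range(n)],
        "Dates": repeats * [start + datetime.timedelta(days=z) for z in range(n)],
    }


def benchmark(encoder, repeats=5000):
    """Time `encoder` on each input list and print the elapsed seconds."""
    for label, values in make_inputs(repeats).items():
        t1 = time.perf_counter()
        encoder(values)
        t2 = time.perf_counter()
        print(f"{label}: {t2 - t1:.3f} seconds")


if __name__ == "__main__":
    benchmark(encode_first_seen)
```

Any function that takes a list and returns its codes can be passed to `benchmark`. Absolute timings will depend on the machine and on the Python and NumPy versions.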