[SciPy-User] equivalent of tolist().index(entry) for numpy 1d array of strings

Ryan Krauss ryanlists at gmail.com
Tue Dec 22 09:16:04 EST 2009


> If you are using ipython then it is handy, and more accurate, to use
> timeit. At the ipython prompt try:

> timeit where(self.md5sum==photo.md5sum)[0][0]

Thanks for the tip.  I have all but given up on using timeit in
scripts because I can't find my way around namespace issues.  But
ipython (not surprisingly) handles the namespace problems nicely.
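
For scripts, one way around the namespace problem is to hand timeit
either a callable or an explicit namespace. The following is only a
sketch: md5sums and target are made-up stand-ins for self.md5sum and
photo.md5sum, and the globals argument assumes Python 3.5 or later.

```python
import timeit

import numpy as np

# Hypothetical stand-ins for self.md5sum and photo.md5sum.
md5sums = np.array(["hash%02d" % i for i in range(10)])
target = "hash07"
number = 1000

# Option 1: pass a callable; it closes over the script's own names,
# so no setup string or namespace juggling is needed.
t_callable = timeit.timeit(
    lambda: np.where(md5sums == target)[0][0], number=number
)

# Option 2: pass the statement as a string together with the namespace
# it should run in (assumes Python 3.5+). At module level,
# globals=globals() would also work.
namespace = {"np": np, "md5sums": md5sums, "target": target}
t_string = timeit.timeit(
    "np.where(md5sums == target)[0][0]", globals=namespace, number=number
)

print("callable: %.3g s per call" % (t_callable / number))
print("string:   %.3g s per call" % (t_string / number))
```

The callable form adds a small function-call overhead to each
measurement, which matters mostly for very fast statements.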

Thanks again,

Ryan

On Mon, Dec 21, 2009 at 8:27 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Mon, Dec 21, 2009 at 6:09 PM, Ryan Krauss <ryanlists at gmail.com> wrote:
>> I am still open to more elegant solutions, but it seems like my
>> concerns about .tolist() being inefficient are unfounded (this may be
>> an indicator that I don't understand the inner workings of numpy very
>> well).
>>
>> Here is my test:
>>
>> t1 = time.time()
>> index1 = where(self.md5sum==photo.md5sum)[0][0]
>> t2 = time.time()
>> index2 = mysearch(self.md5sum, photo.md5sum)
>> t3 = time.time()
>> index3 = self.md5sum.tolist().index(photo.md5sum)
>> t4 = time.time()
>
> If you are using ipython then it is handy, and more accurate, to use
> timeit. At the ipython prompt try:
>
> timeit where(self.md5sum==photo.md5sum)[0][0]
>
>>
>> All 3 approaches lead to the same result.  Here are my timing results:
>> t2-t1=4.81605529785e-05
>> t3-t2=4.98294830322e-05
>> t4-t3=2.00271606445e-05
>>
>> def mysearch(arrayin, element):
>>    bool_vect = where(arrayin==element)[0]
>>    assert(len(bool_vect)==1), 'Did not find exactly 1 match for ' + str(element)
>>    return bool_vect[0]
>
> If element is not in arrayin then mysearch will fail with an
> AssertionError. Likewise, .index will raise a ValueError.
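
If a missing entry should not stop the script, the lookup can return a
fallback value instead. This is a minimal sketch, not something from
the thread: find_first and its default argument are hypothetical
names, and it returns the first match rather than insisting on exactly
one.

```python
import numpy as np


def find_first(arrayin, element, default=None):
    """Return the index of the first match, or `default` if there is none."""
    # flatnonzero gives the indices where the comparison is True.
    matches = np.flatnonzero(arrayin == element)
    if matches.size == 0:
        return default
    return int(matches[0])


md5sums = np.array(["aaa", "bbb", "ccc", "bbb"])
print(find_first(md5sums, "bbb"))      # 1
print(find_first(md5sums, "zzz"))      # None
print(find_first(md5sums, "zzz", -1))  # -1
```

For comparison, where(arrayin==element)[0][0] raises an IndexError
when there is no match, because it indexes into an empty array.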
>
>>
>> Now, for this test, the arrays didn't have very many elements (10 ish).
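
To see whether the ranking holds for longer columns, the same two
lookups can be timed at several sizes. This is a rough sketch under
assumptions: time_lookups is a hypothetical helper, the strings are
synthetic, and the target is the last element so both approaches have
to scan the whole array. No results are shown because they depend on
the machine and the numpy version.

```python
import timeit

import numpy as np


def time_lookups(n, number=20):
    """Time both lookups on n unique strings, searching for the last one."""
    arr = np.array(["hash%07d" % i for i in range(n)])
    target = arr[-1]  # worst case for a linear scan
    t_where = timeit.timeit(
        lambda: np.where(arr == target)[0][0], number=number
    )
    t_list = timeit.timeit(
        lambda: arr.tolist().index(target), number=number
    )
    # Return the average time per call for each approach.
    return t_where / number, t_list / number


for n in (10, 1000, 100000):
    t_where, t_list = time_lookups(n)
    print("n=%6d  where: %.3g s   tolist().index: %.3g s" % (n, t_where, t_list))
```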
>>
>> FWIW,
>>
>> Ryan
>>
>> On Mon, Dec 21, 2009 at 7:53 PM, Ryan Krauss <ryanlists at gmail.com> wrote:
>>> I wrote some code to work with csv spreadsheet files by reading the
>>> columns into lists, but I need to rework the code to work with numpy
>>> 1d arrays of strings rather than lists.  I need to search one of these
>>> columns/arrays.  What is the best way to find the index for the
>>> element that matches a certain string (or maybe just the first element
>>> to match such a string)?
>>>
>>> With the columns as lists, I was doing
>>> index = mylist.index(entry)
>>>
>>> So, I could obviously do
>>> index = mylist.tolist().index(entry)
>>>
>>> but I don't know if that would be slower or clumsier than something like
>>> bool_vect = where(mylist==entry)[0]
>>> index = bool_vect[0]
>>>
>>> or just
>>>
>>> index = where(mylist==entry)[0][0]
>>>
>>> Any thoughts?  Is there an easier way?
>>>
>>> Thanks,
>>>
>>> Ryan
>>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>