[SciPy-User] Identify unique sequence data from array

josef.pktd at gmail.com
Wed Dec 22 15:27:36 EST 2010


On Wed, Dec 22, 2010 at 3:18 PM, otrov <dejan.org at gmail.com> wrote:
>>> The problem:
>
>>> I have 2D data sets (scipy/numpy arrays) of 10^7 to 10^8 rows, which consist of repetitions of one unique sequence, usually ~10^5 rows, but may differ in scale. The period is the same for both columns, so it makes no real difference whether we consider a 2D or a 1D array.
>>> I want to track this data block.
>
>> for i in range(1, len(X)-1):
>>     if (X[i:] == X[:-i]).all():
>>         break

I don't see how this works. Shouldn't the test be

if (X[:i] == X[-i:]).all():

With an integer number of repeats, there should also be a restriction
that n/i is an int (with n = len(X)); otherwise an exact repeat is not
possible:

if n//i != n/float(i): continue

or, equivalently, check that n % i == 0?
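
A minimal sketch of how the divisibility restriction could be combined with
the loop from the quoted snippet, assuming NumPy and n = len(X). The name
find_period is hypothetical and not from the thread. Note that X[:i] == X[-i:]
compares only the first block with the last one, so the sketch keeps the
shifted comparison X[i:] == X[:-i], which covers every block, and uses
n % i == 0 only to skip candidates that cannot give an exact repeat.

```python
import numpy as np


def find_period(X):
    """Return the smallest i such that X is an exact repeat of X[:i].

    Hypothetical helper, not from the thread. Works on 1D arrays and on
    2D arrays (rows are compared). Returns len(X) if no shorter period
    exists.
    """
    X = np.asarray(X)
    n = len(X)
    for i in range(1, n):
        # An integer number of repeats requires that i divides n.
        if n % i != 0:
            continue
        # Shifted comparison: X[k + i] == X[k] for every valid k.
        if (X[i:] == X[:-i]).all():
            return i
    return n


# Small example: the block [1, 2, 3] repeated four times.
X = np.tile(np.array([1, 2, 3]), 4)
period = find_period(X)
```

Skipping non-divisors reduces the number of full-array comparisons, but each
remaining comparison still touches the whole array, so this may not be fast
enough for 10^7 to 10^8 rows.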

Josef

>
> Just look at that Python beauty! Such a great language in the hands of a smart user.
> Thanks for your snippet, but unfortunately it takes forever to finish the task.
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>


