[SciPy-user] sorting timeseries data.
Pierre GM
pgmdevlist at gmail.com
Fri May 2 12:34:11 EDT 2008
On Friday 02 May 2008 12:11:04 Dharhas Pothina wrote:
> I want to sort the data to be monotonically increasing by the variable
> seconds and filter out duplicate values (say by deleting the second
> occurrence).
Dharhas,
>>> idx = seconds.argsort()
>>> sorted_seconds = seconds[idx]
>>> sorted_data = data[idx]
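will do the trick. Look at the help for the argsort method if you need to use
a specific sorting algorithm. The default ('quicksort') is not stable, whereas
kind='mergesort' is stable and is preferable here: entries with equal seconds
keep their original relative order, so the "second occurrence" stays second.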
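To make the stable-sort point concrete, here is a minimal sketch. The sample
arrays are made up for illustration (they are not Dharhas' data); only the
names seconds and data come from the question.

```python
import numpy as np

# Hypothetical sample data: the timestamp 3.0 occurs twice.
seconds = np.array([5.0, 3.0, 1.0, 3.0, 2.0])
data = np.array([50, 30, 10, 31, 20])

# kind='mergesort' requests a stable sort, so the two entries with
# seconds == 3.0 stay in their original relative order (30 before 31).
idx = seconds.argsort(kind='mergesort')
sorted_seconds = seconds[idx]
sorted_data = data[idx]
```

With a non-stable algorithm the relative order of the two 3.0 entries is not
guaranteed, which matters if you later drop "the second occurrence".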
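Then, you can find the duplicates this way:
>>> import numpy
>>> # difference between consecutive sorted values; to_begin=1 keeps the first
>>> diffs = numpy.ediff1d(sorted_seconds, to_begin=1)
>>> # a zero difference marks a repeat of the previous value
>>> unq = (diffs != 0)
>>> final_seconds = sorted_seconds.compress(unq)
>>> final_data = sorted_data.compress(unq)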
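Putting the sort and the duplicate filter together, a self-contained sketch
could look like the following. The function name sort_and_dedupe and the
sample values are hypothetical, and the sketch assumes 1-D arrays of equal
length with at least one element.

```python
import numpy as np

def sort_and_dedupe(seconds, data):
    """Sort by seconds and keep only the first occurrence of each value.

    Hypothetical helper; assumes non-empty 1-D inputs of equal length.
    """
    seconds = np.asarray(seconds)
    data = np.asarray(data)

    # Stable sort so that, among equal timestamps, the earliest entry in the
    # input comes first.
    idx = seconds.argsort(kind='mergesort')
    sorted_seconds = seconds[idx]
    sorted_data = data[idx]

    # Differences between neighbours; to_begin=1 prepends a nonzero value so
    # the first element is always kept.
    diffs = np.ediff1d(sorted_seconds, to_begin=1)
    unq = diffs != 0

    return sorted_seconds[unq], sorted_data[unq]

# Made-up example: 3.0 appears twice, with data values 30 and then 31.
final_seconds, final_data = sort_and_dedupe(
    [5.0, 3.0, 1.0, 3.0, 2.0], [50, 30, 10, 31, 20]
)
```

Boolean indexing (sorted_seconds[unq]) is used here in place of compress;
for 1-D arrays the two give the same result.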
On a side note, you may want to give scikits.timeseries a try: we develop this
package specifically to handle time series (i.e., series indexed in time). The
sorting part would be automatic, and finding the duplicates is also quite
easy.
HIH