[SciPy-user] sorting timeseries data.

Fri May 2 12:34:11 EDT 2008

On Friday 02 May 2008 12:11:04 Dharhas Pothina wrote:
> I want to sort the data to be monotonically increasing by the variable
> seconds and filter out duplicate values (say by deleting the second
> occurrence).

Dharhas,
>>> idx = seconds.argsort()
>>> sorted_seconds = seconds[idx]
>>> sorted_data = data[idx]
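will do the trick. Look at the help for the argsort method if you need to use 
a specific sorting algorithm. The default 'quicksort' is not stable, whereas 
kind='mergesort' is, so it is the better choice here: entries with equal 
seconds keep their original order, and the first occurrence stays first.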
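
As a minimal sketch of the stable variant (the sample values below are made up
for illustration and are not from the original question):

```python
import numpy as np

# Hypothetical sample data: the timestamp 2.0 appears twice.
seconds = np.array([3.0, 1.0, 2.0, 2.0, 0.5])
data = np.array([30, 10, 20, 21, 5])

# kind="mergesort" requests a stable sort, so entries with equal
# timestamps keep their original relative order.
idx = seconds.argsort(kind="mergesort")
sorted_seconds = seconds[idx]
sorted_data = data[idx]
```

With a stable sort, the value 20 (first occurrence of 2.0) ends up ahead of 21
(second occurrence), which matters if you later drop the second one.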

Then, you can find the duplicates this way (to_begin=1 prepends a nonzero 
value so that the first element is always kept):
>>> diffs = numpy.ediff1d(sorted_seconds, to_begin=1)
>>> unq = (diffs != 0)
>>> final_seconds = sorted_seconds.compress(unq)
>>> final_data = sorted_data.compress(unq)
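
Putting the two steps together, a rough end-to-end sketch could look like the
following. The function name sort_and_drop_duplicates and the sample values are
hypothetical, and the sketch assumes non-empty 1-D arrays of equal length:

```python
import numpy as np

def sort_and_drop_duplicates(seconds, data):
    """Sort by seconds and keep the first occurrence of each timestamp.

    Hypothetical helper; assumes non-empty 1-D inputs of equal length.
    """
    seconds = np.asarray(seconds)
    data = np.asarray(data)

    # Stable sort, so the first occurrence of a duplicate stays first.
    idx = seconds.argsort(kind="mergesort")
    sorted_seconds = seconds[idx]
    sorted_data = data[idx]

    # Differences between consecutive timestamps; to_begin=1 prepends a
    # nonzero value so the very first element is always kept.
    diffs = np.ediff1d(sorted_seconds, to_begin=1)
    keep = diffs != 0

    return sorted_seconds[keep], sorted_data[keep]

# Made-up sample data: the timestamp 2.0 appears twice.
final_seconds, final_data = sort_and_drop_duplicates(
    [3.0, 1.0, 2.0, 2.0, 0.5], [30, 10, 20, 21, 5]
)
```

Boolean-mask indexing (sorted_seconds[keep]) is used here in place of
.compress(); both select the same elements.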

On a side note, you may want to give scikits.timeseries a try: we develop this 
package specifically to handle time series (i.e., series indexed in time). The 
sorting part would be automatic, and finding the duplicates is also quite 
easy.
HIH