[SciPy-user] sorting timeseries data.

Pierre GM pgmdevlist at gmail.com
Fri May 2 12:34:11 EDT 2008


On Friday 02 May 2008 12:11:04 Dharhas Pothina wrote:
> I want to sort the data to be monotonically increasing by the variable
> seconds and filter out duplicate values (say by deleting the second
> occurrence).

Dharhas,
>>> idx = seconds.argsort()
>>> sorted_seconds = seconds[idx]
>>> sorted_data = data[idx]
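will do the trick. Look at the help for the argsort method if you need to use 
a specific sorting algorithm: kind='mergesort' is stable, so equal values of 
seconds keep their original order, which is what you want if the second 
occurrence is the one to be dropped.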
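
As a rough sketch of the stable variant (the sample arrays below are made up 
for illustration, they are not your data):

```python
import numpy as np

# Hypothetical sample data: timestamps out of order, with 10 and 20 repeated.
seconds = np.array([30, 10, 20, 10, 20])
data = np.array([3.0, 1.0, 2.0, 1.5, 2.5])

# kind='mergesort' is a stable sort: entries with equal timestamps
# stay in the order in which they appeared in the input.
idx = seconds.argsort(kind='mergesort')
sorted_seconds = seconds[idx]
sorted_data = data[idx]
```

With a stable sort, the first reading recorded for a repeated timestamp still 
comes first after sorting.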

Then, you can try to find the duplicates this way:
>>> diffs = numpy.ediff1d(sorted_seconds, to_begin=1)
>>> unq = (diffs != 0)
>>> final_seconds = sorted_seconds.compress(unq)
>>> final_data = sorted_data.compress(unq)
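
Putting both steps together, something along these lines should work. The 
function name sort_and_dedupe and the sample values are only placeholders for 
illustration, and the sketch assumes non-empty 1-D arrays of equal length:

```python
import numpy as np

def sort_and_dedupe(seconds, data):
    """Hypothetical helper: sort by seconds, keep the first entry per timestamp."""
    seconds = np.asarray(seconds)
    data = np.asarray(data)
    # Stable sort, so the first occurrence of a repeated timestamp stays first.
    idx = seconds.argsort(kind='mergesort')
    sorted_seconds = seconds[idx]
    sorted_data = data[idx]
    # Difference between consecutive timestamps; to_begin=1 prepends a
    # nonzero value so that the very first element is always kept.
    diffs = np.ediff1d(sorted_seconds, to_begin=1)
    unq = diffs != 0
    return sorted_seconds[unq], sorted_data[unq]

# Made-up sample: the timestamps 10 and 20 each occur twice.
final_seconds, final_data = sort_and_dedupe(
    [30, 10, 20, 10, 20], [3.0, 1.0, 2.0, 1.5, 2.5])
```

Boolean indexing is used here in place of compress; for a 1-D array the two 
give the same result.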

On a side note, you may want to give scikits.timeseries a try: we are 
developing this package specifically to handle time series (i.e., series 
indexed in time). The sorting part would be automatic, and finding the 
duplicates is also quite easy.
HIH


