[SciPy-user] scipy.io.read_array: NaN in data file

Dharhas Pothina Dharhas.Pothina at twdb.state.tx.us
Wed Mar 11 09:13:26 EDT 2009


In this particular case we know the cause. It is one of the following:

a) Overlapping files have been appended, i.e. file1 contains data from Jan1 to Feb1 and file2 contains data from Jan1 to March1. The overlap region has identical data.

b) The data comes from sequential deployments and there is a small overlap at the beginning of the second file, i.e. file1 has data from Jan1 to Feb1 and file2 contains data from Feb1 to March1. A few data points may overlap. The overlapping points in file2 are junk because the equipment was set up in the lab and took measurements in the air until it was swapped with the installed instrument in the water.

In both these cases it is appropriate to take the first value. In the second case we really should be stripping the bad data before appending, but this is a work in progress. Right now we are developing a semi-automated QA/QC procedure to clean up data before posting it on the web. We presently use a mix of awk and shell scripts, but I'm trying to convert everything to Python to make it easier to use and more maintainable, to get nicer plots than gnuplot, and to develop a GUI application to help us do this.
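For illustration, the "take the first value" rule could be sketched with plain NumPy roughly as below. This is only a sketch under assumptions: the function name keep_first and the sample dates and values are made up for the example, and it assumes the dates and values have already been read into two arrays of equal length, in the order the files were appended.

```python
import numpy as np


def keep_first(dates, values):
    """Drop duplicated dates, keeping the first value seen for each date.

    Hypothetical helper for illustration only. `dates` and `values` are
    assumed to be equal-length sequences in the order the files were appended.
    """
    dates = np.asarray(dates)
    values = np.asarray(values)
    # np.unique sorts the dates and, with return_index=True, also returns
    # the index of the first occurrence of each unique date.
    _, first_idx = np.unique(dates, return_index=True)
    return dates[first_idx], values[first_idx]


# Made-up example: the second "file" starts with a reading for a date that
# the first file already covers (99.0 stands in for a junk in-air reading).
dates = np.array(
    ["2009-01-31", "2009-02-01", "2009-02-01", "2009-02-02"],
    dtype="datetime64[D]",
)
values = np.array([1.0, 2.0, 99.0, 3.0])

kept_dates, kept_values = keep_first(dates, values)
```

Note that the result comes back sorted by date, which is usually what you want for a time series. The rule only gives the right answer when the earlier record is the one to trust, as in the two cases above.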

- dharhas

>>> Timmie <timmichelsen at gmx-topmail.de> 3/11/2009 4:35 AM >>>
> Well, because there's no standard way to do that: when you have  
> duplicated dates, should you take the first  one? The last one ? Take  
> some kind of average of the values ?
Sometimes, there are inherent faults in the data set. Therefore, an automatic
treatment may introduce further errors.
It's only possible when these errors occur somewhat systematically.



