[SciPy-User] np.genfromtxt bug : breaks when # is present.

Dharhas Pothina Dharhas.Pothina at twdb.state.tx.us
Wed Feb 2 16:26:12 EST 2011


Hi,


I think I found a bug in np.genfromtxt when reading data whose missing values are marked with # symbols.
From what I can tell, if a # appears in any field, genfromtxt does not see any of the data on that line to the right of the #.


To reproduce:


import numpy as np
from StringIO import StringIO


data_str1 = '3.87   3.562     1.9   33.3    75.2    9.6\n13.87   3.562     1.9   ##.##    75.2    9.6\n'
data_str2 = '3.87   3.562     1.9   33.3    75.2    9.6\n13.87   3.562     1.9   ####    75.2    9.6\n'
data_str3 = '3.87   3.562     1.9   33.3    75.2    9.6\n13.87   3.562     1.9   NA    75.2    9.6\n'
data_str4 = '3.87   3.562     1.9   33.3    75.2    9.6\n13.87   3.562     1.9   N#    75.2    9.6\n'


np.genfromtxt(StringIO(data_str1), dtype=float, missing_values='##.##')
*** ValueError: Some errors were detected !
    Line #2 (got 3 columns instead of 6)



np.genfromtxt(StringIO(data_str2), dtype=float, missing_values='####')
*** ValueError: Some errors were detected !
    Line #2 (got 3 columns instead of 6)


np.genfromtxt(StringIO(data_str3), dtype=float, missing_values='NA')
array([[  3.87 ,   3.562,   1.9  ,  33.3  ,  75.2  ,   9.6  ],
       [ 13.87 ,   3.562,   1.9  ,     nan,  75.2  ,   9.6  ]])




np.genfromtxt(StringIO(data_str4), dtype=float, missing_values='N#')
*** ValueError: Some errors were detected !
    Line #2 (got 4 columns instead of 6)
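
The column counts in these errors are consistent with each line being cut at the first #, which is what one would expect if # is being treated as a comment marker (genfromtxt's comments argument defaults to '#'). The following is only a rough sketch of that idea in plain Python, not genfromtxt's actual code; the helper name columns_before_comment is made up for illustration.

```python
def columns_before_comment(line, comment='#'):
    """Count whitespace-separated fields left after cutting the line at the comment marker."""
    # Hypothetical helper: keep only the text before the first marker, then split on whitespace.
    return len(line.split(comment)[0].split())


row_hashes = '13.87   3.562     1.9   ##.##    75.2    9.6'
row_n_hash = '13.87   3.562     1.9   N#    75.2    9.6'
row_na = '13.87   3.562     1.9   NA    75.2    9.6'

counts = [columns_before_comment(r) for r in (row_hashes, row_n_hash, row_na)]
print(counts)  # [3, 4, 6]
```

The 3 and 4 match the "got 3 columns" and "got 4 columns" messages above, and the NA row keeps all 6 fields.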


I have a workaround: replacing all #'s with N's before reading the data with genfromtxt.
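
A minimal sketch of that workaround, written for Python 3 (io.StringIO rather than the StringIO module used above). The second call is an alternative that is not part of the workaround described here: it assumes the failure comes from the default comments='#' setting and switches comment handling off with comments=None.

```python
from io import StringIO

import numpy as np

data_str = (
    '3.87   3.562     1.9   33.3    75.2    9.6\n'
    '13.87   3.562     1.9   ##.##    75.2    9.6\n'
)

# Workaround: swap every '#' for 'N' before parsing, so the
# missing-value marker '##.##' becomes 'NN.NN'.
cleaned = data_str.replace('#', 'N')
arr = np.genfromtxt(StringIO(cleaned), dtype=float, missing_values='NN.NN')

# Alternative (assumption: '#' is being read as a comment marker):
# disable comment handling so the original text can be parsed directly.
arr_no_comments = np.genfromtxt(
    StringIO(data_str), dtype=float, missing_values='##.##', comments=None
)

print(arr.shape, arr_no_comments.shape)
```

In both cases the missing field should come back as nan, as in the NA example above.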


- dharhas