[SciPy-User] np.genfromtxt bug : breaks when # is present.

Wed Feb 2 16:26:12 EST 2011

Hi,

I think I found a bug in np.genfromtxt when reading data whose missing values are marked with # symbols.
From what I can tell, if a # appears in any field, genfromtxt does not see any of the data on that line to the right of the #.

To reproduce:

import numpy as np
from StringIO import StringIO

data_str1 = '3.87   3.562     1.9   33.3    75.2    9.6\n13.87   3.562     1.9   ##.##    75.2    9.6\n'
data_str2 = '3.87   3.562     1.9   33.3    75.2    9.6\n13.87   3.562     1.9   ####    75.2    9.6\n'
data_str3 = '3.87   3.562     1.9   33.3    75.2    9.6\n13.87   3.562     1.9   NA    75.2    9.6\n'
data_str4 = '3.87   3.562     1.9   33.3    75.2    9.6\n13.87   3.562     1.9   N#    75.2    9.6\n'

np.genfromtxt(StringIO(data_str1), dtype=float, missing_values='##.##')
*** ValueError: Some errors were detected !
    Line #2 (got 3 columns instead of 6)

np.genfromtxt(StringIO(data_str2), dtype=float, missing_values='####')
*** ValueError: Some errors were detected !
    Line #2 (got 3 columns instead of 6)

np.genfromtxt(StringIO(data_str3), dtype=float, missing_values='NA')
array([[  3.87 ,   3.562,   1.9  ,  33.3  ,  75.2  ,   9.6  ],
       [ 13.87 ,   3.562,   1.9  ,     nan,  75.2  ,   9.6  ]])

np.genfromtxt(StringIO(data_str4), dtype=float, missing_values='N#')
*** ValueError: Some errors were detected !
    Line #2 (got 4 columns instead of 6)
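
The column counts in these errors (3, 3 and 4) are what you would get if each line were cut at the first # before being split on whitespace. That is how a comment marker behaves, and genfromtxt's comments argument defaults to '#'. Below is a minimal pure-Python sketch of that idea; it is not genfromtxt's actual code, and columns_seen is a hypothetical helper used only for illustration.

```python
def columns_seen(line, comments='#'):
    """Count whitespace-separated fields left after dropping a trailing comment."""
    # Everything from the first comment marker onward is discarded.
    kept = line.split(comments)[0]
    return len(kept.split())


# Second rows of data_str1, data_str4 and data_str3 from the report.
row_hashes = '13.87   3.562     1.9   ##.##    75.2    9.6'
row_n_hash = '13.87   3.562     1.9   N#    75.2    9.6'
row_na = '13.87   3.562     1.9   NA    75.2    9.6'

print(columns_seen(row_hashes))  # 3
print(columns_seen(row_n_hash))  # 4
print(columns_seen(row_na))      # 6
```

The N# case keeps the N as a fourth field, which matches the "got 4 columns" message.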

As a workaround, I replace every # with N before reading the data with genfromtxt.
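
A rough sketch of that workaround, written for Python 3 (where StringIO lives in the io module). The names hide_hashes and read_with_workaround are hypothetical, and the missing-value marker has to go through the same replacement so it still matches the cleaned text.

```python
from io import StringIO  # Python 3 location of StringIO


def hide_hashes(text, placeholder='N'):
    """Replace every '#' so the missing-value marker is not cut off."""
    return text.replace('#', placeholder)


def read_with_workaround(text, marker):
    """Read whitespace-separated floats, treating `marker` as missing."""
    # NumPy is imported here so hide_hashes can be used on its own.
    import numpy as np
    cleaned = hide_hashes(text)
    return np.genfromtxt(StringIO(cleaned), dtype=float,
                         missing_values=hide_hashes(marker))


data_str1 = ('3.87   3.562     1.9   33.3    75.2    9.6\n'
             '13.87   3.562     1.9   ##.##    75.2    9.6\n')

# The second row now carries 'NN.NN' in place of '##.##'.
cleaned = hide_hashes(data_str1)
```

Calling read_with_workaround(data_str1, '##.##') then hands genfromtxt the cleaned text. Note that the replacement also rewrites any # intended as a real comment, so it only suits files without comments. Passing a different comments character to genfromtxt may be a cleaner route, though I have not checked that against this data.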

- dharhas