[SciPy-User] np.genfromtxt bug : breaks when # is present.
Dharhas Pothina
Dharhas.Pothina at twdb.state.tx.us
Wed Feb 2 16:26:12 EST 2011
Hi,
I think I found a bug in np.genfromtxt when reading in data that has missing values indicated by # symbols.
From what I can tell, if a # appears in any field, genfromtxt does not see any of the data on that line to the right of the #.
To reproduce:
import numpy as np
from StringIO import StringIO
data_str1 = '3.87 3.562 1.9 33.3 75.2 9.6\n13.87 3.562 1.9 ##.## 75.2 9.6\n'
data_str2 = '3.87 3.562 1.9 33.3 75.2 9.6\n13.87 3.562 1.9 #### 75.2 9.6\n'
data_str3 = '3.87 3.562 1.9 33.3 75.2 9.6\n13.87 3.562 1.9 NA 75.2 9.6\n'
data_str4 = '3.87 3.562 1.9 33.3 75.2 9.6\n13.87 3.562 1.9 N# 75.2 9.6\n'
np.genfromtxt(StringIO(data_str1), dtype=float, missing_values='##.##')
*** ValueError: Some errors were detected !
Line #2 (got 3 columns instead of 6)
np.genfromtxt(StringIO(data_str2), dtype=float, missing_values='####')
*** ValueError: Some errors were detected !
Line #2 (got 3 columns instead of 6)
np.genfromtxt(StringIO(data_str3), dtype=float, missing_values='NA')
array([[ 3.87 , 3.562, 1.9 , 33.3 , 75.2 , 9.6 ],
[ 13.87 , 3.562, 1.9 , nan, 75.2 , 9.6 ]])
np.genfromtxt(StringIO(data_str4), dtype=float, missing_values='N#')
*** ValueError: Some errors were detected !
Line #2 (got 4 columns instead of 6)
As a workaround, I replace every # with N before reading the data with genfromtxt.
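
A minimal sketch of that workaround, written for Python 3 (io.StringIO in place of the Python 2 StringIO module used above). The variable names via_replace and via_no_comments are only illustrative. The second call is an assumption that was not tried in the report above: genfromtxt documents a comments parameter whose default is '#', which would explain why everything after the first # is dropped, and passing comments=None should switch that handling off.

```python
from io import StringIO  # Python 3 location; the report above uses Python 2's StringIO

import numpy as np

data_str1 = "3.87 3.562 1.9 33.3 75.2 9.6\n13.87 3.562 1.9 ##.## 75.2 9.6\n"

# Workaround from the report: replace every '#' with 'N' before parsing,
# so the missing-value marker '##.##' becomes 'NN.NN'.
cleaned = data_str1.replace("#", "N")
via_replace = np.genfromtxt(StringIO(cleaned), dtype=float, missing_values="NN.NN")

# Alternative (assumption, not tested in the report): disable comment
# handling so '#' is no longer treated as the start of a comment.
via_no_comments = np.genfromtxt(
    StringIO(data_str1), dtype=float, comments=None, missing_values="##.##"
)

print(via_replace)
print(via_no_comments)
```

Note that the blanket replacement also rewrites any genuine comment lines in the file, so it is only safe when # is used purely as a missing-value marker.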
- dharhas