[SciPy-User] np.genfromtxt bug : breaks when # is present.

Dharhas Pothina Dharhas.Pothina at twdb.state.tx.us
Wed Feb 2 17:56:49 EST 2011


Hi,
 
I realized that after sending my email. 
 
I've never seen environmental monitoring data, whether from equipment or from various entities, with midline comments, i.e. any comments always start at the beginning of the line, with the comment character in the first position.
 
Is there any value in having an optional behavior in genfromtxt that only ignores lines starting with the comment character, or is that too specific a use case?
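
A rough sketch of how that behavior can be approximated today, without any change to genfromtxt: drop the lines that begin with the comment character yourself, then pass comments=None so that a '#' inside a field is left alone. The helper name skip_comment_lines and the header line in the sample are made up for illustration; the data rows are from the original report, and the sketch assumes Python 3 with a reasonably recent NumPy.

```python
import numpy as np


def skip_comment_lines(lines, marker="#"):
    """Hypothetical helper: keep only lines that do not start with the marker."""
    return [line for line in lines if not line.startswith(marker)]


# Made-up header comment followed by the two data rows from the report.
raw = (
    "# illustrative header comment\n"
    "3.87 3.562 1.9 33.3 75.2 9.6\n"
    "13.87 3.562 1.9 ##.## 75.2 9.6\n"
)

# comments=None turns off genfromtxt's own comment stripping, so the
# '##.##' field survives splitting and is treated as a missing value.
data = np.genfromtxt(
    skip_comment_lines(raw.splitlines(keepends=True)),
    dtype=float,
    comments=None,
    missing_values="##.##",
)
print(data)
```

Note that this only handles comments in the first column; a line with leading whitespace before the '#' would need an lstrip() in the helper.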
 
- dharhas

>>> Corran Webster <cwebster at enthought.com> 2/2/2011 3:33 PM >>>
Hi,

'#' is the default comment marker in genfromtxt, so it will ignore anything on a line after a '#', as you observed.

You can probably work around this by specifying a different comment character in the arguments:
np.genfromtxt( ... comments='%')
or some other character that won't appear in your input.
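
For illustration, here is that suggestion applied to the first test string from the original report. This is a minimal sketch assuming Python 3 (hence io.StringIO rather than the Python 2 StringIO module) and assuming '%' never occurs in the input.

```python
from io import StringIO

import numpy as np

data_str1 = (
    "3.87 3.562 1.9 33.3 75.2 9.6\n"
    "13.87 3.562 1.9 ##.## 75.2 9.6\n"
)

# With '%' as the comment marker, '#' is an ordinary character, so the
# second line is no longer cut off at the first '#'.
arr = np.genfromtxt(
    StringIO(data_str1),
    dtype=float,
    comments="%",
    missing_values="##.##",
)
print(arr.shape)
print(arr[1])
```

Both rows should now parse to six columns, with the '##.##' field read as nan.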

-- Corran

On Wed, Feb 2, 2011 at 3:26 PM, Dharhas Pothina <Dharhas.Pothina at twdb.state.tx.us> wrote:


Hi,


I think I found a bug in np.genfromtxt when reading in data that has missing values indicated by # symbols.
From what I can tell, if a # is in any of the fields, it does not see any of the data on the line to the right of the #.


to reproduce:


import numpy as np
from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO


data_str1 = '3.87 3.562 1.9 33.3 75.2 9.6\n13.87 3.562 1.9 ##.## 75.2 9.6\n'
data_str2 = '3.87 3.562 1.9 33.3 75.2 9.6\n13.87 3.562 1.9 #### 75.2 9.6\n'
data_str3 = '3.87 3.562 1.9 33.3 75.2 9.6\n13.87 3.562 1.9 NA 75.2 9.6\n'
data_str4 = '3.87 3.562 1.9 33.3 75.2 9.6\n13.87 3.562 1.9 N# 75.2 9.6\n'


np.genfromtxt(StringIO(data_str1), dtype=float, missing_values='##.##')
*** ValueError: Some errors were detected !
Line #2 (got 3 columns instead of 6)



np.genfromtxt(StringIO(data_str2), dtype=float, missing_values='####')
*** ValueError: Some errors were detected !
Line #2 (got 3 columns instead of 6)


np.genfromtxt(StringIO(data_str3), dtype=float, missing_values='NA')
array([[ 3.87 , 3.562, 1.9 , 33.3 , 75.2 , 9.6 ],
[ 13.87 , 3.562, 1.9 , nan, 75.2 , 9.6 ]])




np.genfromtxt(StringIO(data_str4), dtype=float, missing_values='N#')
*** ValueError: Some errors were detected !
Line #2 (got 4 columns instead of 6)


I have a workaround: replacing all #'s with N's before reading the data with genfromtxt.
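
A minimal sketch of that workaround, using the second test string from above and assuming Python 3. The name cleaned is only illustrative. Since every '#' is rewritten, this is only suitable for input whose '#' characters all mark missing values rather than real comments.

```python
from io import StringIO

import numpy as np

data_str2 = (
    "3.87 3.562 1.9 33.3 75.2 9.6\n"
    "13.87 3.562 1.9 #### 75.2 9.6\n"
)

# Replace every '#' so genfromtxt never sees its default comment marker;
# '####' becomes 'NNNN', which is then declared as the missing-value token.
cleaned = data_str2.replace("#", "N")

arr = np.genfromtxt(StringIO(cleaned), dtype=float, missing_values="NNNN")
print(arr)
```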


- dharhas

