[Numpy-discussion] loadtxt/savetxt tickets

Thu Mar 31 14:39:45 EDT 2011

On 03/31/2011 12:02 PM, Derek Homeier wrote:
> On 31 Mar 2011, at 17:03, Bruce Southey wrote:
>
>> This is an invalid ticket because the docstring clearly states, in
>> three different yet critical places, that missing values are not handled
>> here:
>>
>> "Each row in the text file must have the same number of values."
>> "genfromtxt : Load data with missing values handled as specified."
>> "This function aims to be a fast reader for simply formatted files. The
>> `genfromtxt` function provides more sophisticated handling of, e.g.,
>> lines with missing values."
>>
>> Really I am trying to separate the usage of loadtxt and genfromtxt to
>> avoid unnecessary duplication and confusion. Part of this is
>> historical because loadtxt was added in 2007 and genfromtxt was added
>> in 2009. So really certain features of loadtxt have been 'kept' for
>> backwards compatibility purposes yet these features can be 'abused' to
>> handle missing data. But I really consider that any missing values
>> should cause loadtxt to fail.
>>
> OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
> you could probably say also for historical reasons since I have not
> used genfromtxt much so far.
> Anyway, the docstring statement "Converters can also be used to
> provide a default value for missing data:"
> then appears quite misleading, or an invitation to abuse, if you will.
> It would be better to remove this from the documentation, or to
> explicitly discourage users from using converters instead of genfromtxt
> (I don't see how you could completely prevent using converters in
> this way).
>
>> The patch is incorrect because it should not include a space in the
>> split() as indicated in the comment by the original reporter. Of
> The split('\r\n') alone caused test_dtype_with_object(self) to fail,
> probably because it relies on stripping the blanks. But maybe the test is
> ill-formed?
>
>> course a corrected patch alone still is not sufficient to address the
>> problem without the user providing the correct converter. Also you
>> start to run into problems with multiple delimiters (such as one space
>> versus two spaces) so you start down the path to add all the features
>> that duplicate genfromtxt.
> Given that genfromtxt provides that functionality more conveniently,
> I agree again users should be encouraged to use this instead of
> converters.
> But the actual tab problem in fact causes an issue not related to missing
> values at all (well, depending on what you call a missing value).
> I am describing an example on the ticket.
>
> Cheers,
> 					Derek
>
I am really not disagreeing that much with you. My point is rather that, as
you have shown, it is very easy to increase the complexity of examples that
loadtxt does not handle. By missing value I mean "when no data value is
stored for the variable in the current observation" (via Wikipedia), as
distinct from encoded missing values (such as '.', 'NA' and 'NaN'), which
can be recovered.
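
A minimal sketch of that distinction, assuming a recent NumPy under
Python 3. The sample strings and variable names are made up for
illustration and are not taken from the ticket:

```python
import io
import numpy as np

# Encoded missing value: the placeholder 'NA' keeps the row at three fields,
# so genfromtxt can recognise it and fill in NaN.
encoded = "1 2 3\n4 NA 6\n"
a = np.genfromtxt(io.StringIO(encoded), missing_values="NA")

# Truly missing value: nothing at all is stored between the two commas.
empty = "1,2,3\n4,,6\n"

# genfromtxt fills the empty field with NaN by default.
b = np.genfromtxt(io.StringIO(empty), delimiter=",")

# loadtxt, without a user-supplied converter, refuses the same input.
try:
    np.loadtxt(io.StringIO(empty), delimiter=",")
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True
```

With whitespace as the delimiter, an empty field would instead change the
number of fields in the row, which is the harder case discussed above.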


Bruce