[Numpy-discussion] loadtxt/savetxt tickets

Tue Apr 5 15:06:08 EDT 2011

On 04/04/2011 12:38 PM, Charles R Harris wrote:
>
>
> On Mon, Apr 4, 2011 at 11:01 AM, Bruce Southey <bsouthey at gmail.com> wrote:
>
>     On 04/04/2011 11:20 AM, Charles R Harris wrote:
>>
>>
>>     On Mon, Apr 4, 2011 at 9:59 AM, Bruce Southey <bsouthey at gmail.com> wrote:
>>
>>         On 03/31/2011 12:02 PM, Derek Homeier wrote:
>>         > On 31 Mar 2011, at 17:03, Bruce Southey wrote:
>>         >
>>         >> This is an invalid ticket because the docstring clearly states,
>>         >> in 3 different yet critical places, that missing values are not
>>         >> handled here:
>>         >>
>>         >> "Each row in the text file must have the same number of values."
>>         >> "genfromtxt : Load data with missing values handled as specified."
>>         >> "This function aims to be a fast reader for simply formatted
>>         >>  files.  The `genfromtxt` function provides more sophisticated
>>         >>  handling of, e.g., lines with missing values."
>>         >>
>>         >> Really I am trying to separate the usage of loadtxt and genfromtxt
>>         >> to avoid unnecessary duplication and confusion. Part of this is
>>         >> historical because loadtxt was added in 2007 and genfromtxt was
>>         >> added in 2009. So really certain features of loadtxt have been
>>         >> 'kept' for backwards compatibility purposes yet these features
>>         >> can be 'abused' to handle missing data. But I really consider
>>         >> that any missing values should cause loadtxt to fail.
>>         >>
>>         > OK, I was not aware of the design issues of loadtxt vs. genfromtxt -
>>         > you could probably say also for historical reasons since I have not
>>         > used genfromtxt much so far.
>>         > Anyway the docstring statement "Converters can also be used to
>>         > provide a default value for missing data:"
>>         > then appears quite misleading, or an invitation to abuse, if you will.
>>         > This should better be removed from the documentation then, or users
>>         > explicitly discouraged from using converters instead of genfromtxt
>>         > (I don't see how you could completely prevent using converters in
>>         > this way).
>>         >
>>         >> The patch is incorrect because it should not include a space in the
>>         >> split() as indicated in the comment by the original reporter. Of
>>         > The split('\r\n') alone caused test_dtype_with_object(self) to fail,
>>         > probably because it relies on stripping the blanks. But maybe the
>>         > test is ill-formed?
>>         >
>>         >> course a corrected patch alone still is not sufficient to address
>>         >> the problem without the user providing the correct converter. Also
>>         >> you start to run into problems with multiple delimiters (such as
>>         >> one space versus two spaces) so you start down the path to add all
>>         >> the features that duplicate genfromtxt.
>>         > Given that genfromtxt provides that functionality more conveniently,
>>         > I agree again users should be encouraged to use this instead of
>>         > converters.
>>         > But the actual tab-problem causes in fact an issue not related to
>>         > missing values at all (well, depending on what you call a missing
>>         > value).
>>         > I am describing an example on the ticket.
>>         >
>>         > Cheers,
>>         >                                       Derek
>>         >
>>         Okay I see that 1071 got closed which I am fine with.
>>
>>         I think that your following example should be a test because the
>>         two spaces should not be removed with a tab delimiter:
>>         np.loadtxt(StringIO("aa\tbb\n \t \ncc\t"), delimiter='\t',
>>                    dtype=np.dtype([('label', 'S4'), ('comment', 'S4')]))
>>
>>
>>     Make a test and we'll put it in.
>>
>>     Chuck
>>
>>
>     I know!
>     Trying to write one made me realize that loadtxt is not handling
>     string arrays correctly. So I have to check more on this as I
>     think loadtxt is giving a 1-d array instead of a 2-d array.
>
>
> Tests often have that side effect.
>
> <snip>
>
> Chuck
>
>
Okay,
My confusion aside (sorry for that), I added ticket 1794 with a possible
test that should work with ticket 1071:
http://projects.scipy.org/numpy/ticket/1794
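
For readers following the tab-delimiter point, here is a minimal sketch in
plain Python of why stripping all whitespace before splitting loses fields,
using the sample data quoted earlier in the thread. The helper name
split_fields and its strip_all_whitespace flag are hypothetical, made up for
illustration only; this is not the loadtxt code or the patch from the ticket.

```python
def split_fields(line, delimiter="\t", strip_all_whitespace=False):
    """Split one text line into fields (hypothetical helper, not NumPy code)."""
    if strip_all_whitespace:
        # str.strip() with no argument removes spaces and tabs as well as
        # line endings, so blank or empty edge fields disappear.
        line = line.strip()
    else:
        # Remove only the line terminator; tabs and spaces are kept.
        line = line.rstrip("\r\n")
    return line.split(delimiter)


# Same sample text as in the example quoted above.
text = "aa\tbb\n \t \ncc\t"
lines = text.splitlines(True)

naive = [split_fields(l, strip_all_whitespace=True) for l in lines]
careful = [split_fields(l) for l in lines]

print(naive)    # rows end up with differing numbers of fields
print(careful)  # every row keeps two fields
```

With the naive variant the second and third rows collapse to a single field
each, which is the kind of row-length mismatch a tab-delimited reader has to
avoid.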

Bruce
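
As a rough illustration of the loadtxt/genfromtxt split discussed in the
quoted messages, the sketch below feeds the same comma-separated text, with
one empty field, to both functions. The sample data and the fill value of
-1.0 are assumptions chosen for the example, and exact error messages may
differ between NumPy versions.

```python
from io import StringIO

import numpy as np

# Assumed sample data: the second row has an empty middle field.
text = "1,2,3\n4,,6\n"

# loadtxt expects every field to convert cleanly, so the empty field
# is reported as an error rather than treated as missing.
try:
    np.loadtxt(StringIO(text), delimiter=",")
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True

# genfromtxt treats the empty field as missing and substitutes the
# requested fill value.
filled = np.genfromtxt(StringIO(text), delimiter=",", filling_values=-1.0)

print(loadtxt_failed)
print(filled)
```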

