[Numpy-discussion] loadtxt/savetxt tickets
Bruce Southey
bsouthey at gmail.com
Tue Apr 5 15:06:08 EDT 2011
On 04/04/2011 12:38 PM, Charles R Harris wrote:
>
>
> On Mon, Apr 4, 2011 at 11:01 AM, Bruce Southey <bsouthey at gmail.com> wrote:
>
> On 04/04/2011 11:20 AM, Charles R Harris wrote:
>>
>>
>> On Mon, Apr 4, 2011 at 9:59 AM, Bruce Southey <bsouthey at gmail.com> wrote:
>>
>> On 03/31/2011 12:02 PM, Derek Homeier wrote:
>> > On 31 Mar 2011, at 17:03, Bruce Southey wrote:
>> >
>> >> This is an invalid ticket because the docstring clearly states, in
>> >> 3 different yet critical places, that missing values are not handled
>> >> here:
>> >>
>> >> "Each row in the text file must have the same number of values."
>> >> "genfromtxt : Load data with missing values handled as specified."
>> >> "This function aims to be a fast reader for simply formatted files.
>> >> The `genfromtxt` function provides more sophisticated handling of,
>> >> e.g., lines with missing values."
>> >>
>> >> Really, I am trying to separate the usage of loadtxt and genfromtxt to
>> >> avoid unnecessary duplication and confusion. Part of this is
>> >> historical: loadtxt was added in 2007 and genfromtxt was added in
>> >> 2009. So certain features of loadtxt have been 'kept' for
>> >> backwards-compatibility purposes, yet these features can be 'abused'
>> >> to handle missing data. But I really consider that any missing values
>> >> should cause loadtxt to fail.
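A minimal sketch of the distinction drawn above, assuming a current NumPy and a made-up comma-separated sample whose second row has an empty middle field: `loadtxt` rejects the row, while `genfromtxt` treats the empty field as missing and fills it. The helper names and the fill value -999.0 are illustrative choices, not part of either function's API.

```python
from io import StringIO

import numpy as np

# Made-up sample: the second row has an empty (missing) middle field.
TEXT = "1,2,3\n4,,6\n7,8,9\n"


def loadtxt_accepts(text):
    """Return True if np.loadtxt parses `text`, False if it raises ValueError."""
    try:
        np.loadtxt(StringIO(text), delimiter=",")
    except ValueError:
        return False
    return True


def load_with_genfromtxt(text, fill=-999.0):
    """Let genfromtxt treat empty fields as missing and replace them with `fill`."""
    return np.genfromtxt(StringIO(text), delimiter=",", filling_values=fill)


if __name__ == "__main__":
    print(loadtxt_accepts(TEXT))       # False: the empty field cannot be converted
    print(load_with_genfromtxt(TEXT))  # the empty field is filled with -999.0
```

This keeps loadtxt strict, as argued above, and leaves missing-value policy to genfromtxt's `missing_values`/`filling_values` parameters.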
>> >>
>> > OK, I was not aware of the design issues of loadtxt vs. genfromtxt;
>> > you could say that is also for historical reasons, since I have not
>> > used genfromtxt much so far.
>> > Anyway, the docstring statement "Converters can also be used to
>> > provide a default value for missing data:" then appears quite
>> > misleading, or an invitation to abuse, if you will.
>> > It should then be removed from the documentation, or users should be
>> > explicitly discouraged from using converters instead of genfromtxt
>> > (I don't see how you could completely prevent using converters in
>> > this way).
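For reference, the converter pattern quoted from the docstring looks roughly like this. It is a sketch assuming a current NumPy; the helper name and the default -999.0 are arbitrary. It papers over an empty field in one chosen column, but a row that lacks a whole column still makes loadtxt raise, which is part of why genfromtxt is the better tool for missing data.

```python
from io import StringIO

import numpy as np

TEXT = "1,2,3\n4,,6\n"


def blank_to_default(field, default=-999.0):
    """Convert one field to float, substituting `default` for an empty field.

    strip() exists on both str and bytes, and float() accepts either, so this
    works whichever type the installed NumPy passes to converters.
    """
    return float(field.strip() or default)


# Only column 1 gets the converter; the other columns use normal float parsing.
data = np.loadtxt(StringIO(TEXT), delimiter=",", converters={1: blank_to_default})
print(data)

# A row with a whole column missing is a different problem: loadtxt still raises.
try:
    np.loadtxt(StringIO("1,2,3\n4,5\n"), delimiter=",",
               converters={1: blank_to_default})
except ValueError as exc:
    print("ragged row rejected:", exc)
```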
>> >
>> >> The patch is incorrect because it should not include a space in the
>> >> split(), as indicated in the comment by the original reporter. Of
>> > The split('\r\n') alone caused test_dtype_with_object(self) to fail,
>> > probably because it relies on stripping the blanks. But maybe the
>> > test is ill-formed?
>> >
>> >> course, a corrected patch alone is still not sufficient to address the
>> >> problem without the user providing the correct converter. Also, you
>> >> start to run into problems with multiple delimiters (such as one space
>> >> versus two spaces), so you start down the path of adding all the
>> >> features that duplicate genfromtxt.
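The two string-splitting pitfalls discussed here can be seen with plain Python, no NumPy needed. The helper names are made up for illustration and are not loadtxt's internals. Stripping all whitespace before splitting on a tab discards fields made only of blanks, and splitting on a single space turns a double space into an extra empty field, while split() with no argument collapses runs of whitespace.

```python
def split_after_full_strip(line, delimiter):
    """Strip every leading/trailing whitespace character, then split."""
    return line.strip().split(delimiter)


def split_after_eol_strip(line, delimiter):
    """Strip only the line terminator characters, then split."""
    return line.strip("\r\n").split(delimiter)


if __name__ == "__main__":
    blank_fields = " \t \n"  # two blank fields separated by a tab
    print(split_after_full_strip(blank_fields, "\t"))  # [''] (both blanks lost)
    print(split_after_eol_strip(blank_fields, "\t"))   # [' ', ' ']

    two_spaces = "a  b"
    print(two_spaces.split(" "))  # ['a', '', 'b'] (each single space is a delimiter)
    print(two_spaces.split())     # ['a', 'b'] (runs of whitespace collapse)
```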
>> > Given that genfromtxt provides that functionality more conveniently,
>> > I agree again that users should be encouraged to use it instead of
>> > converters.
>> > But the actual tab problem in fact causes an issue not related to
>> > missing values at all (well, depending on what you call a missing
>> > value). I am describing an example on the ticket.
>> >
>> > Cheers,
>> > Derek
>> >
>> > _______________________________________________
>> > NumPy-Discussion mailing list
>> > NumPy-Discussion at scipy.org
>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>> Okay, I see that 1071 got closed, which I am fine with.
>>
>> I think that your following example should be a test, because the two
>> spaces should not be removed with a tab delimiter:
>> np.loadtxt(StringIO("aa\tbb\n \t \ncc\t"), delimiter='\t',
>>            dtype=np.dtype([('label', 'S4'), ('comment', 'S4')]))
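What such a test would expect can be pinned down independently of loadtxt. The sketch below uses a hypothetical reference reader (`read_tab_records` is not a NumPy function) that removes only the line terminators, splits on the tab, and builds a structured array with the dtype from the example. Under that assumption, the blank fields and the trailing empty field survive.

```python
import numpy as np

DTYPE = np.dtype([("label", "S4"), ("comment", "S4")])
TEXT = "aa\tbb\n \t \ncc\t"


def read_tab_records(text, dtype, delimiter="\t"):
    """Hypothetical reference reader for the test's expected result.

    splitlines() removes the line terminators but leaves spaces and tabs
    inside each line untouched, so blank fields are kept.
    """
    rows = []
    for line in text.splitlines():
        if not line:
            continue  # skip genuinely empty lines only
        fields = line.split(delimiter)
        # 'S4' fields hold bytes, so encode each text field.
        rows.append(tuple(field.encode("latin-1") for field in fields))
    return np.array(rows, dtype=dtype)


expected = read_tab_records(TEXT, DTYPE)
print(expected)
```

A test could then compare the loadtxt call above against `expected`, for example with `numpy.testing.assert_array_equal`.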
>>
>>
>> Make a test and we'll put it in.
>>
>> Chuck
>>
>>
> I know!
> Trying to write one made me realize that loadtxt is not handling
> string arrays correctly. So I have to look into this further, as I
> think loadtxt is giving a 1-d array instead of a 2-d array.
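A 1-d result is what a structured dtype produces: each line becomes a single record, so the array has one element per line, whereas a plain string dtype gives one column per field and hence a 2-d array. A small sketch with made-up data that avoids blank fields, assuming a current NumPy:

```python
from io import StringIO

import numpy as np

TEXT = "aa\tbb\ncc\tdd\n"

# Plain byte-string dtype: one column per field, so the result is 2-d.
plain = np.loadtxt(StringIO(TEXT), delimiter="\t", dtype="S4")

# Structured dtype: each line becomes one record, so the result is 1-d.
records = np.loadtxt(StringIO(TEXT), delimiter="\t",
                     dtype=[("label", "S4"), ("comment", "S4")])

print(plain.shape)       # (2, 2)
print(records.shape)     # (2,)
print(records["label"])  # columns are reached by field name, not by index
```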
>
>
> Tests often have that side effect.
>
> <snip>
>
> Chuck
>
>
Okay,
My confusion aside (sorry for that), I added ticket 1784 with a possible
test that should work with ticket 1071:
http://projects.scipy.org/numpy/ticket/1794
Bruce