[Numpy-discussion] loadtxt ndmin option

Thu May 5 11:49:19 EDT 2011

On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes <paul.anton.letnes at gmail.com> wrote:

>
> On 4. mai 2011, at 20.33, Benjamin Root wrote:
>
> > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier <derek at astro.physik.uni-goettingen.de> wrote:
> > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
> >
> > > But: Aren't the numpy.atleast_2d and numpy.atleast_1d functions written
> > > for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it
> > > will reintroduce the 'transposed' problem?
> >
> > Yes, good point; one could replace
> > X.shape = (X.size, ) with X = np.atleast_1d(X),
> > but for the ndmin=2 case, we'd need to replace
> > X.shape = (X.size, 1) with X = np.atleast_2d(X).T.
> > I'm not sure which solution is more efficient in terms of memory access
> > etc.
> >
> > Cheers,
> >                                                Derek
> >
> >
> > I can confirm that the current behavior is not sufficient for all of the
> > original corner cases that ndmin was supposed to address. Keep in mind that
> > np.loadtxt reduces a one-column data file and a one-row data file to the
> > same shape. I don't see how the current code is able to produce the correct
> > array shape when ndmin=2. Do we have some sort of counter in loadtxt for
> > the number of rows and columns read? Could we use those to help
> > guide the ndmin=2 case?
> >
> > I think that using atleast_1d(X) might be a bit overkill, but it would make
> > the code's intent very clear. I don't think we have to worry about
> > memory usage if we limit its use to situations where ndmin is greater
> > than the number of dimensions of the array. In those cases, the array is
> > either an empty result, a scalar value (for which memory access is trivial),
> > or 1-d (for which a transpose is cheap).
>
> What if we do things the other way around: avoid calling squeeze until
> _after_ doing the atleast_Nd() magic? That way the row/column information
> should be preserved, right? We would also avoid the transpose, extra memory use, ...
>
> Oh, and someone could conceivably have a _looong_ 1D file, but would want
> it read as a 2D array.
>
> Paul
>
>
>
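For reference, a small sketch comparing the two forms Derek lists. The helper names `pad_with_reshape` and `pad_with_atleast` are hypothetical, and `np.reshape` stands in for the in-place `X.shape = ...` assignment (same shape and values, returned as a view). Both turn a squeezed 1-D result into the same (N, 1) column, which is why neither can tell a one-row file from a one-column file once the data is already 1-D.

```python
import numpy as np

def pad_with_reshape(X, ndmin):
    """Hypothetical helper mirroring the current code; np.reshape stands in
    for the in-place `X.shape = ...` assignment."""
    if ndmin == 1:
        return np.reshape(X, (X.size,))
    if ndmin == 2:
        return np.reshape(X, (X.size, 1))
    return X

def pad_with_atleast(X, ndmin):
    """Hypothetical helper using the atleast_Nd functions, as Derek suggests."""
    if ndmin == 1:
        return np.atleast_1d(X)
    if ndmin == 2:
        # atleast_2d on 1-D input gives a (1, N) row; .T turns it into (N, 1).
        return np.atleast_2d(X).T
    return X

# A squeezed 1-D result and a single scalar value, the cases ndmin must pad.
for arr in (np.arange(5.0), np.array(3.0)):
    for ndmin in (1, 2):
        a = pad_with_reshape(arr, ndmin)
        b = pad_with_atleast(arr, ndmin)
        print(arr.ndim, ndmin, a.shape, b.shape, np.array_equal(a, b))
# 1 1 (5,) (5,) True
# 1 2 (5, 1) (5, 1) True
# 0 1 (1,) (1,) True
# 0 2 (1, 1) (1, 1) True
```

Neither version copies the data for a contiguous 1-D input; both return views, so the choice is mostly about readability.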
@Derek, good catch on the error in the tests. We do still need to
handle the case I mentioned, however. I have attached an example script to
demonstrate the issue. In this script, I would expect the second-to-last
array to have a shape of (1, 5). I believe that the single-row, multi-column
case is actually a more common edge case for users than the others.
Therefore, I don't think this ndmin fix is adequate until this is addressed.
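A minimal sketch of the behaviour being asked for, under the assumption that the parser has already produced a list of rows. The `rows_to_array` helper is hypothetical and not loadtxt's actual code: it builds the array from the rows so it starts out as (nrows, ncols), squeezes only when the array has more dimensions than ndmin, and then pads back up with atleast_1d/atleast_2d if needed.

```python
import numpy as np

def rows_to_array(rows, ndmin=0):
    """Hypothetical helper, not loadtxt's actual code: turn already-parsed
    rows (a list of lists of numbers) into an array with at least `ndmin`
    dimensions while keeping the file's row/column orientation."""
    if ndmin not in (0, 1, 2):
        raise ValueError("ndmin must be 0, 1 or 2")
    # Building from the list of rows keeps the (nrows, ncols) structure:
    # a one-row file is (1, ncols) and a one-column file is (nrows, 1).
    X = np.array(rows, dtype=float)
    # Only squeeze away length-1 axes when the caller allows fewer dimensions.
    if X.ndim > ndmin:
        X = np.squeeze(X)
    # Pad back up if squeezing overshot (a single value with ndmin=1 becomes
    # 0-d) or if the input was never 2-D (e.g. no rows at all).
    if X.ndim < ndmin:
        X = np.atleast_1d(X) if ndmin == 1 else np.atleast_2d(X)
    return X

one_row = [[1, 2, 3, 4, 5]]
one_col = [[1], [2], [3], [4], [5]]
print(rows_to_array(one_row, ndmin=2).shape)  # (1, 5)
print(rows_to_array(one_col, ndmin=2).shape)  # (5, 1)
print(rows_to_array(one_row).shape)           # (5,)
print(rows_to_array(one_col).shape)           # (5,)
```

With this ordering, squeeze only runs when the array has more dimensions than requested, so it never undoes the padding, and the default ndmin=0 still reduces both files to (5,).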

@Paul, we can't call squeeze after doing the atleast_Nd() magic. That would
just undo what we had just done. Also, with respect to the transpose, a
(1, 100000) array looks the same in memory as a (100000, 1) array, right?
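A quick check of both points, as a sketch: `np.squeeze` removes exactly the length-1 axis that `np.atleast_2d(...).T` adds, and transposing a (1, N) array returns a view over the same buffer rather than a copy.

```python
import numpy as np

x = np.arange(5.0)

# atleast_2d(x).T turns the 1-D array into a (5, 1) column...
col = np.atleast_2d(x).T
# ...and squeezing afterwards removes that axis again, giving back (5,).
undone = np.squeeze(col)

# Transposing a (1, N) row gives an (N, 1) view over the same buffer.
row = np.arange(100000.0).reshape(1, -1)
row_t = row.T

print(col.shape, undone.shape)       # (5, 1) (5,)
print(row.shape, row_t.shape)        # (1, 100000) (100000, 1)
print(np.shares_memory(row, row_t))  # True
# The elements come out in the same order whichever way the array is shaped.
print(np.array_equal(row.ravel(), row_t.ravel()))  # True
```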

Ben Root
-------------- next part --------------
A non-text attachment was scrubbed...
Name: loadtest.py
Type: application/octet-stream
Size: 734 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20110505/da28783d/attachment.obj>