[Numpy-discussion] loadtxt() behavior on single-line files

Benjamin Root ben.root at ou.edu
Thu Jun 24 14:53:25 EDT 2010


On Thu, Jun 24, 2010 at 1:00 PM, Warren Weckesser <
warren.weckesser at enthought.com> wrote:

> Benjamin Root wrote:
> > Hi,
> >
> > I was having the hardest time trying to figure out an intermittent bug
> > in one of my programs.  Essentially, in some situations, it was
> > throwing an error saying that the array object was not an array.  It
> > took me a while, but then I figured out that my program was assuming
> > that the object returned from a loadtxt() call was always a structured
> > array (I was using dtypes).  However, if the data file being loaded
> > only had one data record, then all you get back is a structured record.
> >
> > import numpy as np
> > from StringIO import StringIO
> >
> > strData = StringIO("89.23 47.2\n13.2 42.2")
> > a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
> > print "Length Two"
> > print a
> > print a.shape
> > print len(a)
> >
> > strData = StringIO("53.2 49.2")
> > a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
> > print "\n\nLength One"
> > print a
> > print a.shape
> > try :
> >     print len(a)
> > except TypeError as err:
> >     print "ERROR:", err
> >
> > Which gets me this output:
> >
> > Length Two
> > [(89.230000000000004, 47.200000000000003)
> >  (13.199999999999999, 42.200000000000003)]
> > (2,)
> > 2
> >
> >
> > Length One
> > (53.200000000000003, 49.200000000000003)
> > ()
> > ERROR: len() of unsized object
> >
> >
> > Note that this isn't restricted to structured arrays.  For regular
> > ndarrays, loadtxt() appears to mimic the behavior of np.squeeze():
>
> Exactly.  The last four lines of the function are:
>
>    X = np.squeeze(X)
>    if unpack:
>        return X.T
>    else:
>        return X
>
> >
> > >>> a = np.ones((1, 1, 1))
> > >>> np.squeeze(a)[0]
> > IndexError: 0-d arrays can't be indexed
> >
> > >>> strData = StringIO("53.2")
> > >>> a = np.loadtxt(strData)
> > >>> a[0]
> > IndexError: 0-d arrays can't be indexed
> >
> > So, if you have multiple lines with multiple columns, you get a 2-D
> > array, as expected.
> > If you have a single line of data with multiple columns, you get a 1-D
> > array.
> > If you have a single column with many lines, you also get a 1-D array
> > (which is probably expected, I guess).
> > If you have a single column with a single line, you get a scalar
> > (actually, a 0-D array).
> >
> > Is this a bug or a feature?  I can see the advantages of having
> > loadtxt() returning the lowest # of dimensions that can hold the given
> > data, but it leaves the code vulnerable to certain edge cases.  Maybe
> > there is a different way I should be doing this, but I feel that this
> > behavior at the very least should be included in the loadtxt
> > documentation.
> >
>
> It would be useful to be able to tell loadtxt to not call squeeze, so a
> program that reads column-formatted data doesn't have to treat the case
> of a single line specially.
>
> Warren
>

I'm not sure that is the best way to solve the problem.  Without the squeeze,
you would always get a 2-D array, right?  Is that useful for those whose text
data is a single column?  Maybe loadtxt() could take a mindim keyword (with
None as the default) and apply the appropriate "atleast_Nd()" call (or maybe
a general .atleast_nd() function should be available?).  But then what would
this mean for structured arrays?  One might think they want at least 2-D,
when what they really want is at least 1-D.
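
A minimal sketch of the mindim idea, under stated assumptions: the wrapper
name loadtxt_mindim and its mindim argument are hypothetical (not part of
NumPy), and the code is written for Python 3 (io.StringIO, print function)
rather than the Python 2 used earlier in the thread.  It only pads the result
with the existing np.atleast_1d() and np.atleast_2d() functions after
loadtxt() has already squeezed it.

```python
import io
import numpy as np


def loadtxt_mindim(fname, mindim=None, **kwargs):
    """Hypothetical wrapper: call np.loadtxt, then pad the result so it
    has at least `mindim` dimensions.  mindim=None returns the result
    exactly as loadtxt produced it."""
    data = np.loadtxt(fname, **kwargs)
    if mindim is None:
        return data
    if mindim == 1:
        return np.atleast_1d(data)
    if mindim == 2:
        # A squeezed 1-D result is treated as a single row here; the
        # wrapper cannot tell a single row from a single column.
        return np.atleast_2d(data)
    raise ValueError("mindim must be None, 1 or 2")


dt = [('x', float), ('y', float)]

# Structured dtype: one record and two records both come back 1-D.
one = loadtxt_mindim(io.StringIO("53.2 49.2"), mindim=1, dtype=dt)
two = loadtxt_mindim(io.StringIO("89.23 47.2\n13.2 42.2"), mindim=1, dtype=dt)
print(one.shape, len(one))
print(two.shape, len(two))

# Plain floats: a single line of two columns is padded to one 2-D row.
row = loadtxt_mindim(io.StringIO("53.2 49.2"), mindim=2)
print(row.shape)
```

With mindim=1 the structured-array case behaves the same for one record and
for many, so len() and indexing work either way.  The mindim=2 branch shows
the catch raised above: once the data has been squeezed, a single column of
many lines is indistinguishable from a single line of many columns.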

Ben Root

P.S. - Taking this a step further, the functions fail completely when given
empty files...  In MATLAB, the equivalent call returns an empty array (matrix?).
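
One way a caller might guard against the empty-file case is sketched below.
The helper name loadtxt_or_empty is hypothetical, it takes the file contents
as a string for simplicity, and it assumes '#' is the comment character; it
is written for Python 3.  It returns a zero-length 1-D array when there are
no data lines, and otherwise defers to loadtxt().

```python
import io
import numpy as np


def loadtxt_or_empty(text, dtype=float):
    """Hypothetical helper: return a zero-length 1-D array when `text`
    has no data lines, otherwise the loadtxt result with at least 1-D."""
    # Keep only lines that hold something other than whitespace or a
    # '#' comment (assumed comment character).
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith('#')]
    if not lines:
        return np.empty((0,), dtype=dtype)
    return np.atleast_1d(np.loadtxt(io.StringIO("\n".join(lines)), dtype=dtype))


dt = [('x', float), ('y', float)]
empty = loadtxt_or_empty("", dtype=dt)
single = loadtxt_or_empty("53.2 49.2\n", dtype=dt)
print(empty.shape, len(empty))
print(single.shape, len(single))
```

Code that loops over the records then needs no special case for zero or one
record.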