[Numpy-discussion] loadtxt() behavior on single-line files

Tue Jul 27 12:58:56 EDT 2010

On Thu, Jun 24, 2010 at 1:53 PM, Benjamin Root <ben.root at ou.edu> wrote:

> On Thu, Jun 24, 2010 at 1:00 PM, Warren Weckesser <
> warren.weckesser at enthought.com> wrote:
>
>> Benjamin Root wrote:
>> > Hi,
>> >
>> > I was having the hardest time trying to figure out an intermittent bug
>> > in one of my programs.  Essentially, in some situations, it was
>> > throwing an error saying that the array object was not an array.  It
>> > took me a while, but then I figured out that my program was assuming
>> > that the object returned from a loadtxt() call was always a structured
>> > array (I was using dtypes).  However, if the data file being loaded
>> > only had one data record, then all you get back is a structured record.
>> >
>> > import numpy as np
>> > from StringIO import StringIO
>> >
>> > strData = StringIO("89.23 47.2\n13.2 42.2")
>> > a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
>> > print "Length Two"
>> > print a
>> > print a.shape
>> > print len(a)
>> >
>> > strData = StringIO("53.2 49.2")
>> > a = np.loadtxt(strData, dtype=[('x', float), ('y', float)])
>> > print "\n\nLength One"
>> > print a
>> > print a.shape
>> > try:
>> >     print len(a)
>> > except TypeError as err:
>> >     print "ERROR:", err
>> >
>> > Which gets me this output:
>> >
>> > Length Two
>> > [(89.230000000000004, 47.200000000000003)
>> >  (13.199999999999999, 42.200000000000003)]
>> > (2,)
>> > 2
>> >
>> >
>> > Length One
>> > (53.200000000000003, 49.200000000000003)
>> > ()
>> > ERROR: len() of unsized object
>> >
>> >
>> > Note that this isn't restricted to structured arrays.  For regular
>> > ndarrays, loadtxt() appears to mimic the behavior of np.squeeze():
>>
>> Exactly.  The last four lines of the function are:
>>
>>    X = np.squeeze(X)
>>    if unpack:
>>        return X.T
>>    else:
>>        return X
>>
>> >
>> > >>> a = np.ones((1, 1, 1))
>> > >>> np.squeeze(a)[0]
>> > IndexError: 0-d arrays can't be indexed
>> >
>> > >>> strData = StringIO("53.2")
>> > >>> a = np.loadtxt(strData)
>> > >>> a[0]
>> > IndexError: 0-d arrays can't be indexed
>> >
>> > So, if you have multiple lines with multiple columns, you get a 2-D
>> > array, as expected.
>> > If you have a single line of data with multiple columns, you get a 1-D
>> > array.
>> > If you have a single column with many lines, you also get a 1-D array
>> > (which is probably expected, I guess).
>> > If you have a single column with a single line, you get a scalar
>> > (actually, a 0-D array).
>> >
>> > Is this a bug or a feature?  I can see the advantages of having
>> > loadtxt() returning the lowest # of dimensions that can hold the given
>> > data, but it leaves the code vulnerable to certain edge cases.  Maybe
>> > there is a different way I should be doing this, but I feel that this
>> > behavior at the very least should be included in the loadtxt
>> > documentation.
>> >
>>
>> It would be useful to be able to tell loadtxt to not call squeeze, so a
>> program that reads column-formatted data doesn't have to treat the case
>> of a single line specially.
>>
>> Warren
>>
>
> I don't know if that is the best way to solve the problem.  In that case,
> you would always get a 2-D array, right?  Is that useful for those who have
> text data as a single column?  Maybe a mindim keyword (with None as default)
> and apply an appropriate "atleast_Nd()" call (or maybe have available an
> .atleast_nd() function?).  But, then what would this mean for structured
> arrays?  One might think that they want at least 2-D, but they really want
> at least 1-D.
>
> Ben Root
>
> P.S. - Taking this a step further, the functions completely fail in dealing
> with empty files...  In MATLAB, it returns an empty array (matrix?).
>

I am reviving this "dead" thread to note that I have filed ticket #1562 on
the numpy Trac about this issue: http://projects.scipy.org/numpy/ticket/1562
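
Until loadtxt() itself changes, a caller can guard against the single-record
case by wrapping the result in np.atleast_1d(). The following is only a
minimal sketch of that workaround: load_records() is a hypothetical helper
name, not anything in numpy, and the snippet is written for current Python
(io.StringIO and the print function) rather than the Python 2 used in the
original example.

```python
import io

import numpy as np


def load_records(source, dtype):
    """Hypothetical helper: load text data and guarantee at least 1-D output.

    np.atleast_1d() turns a 0-d result (the single-record case) into a
    length-1 array and leaves arrays that are already 1-D or higher alone.
    """
    return np.atleast_1d(np.loadtxt(source, dtype=dtype))


record_dtype = [('x', float), ('y', float)]

# One record: without the wrapper this could come back as a 0-d array.
one = load_records(io.StringIO("53.2 49.2"), record_dtype)

# Two records: already 1-D, so the wrapper changes nothing.
two = load_records(io.StringIO("89.23 47.2\n13.2 42.2"), record_dtype)

print(one.shape, len(one))
print(two.shape, len(two))
```

With this wrapper, len() and indexing work the same way for one record as
for many. It does not address the 2-D question for plain (non-structured)
column data, where the right minimum dimension depends on whether the file
is one row or one column.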

Ben Root