[SciPy-User] Suggestion for numpy.genfromtxt documentation

Skipper Seabold jsseabold at gmail.com
Fri Oct 9 10:21:12 EDT 2009


On Wed, Oct 7, 2009 at 3:20 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> On 10/07/2009 10:52 AM, Skipper Seabold wrote:
>> On Wed, Oct 7, 2009 at 11:25 AM, Dharhas Pothina
>> <Dharhas.Pothina at twdb.state.tx.us>  wrote:
>>
>>> Hi,
>>>
>>> It took me a while and a lot of trial and error to work out why this didn't work as expected.
>>>
>>> data = np.genfromtxt(fname,usecols=(2,3,4),names='x,y,z')
>>>
>>> This command runs without any warnings or errors, but returns a plain NumPy array with no field names. If you use:
>>>
>>> data = np.genfromtxt(fname,usecols=(2,3,4),dtype=None,names='x,y,z')
>>>
>>> then the command does what I expect it to and returns a structured NumPy array with field names. So essentially, the 'names' argument does not work unless you also specify the 'dtype' argument.
>>>
> What did you actually expect?
> It would be very informative if you could provide a simple example of
> this for testing.
>
> There are many combinations of arguments so not all have been tested and
> it is not always clear what the expected behavior should be.
>
>>> I think it would be less confusing to new users either to have this explicitly mentioned in the documentation string for the genfromtxt 'names' argument, or to have the function default to 'dtype=None' if the 'names' argument is specified without the 'dtype' argument.
>>>
>>> - dharhas
>>>
>> I came across this behavior recently and agree with you.  There is a
>> patch in the works for this.
>>
>> See this thread: http://thread.gmane.org/gmane.comp.python.numeric.general/33479
>>
>> And this ticket: http://projects.scipy.org/numpy/ticket/1252
>>
>> Cheers,
>>
>> Skipper
>>
>
>  From the numpy help, there is this example:
> data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
> ('mystring','S5')], delimiter=",")
>
> It does not help that the dtype of structured arrays also includes the
> actual name. So I do not think we can use dtype argument without using
> the combination of dtype and name. Perhaps dtype could be split into names
> and formats so that dtype=('name', 'format').
>
> In some sense you are suggesting that we should have something like:
>

With the defaultfmt keyword added and the new changes, here is the
current state of things.

from io import StringIO  # on Python 2: from StringIO import StringIO
import numpy as np

s = StringIO("1,2,3.0")

> Ignore the use of None and True for dtype and names arguments:
> i) If only dtype is specified then use the specified dtype and add
> default names such as col1, col2,... if necessary
>

This gives a plain array, so no default names are used.

data = np.genfromtxt(s, delimiter=",") # dtype=float

In [54]: data
Out[54]: array([ 1.,  2.,  3.])

If a format for the default names is specified with defaultfmt, it
doesn't seem to be picked up as of right now.

s.seek(0)
data = np.genfromtxt(s, delimiter=",", defaultfmt="Var%i")

In [79]: data
Out[79]: array([ 1.,  2.,  3.])
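
Rather than reading the repr, one way to tell a plain array from a structured one is to look at data.dtype.names, which is None when there are no fields. A minimal sketch of the two calls above, assuming Python 3 (so StringIO comes from io) and a reasonably recent NumPy; the helper name field_names is made up for illustration:

```python
from io import StringIO
import numpy as np

def field_names(arr):
    """Return the tuple of field names, or None for a plain (unstructured) array."""
    return arr.dtype.names

# No dtype and no names: the default dtype=float applies to every column.
plain = np.genfromtxt(StringIO("1,2,3.0"), delimiter=",")

# defaultfmt on its own: nothing asks for a structured dtype, so it has no effect.
with_fmt = np.genfromtxt(StringIO("1,2,3.0"), delimiter=",", defaultfmt="Var%i")

print(field_names(plain), plain.shape)
print(field_names(with_fmt), with_fmt.shape)
```

Building a fresh StringIO for each call also sidesteps the s.seek(0) bookkeeping.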


> ii) If only names is specified then construct the dtype as ('name',
> 'default format')

s.seek(0)
data = np.genfromtxt(s, delimiter=",", names=['var1','var2','var3'])
#dtype = float

In [57]: data
Out[57]:
array((1.0, 2.0, 3.0),
      dtype=[('var1', '<f8'), ('var2', '<f8'), ('var3', '<f8')])
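
This is the case from the original question, which also passed usecols. A small sketch of that combination, assuming Python 3 and a NumPy version that already includes the change discussed in this thread; the sample text is invented for illustration:

```python
from io import StringIO
import numpy as np

# Five whitespace-separated columns (made-up data); only the last three are wanted.
text = "10 20 1.0 2.0 3.0\n11 21 4.0 5.0 6.0\n"

# names given, dtype left at its default (float): each named field becomes a float.
data = np.genfromtxt(StringIO(text), usecols=(2, 3, 4), names="x,y,z")

print(data.dtype.names)   # field names taken from the names argument
print(data["y"])          # one column, selected by field name
```

On a NumPy old enough to predate the fix, the same call would be expected to return a plain 2-D array instead, which is the behavior reported at the top of the thread.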

> iii) If only formats is specified then construct the dtype as ('default
> name', 'format')

This doesn't seem to work with the new easy dtype as noted above.

But this does:

s.seek(0)  # rewind; the previous call consumed the stream
data = np.genfromtxt(s, delimiter=",", dtype=(int,int,float),
defaultfmt="var%i")

In [72]: data
Out[72]:
array((1, 2, 3.0),
      dtype=[('var0', '<i8'), ('var1', '<i8'), ('var2', '<f8')])
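
Note that a single line of input comes back as a 0-d structured array, so the fields are reached by name rather than by position. A short sketch of working with that result, assuming Python 3 and a recent NumPy (the integer width shown in the repr above is platform dependent, so the code only looks at names and values):

```python
from io import StringIO
import numpy as np

data = np.genfromtxt(StringIO("1,2,3.0"), delimiter=",",
                     dtype=(int, int, float), defaultfmt="var%i")

print(data.shape)          # () for a single record
print(data.dtype.names)    # names generated from defaultfmt
print(data["var2"])        # field access by generated name
print(data.item())         # the record as a plain Python tuple
```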

> iv) If only names and formats are specified then construct the
> dtype as ('name', 'format')
>

So I think this means,

s.seek(0)
data = np.genfromtxt(s, delimiter=",", dtype=(int,int,float),
names=['var1','var2','var3'])

In [86]: data
Out[86]:
array((1, 2, 3.0),
      dtype=[('var1', '<i8'), ('var2', '<i8'), ('var3', '<f8')])


> v) If no dtype, names or formats are specified then construct the
> dtype as ('default name', 'default format')
>

Same case as above, I think, where

s.seek(0)
data = np.genfromtxt(s, delimiter=",", defaultfmt="var%i")

doesn't work as "expected" to zip float (the default format) with the
default name, specified by defaultfmt.

> vi) If dtype and names or formats are specified then use dtype if it is
> of the form ('name', 'format') or use one of the previous cases.
>

This seems to be the case for defaultfmt,

s.seek(0)
data = np.genfromtxt(s,
dtype=[('var1',int),('var2',int),('var3',float)], delimiter=",",
defaultfmt="VAR%i")

In [99]: data
Out[99]:
array((1, 2, 3.0),
      dtype=[('var1', '<i8'), ('var2', '<i8'), ('var3', '<f8')])

But if names is specified, then it's never ignored; it overrides the
names given in dtype:

s.seek(0)
data = np.genfromtxt(s,
dtype=[('var1',int),('var2',int),('var3',float)], delimiter=",",
names=['VAR1','VAR2','VAR3'])

In [102]: data
Out[102]:
array((1, 2, 3.0),
      dtype=[('VAR1', '<i8'), ('VAR2', '<i8'), ('VAR3', '<f8')])
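
The two calls above can be put side by side to make the precedence explicit. A minimal sketch, assuming Python 3 and a recent NumPy; the helper resulting_names is hypothetical and only forwards its keyword arguments to genfromtxt:

```python
from io import StringIO
import numpy as np

# A dtype that already carries explicit field names.
fields = [("var1", int), ("var2", int), ("var3", float)]

def resulting_names(**kwargs):
    """Parse the same one-line input and report which field names won."""
    data = np.genfromtxt(StringIO("1,2,3.0"), dtype=fields,
                         delimiter=",", **kwargs)
    return data.dtype.names

# defaultfmt does not touch names that were spelled out in dtype ...
print(resulting_names(defaultfmt="VAR%i"))

# ... but an explicit names argument replaces them.
print(resulting_names(names=["VAR1", "VAR2", "VAR3"]))
```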

> When dtype is None this implies format is None so the format is obtained
> from the data. If names is not True then the names are either from the
> argument or default values.
>

Well, genfromtxt returns plain arrays too, so if names is neither True
nor given explicitly, then we can't give default values.  I think
defaultfmt should accept True as well, so that you can get back a
structured array with f0, f1, f2 as the names if that's what you want.
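
In the meantime, one possible workaround (not something proposed in the tickets, just a sketch assuming Python 3 and a recent NumPy) is to spell out one format per column, which goes through the easy dtype path and picks up the default f0, f1, f2 names:

```python
from io import StringIO
import numpy as np

ncols = 3  # assumed to be known in advance for this illustration

# One float format per column; no names given, so the default "f%i" is used.
data = np.genfromtxt(StringIO("1,2,3.0"), delimiter=",",
                     dtype=(float,) * ncols)

print(data.dtype.names)
print(data.item())
```

The obvious drawback is that the number of columns has to be known before the call, which is exactly what a True value for defaultfmt would avoid.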

> If names argument is True then the names should be read from the data
> and one of the previous cases apply.
>

It's a bit confusing to talk about data type "formats" and also have a
keyword called defaultfmt that is about names.  Perhaps it should be
defaultnm?

So in sum, I think we should maybe have a True argument for
defaultfmt, maybe change the name to defaultnm to avoid confusion, and
have it so the easy dtype construction works with defaultfmt.  I will
comment on the open tickets.

Anything I missed?

Skipper


