[SciPy-User] Suggestion for numpy.genfromtxt documentation
Skipper Seabold
jsseabold at gmail.com
Fri Oct 9 10:21:12 EDT 2009
On Wed, Oct 7, 2009 at 3:20 PM, Bruce Southey <bsouthey at gmail.com> wrote:
> On 10/07/2009 10:52 AM, Skipper Seabold wrote:
>> On Wed, Oct 7, 2009 at 11:25 AM, Dharhas Pothina
>> <Dharhas.Pothina at twdb.state.tx.us> wrote:
>>
>>> Hi,
>>>
>>> It took me a while and a lot of trial and error to work out why this didn't work as expected.
>>>
>>> data = np.genfromtxt(fname,usecols=(2,3,4),names='x,y,z')
>>>
>>> This command works and does not return any warnings or errors, but returns a numpy array with no field names. If you use:
>>>
>>> data = np.genfromtxt(fname,usecols=(2,3,4),dtype=None,names='x,y,z')
>>>
>>> then the command does what I expect it to and returns a structured numpy array with field names. So essentially, the 'names' argument does not work unless you also specify the 'dtype' argument.
>>>
> What did you actually expect?
> It would be very informative if you could provide a simple example of
> this for testing.
>
> There are many combinations of arguments so not all have been tested and
> it is not always clear what the expected behavior should be.
>
>>> I think it would be less confusing to new users either to have this explicitly mentioned in the documentation string for the genfromtxt 'names' argument, or to have the function default to 'dtype=None' if the 'names' argument is specified without the 'dtype' argument.
>>>
>>> - dharhas
>>>
>> I came across this behavior recently and agree with you. There is a
>> patch in the works for this.
>>
>> See this thread: http://thread.gmane.org/gmane.comp.python.numeric.general/33479
>>
>> And this ticket: http://projects.scipy.org/numpy/ticket/1252
>>
>> Cheers,
>>
>> Skipper
>>
>
> From the numpy help, there is this example:
> data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
> ('mystring','S5')], delimiter=",")
>
> It does not help that the dtype of structured arrays also includes the
> actual name. So I do not think we can use dtype argument without using
> the combination of dtype and name. Perhaps if dtype is split into names
> and formats so that dtype=('name', 'format').
>
> In some sense you are suggesting that we should have something like:
>
With the defaultfmt keyword added and the new changes, here is the
current state of things.
from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO
import numpy as np
s = StringIO("1,2,3.0")
> Ignore the use of None and True for dtype and names arguments:
> i) If only dtype is specified then use the specified dtype and add
> default names such as col1, col2,... if necessary
>
This gives a plain array, so no default names are used.
data = np.genfromtxt(s, delimiter=",") # dtype=float
In [54]: data
Out[54]: array([ 1., 2., 3.])
If default names are specified then it doesn't seem to pick them up as
of right now.
s.seek(0)
data = np.genfromtxt(s, delimiter=",", defaultfmt="Var%i")
In [79]: data
Out[79]: array([ 1., 2., 3.])
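For anyone re-running this later, here is a minimal sketch of the same check, assuming Python 3 (so StringIO comes from io) and a recent NumPy, neither of which this thread used. It only inspects dtype.names, which is None for a plain array; the variable names are mine.

```python
from io import StringIO  # Python 3 location of StringIO (assumption: not Python 2)

import numpy as np

s = StringIO("1,2,3.0")

# No dtype and no names: the default dtype is float, so a plain array comes back.
plain = np.genfromtxt(s, delimiter=",")

# defaultfmt alone does not turn the result into a structured array.
s.seek(0)
still_plain = np.genfromtxt(s, delimiter=",", defaultfmt="Var%i")

print(plain.dtype.names)
print(still_plain.dtype.names)
```

Checking dtype.names is a quick way to tell whether you got a structured array back without reading the repr.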
> ii) If only names is specified then construct the dtype as ('name',
> 'default format')
s.seek(0)
data = np.genfromtxt(s, delimiter=",",
                     names=['var1','var2','var3'])  # dtype=float
In [57]: data
Out[57]:
array((1.0, 2.0, 3.0),
dtype=[('var1', '<f8'), ('var2', '<f8'), ('var3', '<f8')])
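A small sketch of case ii, again assuming Python 3 and a recent NumPy rather than the versions used in this thread. It shows that names on its own is enough to get float fields that can be read by name.

```python
from io import StringIO  # Python 3 location of StringIO (assumption)

import numpy as np

# names only: each name is paired with the default float format.
data = np.genfromtxt(StringIO("1,2,3.0"), delimiter=",",
                     names=["var1", "var2", "var3"])

print(data.dtype.names)
print(float(data["var3"]))  # fields are addressed by name
```

With a single input row the result is a zero-dimensional structured array, which is why the field is converted with float() before printing.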
> iii) If only formats is specified then construct the dtype as ('default
> name', 'format')
This doesn't seem to work with the new easy dtype, as noted above.
But this does:
s.seek(0)
data = np.genfromtxt(s, delimiter=",", dtype=(int,int,float),
                     defaultfmt="var%i")
In [72]: data
Out[72]:
array((1, 2, 3.0),
dtype=[('var0', '<i8'), ('var1', '<i8'), ('var2', '<f8')])
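The formats-only case can be sketched as follows, assuming Python 3 and a recent NumPy. The integer width in the resulting dtype depends on the platform, so the sketch looks only at the generated names and at the kind of each field.

```python
from io import StringIO  # Python 3 location of StringIO (assumption)

import numpy as np

# A tuple of formats plus defaultfmt: the names are generated from the
# template, with numbering starting at 0.
data = np.genfromtxt(StringIO("1,2,3.0"), delimiter=",",
                     dtype=(int, int, float), defaultfmt="var%i")

names = data.dtype.names
kinds = [data.dtype[n].kind for n in names]  # 'i' for integer, 'f' for float
print(names, kinds)
```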
> iv) If only names and formats are specified then construct the
> dtype as ('name', 'format')
>
So I think this means,
s.seek(0)
data = np.genfromtxt(s, delimiter=",", dtype=(int,int,float),
                     names=['var1','var2','var3'])
In [86]: data
Out[86]:
array((1, 2, 3.0),
dtype=[('var1', '<i8'), ('var2', '<i8'), ('var3', '<f8')])
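To see why the names-plus-formats combination is handy, here is a sketch on a two-row input, assuming Python 3 and a recent NumPy. The second data row is made up for illustration; the point is that each named field then behaves like a column.

```python
from io import StringIO  # Python 3 location of StringIO (assumption)

import numpy as np

# Two rows instead of one; the second row is illustrative data only.
text = StringIO("1,2,3.0\n4,5,6.5")

data = np.genfromtxt(text, delimiter=",", dtype=(int, int, float),
                     names=["var1", "var2", "var3"])

col1 = data["var1"].tolist()  # integer column
col3 = data["var3"].tolist()  # float column
print(col1, col3)
```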
> v) If none of dtype, names, or formats is specified then construct the
> dtype as ('default name', 'default format')
>
Same case as above I think where
s.seek(0)
data = np.genfromtxt(s, delimiter=",", defaultfmt="var%i")
doesn't work as "expected" to zip float (the default format) with the
default name, specified by defaultfmt.
> vi) If dtype and names or formats are specified then use dtype if it is
> of the form ('name', 'format') or use one of the previous cases.
>
This seems to be the case for defaultfmt,
s.seek(0)
data = np.genfromtxt(s,
                     dtype=[('var1',int),('var2',int),('var3',float)],
                     delimiter=",", defaultfmt="VAR%i")
In [99]: data
Out[99]:
array((1, 2, 3.0),
dtype=[('var1', '<i8'), ('var2', '<i8'), ('var3', '<f8')])
But if names is specified, then it's never ignored
s.seek(0)
data = np.genfromtxt(s,
                     dtype=[('var1',int),('var2',int),('var3',float)],
                     delimiter=",", names=['VAR1','VAR2','VAR3'])
In [102]: data
Out[102]:
array((1, 2, 3.0),
dtype=[('VAR1', '<i8'), ('VAR2', '<i8'), ('VAR3', '<f8')])
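The two behaviours for case vi can be put side by side in a short sketch, assuming Python 3 and a recent NumPy. The variable names kept and renamed are mine.

```python
from io import StringIO  # Python 3 location of StringIO (assumption)

import numpy as np

full_dtype = [("var1", int), ("var2", int), ("var3", float)]

# defaultfmt is not applied when the dtype already carries its own names.
kept = np.genfromtxt(StringIO("1,2,3.0"), delimiter=",",
                     dtype=full_dtype, defaultfmt="VAR%i")

# An explicit names argument replaces the names given in the dtype.
renamed = np.genfromtxt(StringIO("1,2,3.0"), delimiter=",",
                        dtype=full_dtype, names=["VAR1", "VAR2", "VAR3"])

print(kept.dtype.names)
print(renamed.dtype.names)
```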
> When dtype is None this implies format is None so the format is obtained
> from the data. If names is not True then the names are either from the
> argument or default values.
>
Well, genfromtxt returns plain arrays too, so if names is neither True
nor given as an argument, then we can't give default values. I think
defaultfmt should accept True as well, so that you can return a
structured array with f0, f1, f2 as the names if that's what you want.
> If names argument is True then the names should be read from the data
> and one of the previous cases apply.
>
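A sketch of the names=True case, assuming Python 3 and a recent NumPy. The header line a,b,c is invented for the example; the first line of the input is taken as the field names and the remaining lines as data.

```python
from io import StringIO  # Python 3 location of StringIO (assumption)

import numpy as np

# The header row is hypothetical; it stands in for whatever the file provides.
text = StringIO("a,b,c\n1,2,3.0")

# names=True: the field names are read from the first line of the input.
data = np.genfromtxt(text, delimiter=",", names=True)

print(data.dtype.names)
print(float(data["c"]))
```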
It's a bit confusing to think of data type "formats" and have the
defaultfmt, perhaps it should be defaultnm?
So in sum, I think we should maybe have a True argument for
defaultfmt, maybe change the name to defaultnm to avoid confusion, and
have it so the easy dtype construction works with defaultfmt. I will
comment on the open tickets.
Anything I missed?
Skipper