[Numpy-discussion] Forbidden character in the "names" argument of genfromtxt?

Mon Feb 20 13:35:46 EST 2012

On Sat, Feb 18, 2012 at 8:12 PM, Adam Hughes <hugadams at gwmail.gwu.edu> wrote:
> Hey everyone,
>
> I have timeseries data in which the column label is simply a filename from
> which the original data was taken.  Here's some sample data:
>
> name1.txt  name2.txt  name3.txt
> 32              34            953
> 32              03            402
>
> I've noticed that the standard genfromtxt() function works great; however, the
> names aren't preserved correctly.  That is, if I use the command:
>
> print data['name1.txt']
>
> Nothing happens.
>
> However, when I remove the file extension, e.g.:
>
> name1  name2  name3
> 32              34            953
> 32              03            402
>
> Then print data['name1'] returns (32, 32) as expected.  It seems that the
> period in the name isn't compatible with the genfromtxt() names argument.
> Is there a workaround, or do I need to restructure my program to get the
> extension removed?  I'd rather not do this if possible for reasons that
> aren't important for the discussion at hand.

It looks like the period is just getting stripped out of the names:

In [1]: import numpy as N

In [2]: N.genfromtxt('sample.txt', names=True)
Out[2]:
array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
      dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])
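
For anyone without sample.txt, here is a minimal self-contained reproduction in Python 3. It is a sketch that assumes a NumPy version whose genfromtxt accepts file-like objects such as io.StringIO, and it reuses the sample data from the original message. Checking the field names directly shows why data['name1.txt'] finds nothing: the field is actually called name1txt.

```python
import io

import numpy as np

# Sample data from the original message, with file names as column labels.
SAMPLE = """name1.txt  name2.txt  name3.txt
32              34            953
32              03            402
"""

data = np.genfromtxt(io.StringIO(SAMPLE), names=True)

# The header line is used for the field names, but the periods are removed.
print(data.dtype.names)                 # ('name1txt', 'name2txt', 'name3txt')
print('name1.txt' in data.dtype.names)  # False

# The column is still reachable under the sanitized name.
print(data['name1txt'])
```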

Interestingly, this still happens if you supply the names manually:

In [17]: def reader(filename):
   ....:     infile = open(filename, 'r')
   ....:     names = infile.readline().split()
   ....:     data = N.genfromtxt(infile, names=names)
   ....:     infile.close()
   ....:     return data
   ....:

In [20]: data = reader('sample.txt')

In [21]: data
Out[21]:
array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
      dtype=[('name1txt', '<f8'), ('name2txt', '<f8'), ('name3txt', '<f8')])

What you can do, though, is reset the names after genfromtxt has finished with them:

In [34]: def reader(filename):
   ....:     infile = open(filename, 'r')
   ....:     names = infile.readline().split()
   ....:     infile.close()
   ....:     data = N.genfromtxt(filename, names=True)
   ....:     data.dtype.names = names
   ....:     return data
   ....:

In [35]: data = reader('sample.txt')

In [36]: data
Out[36]:
array([(32.0, 34.0, 954.0), (32.0, 3.0, 402.0)],
      dtype=[('name1.txt', '<f8'), ('name2.txt', '<f8'), ('name3.txt', '<f8')])
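
The same workaround can be written for Python 3 so the file is opened only once and closed automatically, even on error. This is a sketch: read_with_raw_names is a hypothetical helper name, and it assumes whitespace-separated columns with a single header line holding one label per column, as in the sample. It reads the header itself, lets genfromtxt parse the remaining lines (the names it builds from the list are still stripped), and then restores the original labels by assigning to dtype.names, as the reader above does.

```python
import numpy as np


def read_with_raw_names(filename):
    """Hypothetical helper: load a whitespace-delimited table, keeping header labels verbatim."""
    with open(filename, 'r') as infile:
        # Take the labels exactly as written in the header line.
        raw_names = tuple(infile.readline().split())
        # genfromtxt continues from the second line; the field names it
        # builds from this list are sanitized (periods removed).
        data = np.genfromtxt(infile, names=list(raw_names))
    # Overwrite the sanitized field names with the original labels.
    data.dtype.names = raw_names
    return data
```

Usage would then be data = read_with_raw_names('sample.txt') followed by data['name1.txt'].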

Be warned: I don't know why the period is getting stripped; there may
be a good reason, and putting it back might cause problems.
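
One likely source of the stripping: the NumPy documentation for genfromtxt lists a deletechars parameter, a string of characters to delete from field names, and its default set includes the period (presumably so names stay usable as Python identifiers, e.g. as record-array attributes; that rationale is an assumption). The sketch below, assuming a NumPy version whose genfromtxt accepts deletechars, compares the default with an empty deletechars string, which replaces the default set so the periods survive. The caveat above still applies: names like name1.txt cannot be used as attribute names on a recarray.

```python
import io

import numpy as np

SAMPLE = """name1.txt  name2.txt  name3.txt
32              34            953
32              03            402
"""

# Default: characters in the default deletechars set, including '.', are removed.
stripped = np.genfromtxt(io.StringIO(SAMPLE), names=True)

# An empty deletechars string replaces that default set, so '.' is kept.
kept = np.genfromtxt(io.StringIO(SAMPLE), names=True, deletechars='')

print(stripped.dtype.names)  # ('name1txt', 'name2txt', 'name3txt')
print(kept.dtype.names)      # ('name1.txt', 'name2.txt', 'name3.txt')
print(kept['name1.txt'])
```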

~Brett