[Numpy-discussion] odd ascii format and genfromtxt

Fri Feb 26 03:29:26 EST 2010

Ralf Gommers wrote:
> Hi all,
>
> I'm trying to read in data from text files with genfromtxt, and have 
> some trouble figuring out the right combination of keywords. The 
> format is:
>
> ['0\t\t4.000000000000000e+007,0.000000000000000e+000\n',
>  '\t9.860280631554179e-001,-1.902586503306264e-002\n',
>  '\t9.860280631554179e-001,-1.902586503306264e-002']
>
> Note that there are two delimiters, tab and comma. Also, the first 
> line has an extra integer plus tab (this is a repeating pattern). The 
> files are large, there's a lot of them, and they're generated by a 
> binary I can't modify.
>
> Here are some things I've tried:
>
> In [216]: np.genfromtxt('ascii2test.raw', invalid_raise=False)
> Out[216]: array([  0.,  NaN])
>
> In [217]: np.genfromtxt('ascii2test.raw', invalid_raise=False, 
> delimiter=['\t', ','])
> TypeError: cannot perform accumulate with flexible type
>
> In [228]: np.genfromtxt('ascii2test.raw', delimiter=['\t', ','], 
> dtype=[('intvar', '<i8'), ('fltvar', '<f8'), ('fltvar2', '<f8')])
> TypeError: cannot perform accumulate with flexible type
>
>
> Any suggestions?

The 'delimiter' keyword does not accept a list of strings.  If it is a 
list, it must be a list of integers giving the field widths.  That won't 
work in your case, because your lines are not fixed-width: the leading 
integer appears only on some of them.
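
For reference, here is a minimal sketch of what the integer form of 
'delimiter' does.  The sample text and the widths 3, 6, 6 are made up for 
illustration and are not your file format:

```python
import io
import numpy as np

# Hypothetical fixed-width data: three columns of 3, 6 and 6 characters.
text = "  1  2.50 -0.10\n 20 13.75  4.00\n"

# A list of integers is read as field widths, not as separator characters.
widths = [3, 6, 6]
fixed = np.genfromtxt(io.StringIO(text), delimiter=widths)

print(fixed)
```

Each line is cut at character offsets 0-3, 3-9 and 9-15, so this only helps 
when every line has the same column layout.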

You could try fromregex:

-----
In [1]: import numpy as np

In [2]: cat sample.raw
0        4.000e+007,0.00000e+000
    9.8602806e-001,-1.9025e-002
    9.8602806e-001,-1.9025e-002
123        5.0e6,100.0
    10.1,-2.0e-3
    10.2,-2.1e-3

In [3]: a = np.fromregex('sample.raw', '(.*?)\t+(.*),(.*)',
   ...:                  np.dtype([('extra', 'S8'), ('x', float), ('y', float)]))

In [4]: a
Out[4]:
array([('0', 40000000.0, 0.0), ('', 0.98602805999999998, -0.019025),
       ('', 0.98602805999999998, -0.019025), ('123', 5000000.0, 100.0),
       ('', 10.1, -0.002), ('', 10.199999999999999, -0.0020999999999999999)],
      dtype=[('extra', '|S8'), ('x', '<f8'), ('y', '<f8')])

In [5]: a[0]
Out[5]: ('0', 40000000.0, 0.0)

In [6]: a[1]
Out[6]: ('', 0.98602805999999998, -0.019025)

In [7]: a['extra']
Out[7]:
array(['0', '', '', '123', '', ''],
      dtype='|S8')

In [8]: a['y']
Out[8]:
array([  0.00000000e+00,  -1.90250000e-02,  -1.90250000e-02,
         1.00000000e+02,  -2.00000000e-03,  -2.10000000e-03])
-----

Note that the first field of the array is a string, not an integer.  The 
string will be empty in rows that did not have the initial integer.  I 
don't know if that will work for you.
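
If you need the integer on every row, one possible follow-up is to 
forward-fill it from the row that starts each block.  This is only a 
sketch: the array below is typed in by hand to mirror the fromregex result 
above, the names has_id, ids and block_id are made up, and it assumes the 
very first row always carries an integer:

```python
import numpy as np

# Hand-built stand-in for the structured array returned by fromregex above.
a = np.array(
    [(b'0', 4.0e7, 0.0), (b'', 0.98602806, -0.019025),
     (b'', 0.98602806, -0.019025), (b'123', 5.0e6, 100.0),
     (b'', 10.1, -0.002), (b'', 10.2, -0.0021)],
    dtype=[('extra', 'S8'), ('x', float), ('y', float)])

# True on the rows that begin a new block (non-empty 'extra' field).
has_id = a['extra'] != b''

# Integer value of each block, in order of appearance.
ids = a['extra'][has_id].astype(int)

# cumsum numbers the blocks 1, 2, ...; subtract 1 to index into ids.
# Assumes the first row starts a block, otherwise the index would be -1.
block_id = ids[np.cumsum(has_id) - 1]

print(block_id)  # [  0   0   0 123 123 123]
```

block_id has one entry per row, so it can be used alongside a['x'] and 
a['y'] to select or group the rows of a given block.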

Warren

>
> Thanks,
> Ralf