[Numpy-discussion] Tabular data package

Tue Oct 6 12:31:52 EDT 2009

On Mon, Oct 5, 2009 at 5:22 PM, Elaine Angelino
<elaine.angelino at gmail.com> wrote:
> Hi there,
>
> We are writing to announce the release of "Tabular", a package of Python
> modules for working with tabular data.
>
> Tabular's main object is the tabarray class, a data structure for holding and
> manipulating tabular data. By putting data into a tabarray object, you’ll
> get a representation of the data that is more flexible and powerful than a
> native Python representation. More specifically, tabarray provides:
>
> -- ultra-fast filtering, selection, and numerical analysis methods, using
> convenient Matlab-style matrix operation syntax
> -- spreadsheet-style operations, including row & column operations, 'sort',
> 'replace', 'aggregate', 'pivot', and 'join'
> -- flexible load and save methods for a variety of file formats, including
> delimited text (CSV), binary, and HTML
> -- helpful inference algorithms for determining formatting parameters and
> data types of input files
> -- support for hierarchical groupings of columns, both as data structures
> and file formats
>
> You can download Tabular from PyPI (http://pypi.python.org/pypi/tabular/) or
> alternatively clone our hg repository from bitbucket
> (http://bitbucket.org/elaine/tabular/).  We also have posted tutorial-style
> Sphinx documentation (http://www.parsemydata.com/tabular/).
>
> The tabarray object is based on the record array object from the Numerical
> Python package (NumPy), and Tabular is built to interface well with NumPy in
> general.  Our intended audience is two-fold: (1) Python users who, though
> they may not be familiar with NumPy, are in need of a way to work with
> tabular data, and (2) NumPy users who would like to do spreadsheet-style
> operations on top of their more "numerical" work.
>
> We hope that some of you find Tabular useful!
>
> Best,
>
> Elaine and Dan

I briefly looked at the Sphinx docs and the code. Tabular looks pretty useful,
and the code can be partially read as recipes for working with recarrays or
structured arrays. Thanks for the choice of license (it makes looking at the
code "legal").

I didn't see any explicit NaN handling. Are missing values allowed,
e.g. in the constructor?

I looked a bit closer at functions like tabular.fast.recarrayisin, since I
always have problems with these row operations. Are these functions supposed
to work with arbitrary structured arrays? The tests only cover 1d integer
arrays. With floats, the default string representation doesn't sort
correctly. Or am I misreading the function?

>>> arr = np.array([6,1,2,1e-13,0.5*1e-14,1,2e25,3,0,7]).view([('',float)]*2)
>>> arr
array([(6.0, 1.0), (2.0, 1e-013), (5e-015, 1.0),
       (2.0000000000000002e+025, 3.0), (0.0, 7.0)],
      dtype=[('f0', '<f8'), ('f1', '<f8')])
>>> np.sort([str(l) for l in arr])
array(['(0.0, 7.0)', '(2.0, 1e-013)', '(2.0000000000000002e+025, 3.0)',
       '(5e-015, 1.0)', '(6.0, 1.0)'],
      dtype='|S30')
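
For comparison, here is a minimal sketch of what a numerically correct row
ordering of the same data looks like. It is not taken from Tabular; the names
`values`, `rows`, `order`, `rows_sorted` and `arr_sorted` are made up for
illustration. It assumes that np.lexsort on the columns, or np.sort on the
structured view, gives the lexicographic row order that the string sort above
fails to reproduce.

```python
import numpy as np

# Same ten numbers as in the session above.
values = np.array([6, 1, 2, 1e-13, 0.5 * 1e-14, 1, 2e25, 3, 0, 7], dtype=float)

# View them as 5 rows of 2 floats (plain 2-D array, no field names).
rows = values.reshape(-1, 2)

# np.lexsort uses the LAST key as the primary sort key, so the first
# column is passed last to sort by column 0, then by column 1.
order = np.lexsort((rows[:, 1], rows[:, 0]))
rows_sorted = rows[order]

# The structured view sorts the same way: np.sort compares the fields
# in order (f0 first, then f1), numerically rather than as strings.
arr = values.view([('f0', float), ('f1', float)])
arr_sorted = np.sort(arr)

print(order)
print(rows_sorted)
print(arr_sorted)
```

Here the row starting with 5e-015 sorts directly after the row starting with
0.0, and the row starting with 2e+025 sorts last, unlike in the string-based
ordering.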

Being able to do a searchsorted on rows of an array would be a useful feature
in numpy. Is there a sortable 1d representation of the rows of a 2d float or
mixed type array?

Thanks,

Josef
