[SciPy-User] [Numpy-discussion] [ANN] New open source project for labeled arrays

Keith Goodman kwgoodman at gmail.com
Wed Jan 27 21:57:41 EST 2010


On Wed, Jan 27, 2010 at 6:33 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
> On Wed, Jan 27, 2010 at 9:10 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>> I recently open sourced one of my packages. It is a labeled array
>> that I call larry.
>>
>> A two-dimensional larry, for example, contains a 2d NumPy array with
>> labels on each row and column. A larry can have any dimension.
>>
>> Alignment by label is automatic when you add (or subtract, multiply,
>> divide) two larrys.
>>
>> larry has built-in methods such as movingsum, ranking, merge, shuffle,
>> zscore, demean, lag as well as typical NumPy methods like sum, max,
>> std, sign, clip. NaNs are treated as missing data.
>>
>> You can archive larrys in HDF5 format using save and load or using a
>> dictionary-like interface.
>>
>> I'm working towards a 0.1 release. In the meantime, comments,
>> suggestions, critiques are all appreciated.
>>
>> To use larry you need Python and NumPy 1.4 or newer. To save and load
>> larrys in HDF5 format, you need h5py with HDF5 1.8.
>>
>> larry currently contains no extensions, just Python code, so there is
>> nothing to compile. Just save the la package and make sure Python can
>> find it.
>>
>> docs  http://larry.sourceforge.net
>> code  https://launchpad.net/larry
>
> Cool! Thanks for releasing.
>
> Looks like you're solving some similar problems to the ones I built
> pandas for (http://pandas.sourceforge.net). I'll have to have a closer
> look at the implementation to see if there are some design
> commonalities we can benefit from.

Yes, I hope we have some overlap so that we can share code.

As far as design goes, larry stores its data in a NumPy array and its
labels in a list of lists (one list per dimension). Most of the larry
methods are built on underlying functions that operate on plain NumPy
arrays, so other projects could easily reuse them. There are also
functions for repacking HDF5 archives and for creating intermediate
HDF5 Groups when saving a Dataset inside nested Groups. All of this is
transparent to the user but hopefully useful for other projects.
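
To make the layout concrete, here is a minimal sketch of the idea, not
larry's actual code. The class name LabeledArray, the helper align_add,
and the rule of keeping only the labels that both operands share are
assumptions made for illustration; the only things taken from the
description above are the NumPy array for the data and the list of
lists (one per dimension) for the labels.

```python
import numpy as np


class LabeledArray:
    """Hypothetical stand-in for a labeled array (not larry's real class)."""

    def __init__(self, x, label):
        x = np.asarray(x, dtype=float)
        if len(label) != x.ndim:
            raise ValueError("need exactly one label list per dimension")
        for axis, lab in enumerate(label):
            if len(lab) != x.shape[axis]:
                raise ValueError("label length mismatch on axis %d" % axis)
        self.x = x                                  # the data
        self.label = [list(lab) for lab in label]   # list of lists


def align_add(a, b):
    """Add two labeled arrays after aligning them by label.

    Assumption: only labels present in both operands are kept, in the
    order they appear in `a`.
    """
    if a.x.ndim != b.x.ndim:
        raise ValueError("operands must have the same number of dimensions")
    labels, idx_a, idx_b = [], [], []
    for la, lb in zip(a.label, b.label):
        in_b = set(lb)
        common = [k for k in la if k in in_b]
        labels.append(common)
        idx_a.append([la.index(k) for k in common])
        idx_b.append([lb.index(k) for k in common])
    # np.ix_ builds an open mesh so each axis is indexed independently.
    xa = a.x[np.ix_(*idx_a)]
    xb = b.x[np.ix_(*idx_b)]
    return LabeledArray(xa + xb, labels)


a = LabeledArray([[1, 2], [3, 4]], [["r1", "r2"], ["c1", "c2"]])
b = LabeledArray([[10, 20], [30, 40]], [["r2", "r1"], ["c2", "c3"]])
total = align_add(a, b)
print(total.label)
print(total.x)
```

Because the alignment step only touches a plain array and plain lists,
a helper like this could be reused without the wrapper class, which is
the kind of sharing between projects mentioned above.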


