[Numpy-discussion] GSOC 2013

Tue Mar 5 12:33:56 EST 2013

>> 5. Currently dtypes are limited to a set of fixed types, or combinations
>> of these types.  You can't have, say, a 48 bit float or a 1-bit bool.  This
>> project would be to allow users to create entirely new, non-standard dtypes
>> based on simple rules, such as specifying the length of the sign, length of
>> the exponent, and length of the mantissa for a custom floating-point number.
>> Hopefully this would mostly be used for reading in non-standard data and not
>> used that often, but for some situations it could be useful for storing data
>> too (such as large amounts of boolean data, or genetic code which can be
>> stored in 2 bits and is often very large).
>
>
> I second this general idea. Simply having a pair of packbits/unpackbits
> functions that could work with 2 and 4 bit uints would make my life easier.
> If it were possible to have an array of dtype 'uint4' that used half the
> space of a 'uint8', but could have ufuncs and the like run on it, it would be
> pure bliss. Not that I'm complaining, but a man can dream...
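NumPy's existing np.packbits/np.unpackbits only handle 1-bit values. The sketch below shows the kind of 2- and 4-bit pack/unpack pair being wished for here, built from ordinary shifts and masks on uint8 arrays. The helper names pack_uint and unpack_uint are hypothetical, not NumPy API. It only saves storage: it is not a real 'uint4' dtype, so you would still unpack to uint8 before running ufuncs.

```python
import numpy as np


def pack_uint(values, nbits):
    """Pack unsigned ints < 2**nbits into uint8 bytes, lowest slot first.

    Hypothetical helper, not part of NumPy. nbits must divide 8 (1, 2, 4, 8).
    """
    if nbits not in (1, 2, 4, 8):
        raise ValueError("nbits must be 1, 2, 4 or 8")
    per_byte = 8 // nbits
    values = np.asarray(values, dtype=np.uint8)
    if values.size and values.max() >= (1 << nbits):
        raise ValueError("value does not fit in nbits")
    # Zero-pad so the values fill a whole number of bytes.
    pad = (-values.size) % per_byte
    padded = np.concatenate([values, np.zeros(pad, dtype=np.uint8)])
    groups = padded.reshape(-1, per_byte)
    # Slot i of each byte starts at bit i * nbits.
    shifts = np.arange(per_byte, dtype=np.uint8) * nbits
    # Shift every value into its slot, then OR the slots of each byte together.
    return np.bitwise_or.reduce(groups << shifts, axis=1).astype(np.uint8)


def unpack_uint(packed, nbits, count):
    """Inverse of pack_uint; returns the first `count` values as uint8."""
    per_byte = 8 // nbits
    packed = np.asarray(packed, dtype=np.uint8)
    shifts = np.arange(per_byte, dtype=np.uint8) * nbits
    mask = np.uint8((1 << nbits) - 1)
    # Broadcast each byte against each slot shift, then mask off one slot.
    values = (packed[:, None] >> shifts) & mask
    return values.reshape(-1)[:count]
```

For genetic data, mapping A, C, G, T to 0..3 and calling pack_uint(codes, 2) stores four bases per byte, so nine bases fit in three bytes instead of nine.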

I also think this would make a great addition to NumPy.  People may
even be able to save some work by leveraging the HDF5 code base; the
HDF5 guys have piles and piles of carefully tested C code for exactly
this purpose: converting between the common IEEE float sizes and those
with user-specified mantissa/exponent sizes, plus 1-, 2-, 3-bit, etc. integers and
the like.  It's all under a BSD-compatible license.  You'd have to
replace the bits which talk to the HDF5 type description system, but
it might be a good place to start.
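To show what "user-specified mantissa/exponent" conversion involves, here is a small pure-Python sketch. It decodes one such value into a Python float. This is not HDF5's code or API, and decode_custom_float is a hypothetical name. It assumes IEEE-754-style conventions: a leading sign bit, an exponent biased by 2**(exp_bits-1) - 1, an implicit leading 1 for normal numbers, subnormals when the exponent is zero, and an all-ones exponent meaning inf/NaN. It also assumes the result fits in float64 range and precision.

```python
import math
import struct


def decode_custom_float(bits, exp_bits, man_bits):
    """Decode an int holding [sign | exponent | mantissa] into a float.

    Hypothetical helper assuming IEEE-754-style bias, implicit bit and
    special values; results must fit in a Python (float64) float.
    """
    bias = (1 << (exp_bits - 1)) - 1
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    if exponent == (1 << exp_bits) - 1:
        # All-ones exponent: infinity if the mantissa is zero, otherwise NaN.
        return sign * math.inf if mantissa == 0 else math.nan
    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        return sign * math.ldexp(mantissa, 1 - bias - man_bits)
    # Normal number: restore the implicit leading 1 above the mantissa bits.
    return sign * math.ldexp(mantissa + (1 << man_bits), exponent - bias - man_bits)
```

With exp_bits=5 and man_bits=10 this reproduces IEEE half precision. A 48-bit float could, for example, use 11 exponent bits and 36 mantissa bits (an assumed layout, chosen to match float64 with the low 16 mantissa bits dropped). A real implementation would also need the encoding direction and rounding, which is where a tested code base like HDF5's would help.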

Andrew