[Numpy-discussion] read not byte aligned records

aymeric.rateau at gmail.com
Tue May 5 07:07:42 EDT 2015


Hi,
To answer Jerome (I hope): the data is sometimes spread over bytes that are shared with other data in the record. 10 bits was only an example; field widths can also be 24, 2, 8, 7 bits, etc., all combined, with some padding between them. I am not sure I have understood your suggestion, though...
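For the fixed 10-bit case, Jerome's suggestion (5 bytes hold exactly 4 values of 10 bits) can be vectorised in NumPy instead of looping in Python. This is a minimal sketch, assuming MSB-first packing, i.e. the first value sits in the most significant 10 bits of each 5-byte group; `unpack_10bit` is a hypothetical helper name, and a different bit order would need different shifts.

```python
import numpy as np


def unpack_10bit(buf: bytes) -> np.ndarray:
    """Unpack MSB-first packed 10-bit unsigned values (4 values per 5 bytes).

    Assumption: the first value of each group occupies the most
    significant 10 bits of the 40-bit big-endian group.
    """
    raw = np.frombuffer(buf, dtype=np.uint8)
    if raw.size % 5:
        raise ValueError("buffer length must be a multiple of 5")
    groups = raw.reshape(-1, 5).astype(np.uint64)
    # Combine each 5-byte group into one 40-bit big-endian integer.
    word = np.zeros(groups.shape[0], dtype=np.uint64)
    for i in range(5):
        word = (word << np.uint64(8)) | groups[:, i]
    # Shift each of the 4 fields down to bit 0 and keep its low 10 bits.
    shifts = np.array([30, 20, 10, 0], dtype=np.uint64)
    out = (word[:, None] >> shifts) & np.uint64(0x3FF)
    # Row-major flattening keeps the values in file order.
    return out.reshape(-1).astype(np.uint16)
```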

To Nathaniel: yes, indeed I could read each record as one large integer and apply the right_shift and bitwise_and functions to extract each channel. I am a bit worried about performance, though.
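The right_shift/bitwise_and approach can be applied to a whole array of records at once, so each field costs a couple of vectorised NumPy operations rather than a Python loop over bits. A minimal sketch, assuming big-endian, MSB-first records of at most 8 bytes; `LAYOUT`, `RECORD_BYTES` and `extract_fields` are hypothetical, and the 24/2/8/7-bit fields with padding gaps were chosen only to mirror the widths mentioned above.

```python
import numpy as np

# Hypothetical 6-byte (48-bit) record layout, MSB-first:
# (name, bit offset from the most significant bit, width in bits).
# Bits not covered by any field are padding.
LAYOUT = [("a", 0, 24), ("b", 25, 2), ("c", 28, 8), ("d", 37, 7)]
RECORD_BYTES = 6


def extract_fields(buf, layout=LAYOUT, record_bytes=RECORD_BYTES):
    raw = np.frombuffer(buf, dtype=np.uint8).reshape(-1, record_bytes)
    # Left-pad each record to 8 bytes and view it as one big-endian uint64.
    padded = np.zeros((raw.shape[0], 8), dtype=np.uint8)
    padded[:, 8 - record_bytes:] = raw
    words = padded.view(">u8").ravel().astype(np.uint64)
    total_bits = 8 * record_bytes
    fields = {}
    for name, offset, width in layout:
        # Distance from the field's least significant bit to bit 0.
        shift = total_bits - offset - width
        mask = np.uint64((1 << width) - 1)
        fields[name] = np.bitwise_and(np.right_shift(words, np.uint64(shift)), mask)
    return fields
```

Each record is left-padded to 8 bytes so it can be viewed as a single big-endian `uint64`; records longer than 8 bytes would have to be split into several words first.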

I am currently using the bitstring module, which does exactly this kind of bit handling. It has both a pure Python and a Cython implementation.
With the pure Python implementation, reading is about 2-3x slower than for byte-aligned data of similar file size.
--> I will try bitstring's Cython implementation.
--> I will also try the right_shift and bitwise_and approach.
The faster one will win, but at least your answers reassure me that I am not missing any trick or optimisation and that I am heading in the right direction.
Thanks!
Regards
Aymeric


On 5 May 2015 08:15, "Nathaniel Smith" <njs at pobox.com> wrote:
> On Mon, May 4, 2015 at 10:21 PM, Jerome Kieffer <Jerome.Kieffer at esrf.fr> wrote:
> 
>> Hi,
>> If you want to play with 10-bit data blocks, read 5 bytes and work with 4 entries at a time...
> 
> NumPy arrays don't have any support for sub-byte alignment. So if you
> want to handle such data, you either need to write some manual
> packing/unpacking code (using bitshift operators, or perhaps
> np.unpackbits, or whatever), or use another library designed for doing
> this. You may find Cython useful to write the core packing/unpacking,
> since bit-by-bit processing in a for loop is not something that
> CPython is super well suited to.
> 
> Good luck,
> -n
> 
> --
> Nathaniel J. Smith -- http://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
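Nathaniel's `np.unpackbits` suggestion gives a variant with no 64-bit limit on the record length: expand every byte into 8 one-bit entries, then rebuild each field from its bit columns. A minimal sketch, assuming MSB-first bit order and fields of at most 64 bits; `extract_with_unpackbits` and its layout format are hypothetical. The unpacked array takes 8 times the memory of the raw data, so it may be slower than the shift-and-mask approach; worth benchmarking on real files.

```python
import numpy as np


def extract_with_unpackbits(buf, record_bytes, layout):
    """layout: iterable of (name, bit_offset_from_msb, width), width <= 64."""
    raw = np.frombuffer(buf, dtype=np.uint8).reshape(-1, record_bytes)
    # One uint8 (0 or 1) per bit, most significant bit of each byte first.
    bits = np.unpackbits(raw, axis=1)
    fields = {}
    for name, offset, width in layout:
        sub = bits[:, offset:offset + width].astype(np.uint64)
        # Place value of each bit column: width-1 down to 0.
        shifts = (width - 1 - np.arange(width)).astype(np.uint64)
        # Shift every bit to its place and sum along the bit axis.
        fields[name] = (sub << shifts).sum(axis=1, dtype=np.uint64)
    return fields
```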


