[Numpy-discussion] checksum on numpy float array

Francesc Alted faltet at pytables.org
Fri Dec 5 12:42:00 EST 2008


On Friday 05 December 2008, Brennan Williams wrote:
> Robert Kern wrote:
> > On Thu, Dec 4, 2008 at 18:54, Brennan Williams
> > <brennan.williams at visualreservoir.com> wrote:
> >> Thanks
> >>
> >> josef.pktd at gmail.com wrote:
> >>> I didn't check what this does behind the scenes, but try this
> >>>
> >>> import hashlib  # standard Python library
> >>> import numpy as np
> >>>
> >>> m = hashlib.md5()
> >>> m.update(np.array(range(100)))
> >>> m.update(np.array(range(200)))
> >
> > I would recommend doing this on the strings before you make arrays
> > from them. You don't know if the network cut out in the middle of
> > an 8-byte double.
> >
> > Of course, sending the lengths and other metadata first, then the
> > data would let you check without needing to do expensivish hashes
> > or checksums. If truncation is your problem rather than corruption,
> > then that would be sufficient. You may also consider using the NPY
> > format in numpy 1.2 to implement that.
>
> Thanks for the ideas. I'm definitely going to add some more basic
> checks on lengths etc as well.
> Unfortunately the problem is happening at a client site, so (a) I
> can't reproduce it and (b) most of the time they can't reproduce it
> either. This is a Windows Python app running on Citrix,
> reading/writing data to a Linux networked drive.

Another possibility would be to use HDF5 as a data container. It
supports the fletcher32 filter [1], which computes a checksum for every
data chunk written to disk and then, on every read, checks that the data
matches the checksum stored on disk. So, if the HDF5 layer doesn't
complain, you are basically safe.
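
To make the idea concrete, here is a rough pure-Python sketch of
per-chunk checksumming. It uses the textbook Fletcher-32 over
little-endian 16-bit words and is not meant to match HDF5's filter
byte for byte; the helper names (fletcher32, pack_chunks,
unpack_chunks) and the chunk size are made up for illustration.

```python
import struct


def fletcher32(data: bytes) -> int:
    """Textbook Fletcher-32 over little-endian 16-bit words (illustrative)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with one zero byte
    sum1 = sum2 = 0
    for (word,) in struct.iter_unpack("<H", data):
        sum1 = (sum1 + word) % 65535
        sum2 = (sum2 + sum1) % 65535
    return (sum2 << 16) | sum1


def pack_chunks(payload: bytes, chunk_size: int = 4096):
    """Split payload into chunks and keep a checksum next to each one."""
    records = []
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        records.append((chunk, fletcher32(chunk)))
    return records


def unpack_chunks(records) -> bytes:
    """Reassemble the payload, refusing any chunk whose checksum differs."""
    out = bytearray()
    for index, (chunk, stored) in enumerate(records):
        if fletcher32(chunk) != stored:
            raise ValueError(f"checksum mismatch in chunk {index}")
        out += chunk
    return bytes(out)
```

With something like this, a chunk damaged between write and read makes
unpack_chunks() raise instead of silently handing back bad doubles,
which is roughly what the HDF5 layer does for you when the filter is
enabled.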

There are at least two usable HDF5 interfaces for Python and NumPy:
PyTables [2] and h5py [3]. PyTables supports this filter right out of
the box. I'm not sure about h5py, though (a quick search in its docs
doesn't reveal anything).
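
For PyTables, a minimal sketch of switching the filter on could look
like the following. It assumes a PyTables release with the snake_case
API (open_file, create_carray); the helper function, the file path and
the node name "values" are hypothetical and only there for
illustration. The relevant part is tables.Filters(fletcher32=True).

```python
import numpy as np
import tables


def roundtrip_with_fletcher32(data, path):
    """Write `data` with the fletcher32 filter enabled, then read it back.

    Hypothetical helper: `path` and the node name "values" are arbitrary.
    """
    filters = tables.Filters(fletcher32=True)  # checksum every chunk

    with tables.open_file(path, mode="w") as h5:
        # A CArray is chunked, so filters (including fletcher32) apply to it.
        h5.create_carray(h5.root, "values", obj=data, filters=filters)

    with tables.open_file(path, mode="r") as h5:
        node = h5.root.values
        # Reading goes through the HDF5 filter pipeline, which verifies
        # the stored checksum of each chunk.
        restored = node.read()
        checked = bool(node.filters.fletcher32)

    return restored, checked
```

If a chunk on disk no longer matches its checksum, the read should fail
with an HDF5 error rather than return the damaged values.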

[1] http://rfc.sunsite.dk/rfc/rfc1071.html
[2] http://www.pytables.org
[3] http://h5py.alfven.org

Hope it helps,

-- 
Francesc Alted


