[Numpy-discussion] Efficient reading of binary data

Nicolas Bigaouette nbigaouette at gmail.com
Thu Apr 3 20:14:57 EDT 2008


2008/4/3, Robert Kern <robert.kern at gmail.com>:
>
> On Thu, Apr 3, 2008 at 6:53 PM, Nicolas Bigaouette
> <nbigaouette at gmail.com> wrote:
> > Thanx for the fast response Robert ;)
> >
> > I changed my code to use the slice:
> >  E = data[6::9]
> > It is indeed faster and eats less memory. Great.
> >
> > Thanx for the endianness! I knew there was something like this ;) I suspect
> > that, in '>f8', "f" means float and "8" means 8 bytes?
>
>
> Yes, and the '>' means big-endian. '<' is little-endian, and '=' is
> native-endian.


I just tested it on a big-endian machine, and it does indeed work great :)
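
For reference, here is a small sketch of how the dtype strings discussed above behave. It is not taken from the thread: the variable names are only illustrative, and the value 1.5 is an arbitrary test number.

```python
import sys
import numpy as np

# '>' = big-endian, '<' = little-endian, '=' = native byte order.
# 'f' = floating point, '8' = item size in bytes.
big = np.dtype('>f8')
little = np.dtype('<f8')
native = np.dtype('=f8')

kind, size = big.kind, big.itemsize   # 'f' and 8

# Eight raw bytes holding 1.5 stored big-endian (illustrative value).
raw = np.array([1.5], dtype='>f8').tobytes()

# Reading with the matching dtype recovers the value on any machine;
# reading the same bytes as little-endian gives a different number.
value_be = np.frombuffer(raw, dtype='>f8')[0]
value_le = np.frombuffer(raw, dtype='<f8')[0]

# '=' resolves to whichever order the current machine uses.
native_matches = native == (little if sys.byteorder == 'little' else big)
```

Spelling out '>' or '<' in the dtype is what makes the same script work on both kinds of machine.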

> > From some benchmarks, I see that the slowest thing is disk access. It can
> > slow the displaying of data from around 1sec (when data is in os cache or
> > buffer) to 8sec.
> >
> > So the next step would be to only read the needed data from the binary
> > file... Is it possible to read from a file with a slice? So instead of:
> >
> > data = numpy.fromfile(file=f, dtype=float_dtype, count=9*Stot)
> > E = data[6::9]
> > maybe something like:
> > E = numpy.fromfile(file=f, dtype=float_dtype, count=9*Stot, slice=6::9)
>
>
> Instead of reading using fromfile(), you can try memory-mapping the array.
>
>   from numpy import memmap
>   E = memmap(f, dtype=float_dtype, mode='r')[6::9]
>
> That may or may not help. At least, it should decrease the latency
> before you start pulling out frames.
>

It did not work out of the box (memmap() takes the filename and not a
file handle), but anyway, it's getting late.
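
A minimal sketch of the memmap approach with a file name instead of an open file, assuming the layout described earlier (records of 9 big-endian doubles, with the wanted value at index 6). The file name, the record count n_records (standing in for Stot) and the test data are made up for illustration.

```python
import os
import tempfile
import numpy as np

float_dtype = np.dtype('>f8')   # big-endian 8-byte floats, as in the thread
n_records = 1000                # hypothetical stand-in for Stot

# Write a small test file: n_records records of 9 values each.
path = os.path.join(tempfile.mkdtemp(), 'frames.bin')  # made-up file name
np.arange(9 * n_records, dtype='f8').astype(float_dtype).tofile(path)

# Pass the file name to memmap; data is only read when elements are accessed.
data = np.memmap(path, dtype=float_dtype, mode='r')

# Take every 9th value starting at index 6 and copy it into a normal array.
E = np.array(data[6::9])
```

With a stride of only 72 bytes, every page of the file is still touched, so this may not reduce the actual disk traffic much; it mostly avoids holding the full array in memory.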

Thanx