[Numpy-discussion] np.memmap and memory usage

Francesc Alted faltet at pytables.org
Wed Jul 1 11:22:50 EDT 2009


On Wednesday 01 July 2009 15:04:08 Francesc Alted wrote:
> However, you can still speed up out-of-core computations by using the
> recently introduced tables.Expr class (PyTables 2.2b1, see [2]), which uses
> a combination of the Numexpr [3] and PyTables advanced computing
> capabilities:
>
>         import tables as tb
>         # filename, imin and imax are defined earlier in the script
>         f = tb.openFile(filename+".h5", "r+")
>         data = f.root.data
>         expr = tb.Expr("where(data<imin, imin, data)")
>         expr.setOutput(data)
>         expr.eval()
>         expr = tb.Expr("where(data>imax, imax, data)")
>         expr.setOutput(data)
>         expr.eval()
>         f.close()
>
> and the timings for this approach are:
>
> Using tables.Expr
> Time creating data file: 2.393
> Time processing data file: 18.25
>
> which is around 75% faster than the pure memmap/PyTables approach.

Oops, I just realized that the above can be further accelerated by 
combining both expressions into a single nested one.  Something like:

        import tables as tb
        # filename, imin and imax are defined earlier in the script
        f = tb.openFile(filename+".h5", "r+")
        data = f.root.data
        # Complex expression that spans several lines follows.
        # Note that the inner where() must go in the "else" branch of
        # the outer one, so that values below imin are still clipped.
        expr = tb.Expr("""
where(data<imin, imin,
      where(data>imax, imax, data))
""")
        expr.setOutput(data)
        expr.eval()
        f.close()

With this change, the data is read and written only once instead of twice, 
and the computation time is now:

Using tables.Expr
Time creating data file: 2.18
Time processing data file: 10.992

which represents another 65% improvement over the version using two 
expressions (18.25 s / 10.992 s ≈ 1.66), and is 3x faster than the 
numpy.memmap version.
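
For reference, a pure numpy.memmap version of the same clipping operation 
might look like the following chunked sketch (the file name, dtype, bounds 
and chunk length are made up for illustration; this is not necessarily the 
exact script used for the timings above):

        import numpy as np

        filename = "data"        # hypothetical name
        N = 10 * 1000 * 1000     # hypothetical array size
        imin, imax = 4.0, 10.0   # hypothetical clipping bounds
        chunklen = 2**20         # process ~1M elements at a time

        # Create (or open) the disk-backed array.
        data = np.memmap(filename+".bin", dtype="float64",
                         mode="w+", shape=(N,))

        # Clip in-place, one chunk at a time, so that only a small
        # slice of the file is resident in memory at any moment.
        for start in range(0, N, chunklen):
            chunk = data[start:start+chunklen]
            np.clip(chunk, imin, imax, out=chunk)
        data.flush()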

> Further, if your data is compressible, you can probably achieve additional
> speed-ups by using a fast compressor (like LZO, which is supported by
> PyTables right out of the box).
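
For concreteness, activating LZO only requires passing a Filters instance 
when the array is created.  A minimal sketch (again, the names, shape and 
fill pattern are assumptions of mine, not necessarily what the attached 
script does):

        import numpy as np
        import tables as tb

        filename = "data"      # hypothetical name
        N = 10 * 1000 * 1000   # hypothetical array size
        chunklen = 2**20

        f = tb.openFile(filename+".h5", "w")
        # Request LZO compression through a Filters instance.
        filters = tb.Filters(complevel=1, complib="lzo")
        data = f.createCArray(f.root, "data", tb.Float64Atom(),
                              shape=(N,), filters=filters)
        # Fill in chunks to keep memory usage bounded; a simple
        # synthetic pattern like this compresses extremely well.
        for start in range(0, N, chunklen):
            stop = min(start+chunklen, N)
            data[start:stop] = (np.arange(start, stop) % 100).astype("float64")
        f.close()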

Out of curiosity, I tried activating the LZO compressor.  Here are the 
results:

Using tables.Expr
Time creating data file: 3.123
Time processing data file: 12.533

Mmh, contrary to my expectations, this hasn't accelerated the computation.  
My guess is that, the data being very simple and synthetic, it compresses 
extremely well (200x), so there is little I/O left to save and the 
compressor/decompressor just adds CPU work here.  However, with real-life 
data the speed could well improve.  OTOH, using a faster compressor could 
be very advantageous here too :)
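
For anyone who wants to check the compression ratio on their own data, 
comparing the plain and compressed file sizes is enough.  A trivial sketch 
(the file names are made up):

        import os

        # Hypothetical file names for the uncompressed and LZO variants.
        plain = os.path.getsize("data-plain.h5")
        compressed = os.path.getsize("data-lzo.h5")
        print("compression ratio: %.1fx" % (float(plain) / compressed))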

Cheers,

-- 
Francesc Alted
-------------- next part --------------
Attachment: memmap-tables-Expr2.py (text/x-python, 2243 bytes)
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20090701/64d9b939/attachment.py>

