[SciPy-dev] Ideas for scipy.sparse?

Brian Granger ellisonbg.net at gmail.com
Sat Apr 12 13:21:39 EDT 2008


>  Can you demonstrate a real-world scenario where the overhead is
>  noticeable?  In all cases that I'm aware of the number of "slow"
>  operations is O(1).  OTOH the "fast" code always does at least O(nnz)
>  operations.

Sure.  Indexing (getting items) is O(1), but if I do it N=some big
number times, it is O(N*1) = O(N).  If I have to do all the indexing
through python (__getitem__) I have the overhead of calling a python
function all N times.  This is the same reason we all encourage people
to try to avoid writing for i in range(N) style loops to do things
with numpy arrays.

>  You are right about the lil and dok formats.  They are currently too
>  slow for large-scale problems.  You can have your way with them :)

Thanks, I will probably use them as templates to begin seeing i I can
make things faster with cython.  This big thing I need right now is
fast get/set for individual items (random access).

>  >  Think of it in terms of layers (I will use the csr format as an example):
>  >
>  >  sparse/csr.py        => top level python class that wraps the lower layers.
>  >  sparsetools/csr.py =>  thin swig generated python wrapper that
>  >  introduces overhead
>  >  sparsetools/csr.h  =>     fast c++ code
>  >
>  >  Using swig forces me to deal with two layers of python code before I
>  >  can talk to the fast c++ code.  The problem with this is that I want
>  >  to call the top level layer from C!  This would be like NumPy not
>  >  having a C API!
>  >
>  >  The cython stack would look like this
>  >
>  >  sparse/csr.pyx        => top level python class that is a C extension
>  >  type and can be called from C or python
>  >  sparsetools/csr.h  =>     fast c++ code
>  >
>  >  In that case, I could write my extension code in C and talk directly
>  >  to the C interface without any overhead of multiple layers of Python.
>
>  I understand your point, but it's not immediately clear to me that the
>  SWIG-induced overhead is actually troublesome.  For frequent calls to
>  simple operations I see the purpose.  However, each of functions in
>  sparsetools does a substantial amount of work per call.

Yep, it is really just the small stuff that is not being done in
sparsetools, but that might need to be done many times that could be
optimized.

>  Also, what code that would want to interface directly with csr.h
>  wouldn't live in sparsetools itself?

Basically, I am thinking of a public C-API for doing things with the
sparse arrays.  A true C extension type for the sparse array class.
That could go into sparsetools, but I think it will be easier to
write/maintain in cython.

Cheers,

Brian

>  --
>
>
> Nathan Bell wnbell at gmail.com
>  http://graphics.cs.uiuc.edu/~wnbell/
>  _______________________________________________
>  Scipy-dev mailing list
>  Scipy-dev at scipy.org
>  http://projects.scipy.org/mailman/listinfo/scipy-dev
>



More information about the SciPy-Dev mailing list