[SciPy-Dev] Out of Core Sparse Matrices

Aidan Macdonald aidan at brightcloud.com
Thu May 28 10:07:45 EDT 2015


Hi,

I pushed one piece of the code here
<https://github.com/aidan-plenert-macdonald/scipy/tree/master/scipy/dsparse>.
It is the PYX file. Stephan Hoyer recommended that I push this into the Dask
project. I looked at their code, and it looks like a closer fit for what my
work does. I think I will talk with him. My company is looking at building a
good distributed computing framework for big data machine learning
purposes. Most of the old code is in C++, but I am porting it over into
Python for better maintainability.

As seen in the code, I simply reuse SciPy's existing source for the
dok_matrix (easier than rewriting one) and provide an out-of-core
dictionary backed by Sqlite. I am looking into dropping Sqlite in favor of a
sort of memmap interface, as that is what our C++ code does, but I am unsure
of the speed/complexity/maintainability benefit.
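
To give a rough idea of the out-of-core dictionary, here is a minimal
sketch, not the code in the repository linked above. The class name
SqliteDict, the table name "entries", and its column layout are all
hypothetical, chosen only for illustration. It stores (row, col) -> value
pairs in a Sqlite file and exposes them through the standard mapping
interface, which is roughly what a dok-style matrix expects from its
backing store.

```python
import sqlite3
from collections.abc import MutableMapping


class SqliteDict(MutableMapping):
    """Hypothetical (row, col) -> float mapping stored in a Sqlite file."""

    def __init__(self, path):
        # `path` is a file name; ":memory:" also works for quick experiments.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS entries ("
            "i INTEGER NOT NULL, j INTEGER NOT NULL, v REAL NOT NULL, "
            "PRIMARY KEY (i, j))"
        )

    def __getitem__(self, key):
        i, j = key
        row = self._db.execute(
            "SELECT v FROM entries WHERE i = ? AND j = ?", (int(i), int(j))
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        i, j = key
        self._db.execute(
            "INSERT OR REPLACE INTO entries (i, j, v) VALUES (?, ?, ?)",
            (int(i), int(j), float(value)),
        )
        # Committing per write keeps the sketch simple; it is slow in practice.
        self._db.commit()

    def __delitem__(self, key):
        i, j = key
        cur = self._db.execute(
            "DELETE FROM entries WHERE i = ? AND j = ?", (int(i), int(j))
        )
        if cur.rowcount == 0:
            raise KeyError(key)
        self._db.commit()

    def __iter__(self):
        # Keys are streamed from the database rather than loaded all at once.
        for i, j in self._db.execute("SELECT i, j FROM entries ORDER BY i, j"):
            yield (i, j)

    def __len__(self):
        return self._db.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
```

In this sketch every write is committed immediately, which is safe but
slow. Batching writes into one transaction would be the first thing to
change when measuring it against a memmap approach.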

At the end of the day, all the SciPy sparse matrix tests should work with
minimal changes (adding file names). It is PYX because I was compiling with
Cython. There is currently minimal speed gain from compilation, but I was
going to go through and optimize later.

Thank you,

Aidan Macdonald
805 418 0174
aidan at brightcloud.com
aidan.plenert.macdonald at gmail.com

On Wed, May 27, 2015 at 11:25 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Wed, May 27, 2015 at 5:48 PM, Aidan Macdonald <aidan at brightcloud.com>
> wrote:
> >
> > Hi,
> >
> > I work for Brightcloud and part of my work required me to write Out of
> Core Sparse Matrices. I was thinking of submitting these to Scipy as it
> currently has Sparse Matrices, but not out of core.
> >
> > I was wondering if this code would be a desired addition to SciPy. Also,
> currently it uses the Python Sqlite3 library. Is it okay to use the Sqlite3
> package?
>
> Hi Aidan,
>
> I think we'd be able to give you better advice/suggestions if you
> could give us a pointer to the code and/or docs, to get a sense of
> what kind of general approach, public API, dependencies, etc. that
> you're talking about?
>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>