[SciPy-user] Initializing COO/CSR matrix before function call

Anne Archibald peridot.faceted at gmail.com
Sun Feb 3 00:03:30 EST 2008


On 02/02/2008, Dinesh B Vadhia <dineshbvadhia at hotmail.com> wrote:

> I'm using a function to load a sparse matrix A using coo_matrix and then to
> transform it into a csr_matrix.  We are testing a bunch of very large sized
> matrices A and hence the use of a function.  In addition, A is available to
> many other functions in the program.
>
> Python says that A has to be defined (or initialized) before sending to the
> load function.  But, doesn't that mean initializing A as 'empty' or
> 'zeroed', both of which impact memory use, defeats the purpose of using coo
> and csr?  I've looked at the Sparse docstring help and cannot see a way out.
>
> Have I missed something?

If I've correctly understood your problem, it is this:

You want to make a sparse matrix A available to your whole program.
The loading is done inside a special-purpose function, call it load().
But when you create A inside load(), it's not visible anywhere else.
What are you to do?

The most direct (though not necessarily the best) way to do what
you're describing is to make A a global variable. That is, if you
mention "A" anywhere in the whole program, it refers to *this* A that
you just loaded. In most languages, declarations are used to indicate
global variables. Python has somewhat complicated rules for this, but
the easiest way to do what you want is:

def load():
    global A
    A = ...  # whatever builds the matrix

Now, if in some other function you write

def frob(x):
    return A*x

python will deduce that A here refers to the global A. If, however,
you *assign* to A:

def fiddle():
    A = 2*A

python will assume that A is a local variable in fiddle() and raise an
UnboundLocalError, because A is read before it has been assigned a
value inside that function. To tell python that it's a global
variable, use global again:

def fiddle():
    global A
    A = 2*A

It never hurts to mark A as global in this way.
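
The difference between reading and assigning is easy to check in a few
lines. This is only a sketch: a plain number stands in for the matrix,
and the names fiddle_local and fiddle_global are made up for the
illustration.

```python
A = 3  # stand-in for the loaded matrix


def frob(x):
    # A is only read here, so it is looked up in the module's globals.
    return A * x


def fiddle_local():
    # Assigning to A anywhere in the body makes A local to this function,
    # so reading it on the right-hand side fails.
    A = 2 * A


def fiddle_global():
    # The global statement makes the assignment rebind the module-level A.
    global A
    A = 2 * A


try:
    fiddle_local()
    raised = False
except UnboundLocalError:
    raised = True

fiddle_global()    # the module-level A is now 6
result = frob(5)   # uses the updated A
```

Only functions that assign to A need the global statement; functions
that merely read it, like frob(), work without it.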


I should say, though, that setting a global variable like this can be
trouble. It means (for example) that when a function is run, what
happens depends on the value A has, not just the values that get
passed to the function. This can make functions spontaneously do
something surprising if A accidentally gets modified, and it can be
very difficult to track down where the problem is. The fact that there
is only one A for the whole program can also be a major headache if
you want to expand your program or use it as a tool from within
another python program.

The classical way to avoid these problems is to explicitly pass A as a
parameter to the functions that need to use it. If this grows
cumbersome, a common solution is to incorporate A (and possibly some
other supporting data) into an object, and to make the functions that
need to use A methods of that object.
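
Both approaches might look roughly like the sketch below. The matrix
entries are made up, the class name Model is hypothetical, and load()
here simply returns the matrix, so nothing has to be created before
calling it.

```python
import numpy as np
from scipy.sparse import coo_matrix


def load():
    """Build the matrix in COO form and return it as CSR."""
    # Hypothetical entries; a real loader would read these from a file.
    rows = np.array([0, 1, 2])
    cols = np.array([0, 2, 1])
    vals = np.array([1.0, 2.0, 3.0])
    return coo_matrix((vals, (rows, cols)), shape=(3, 3)).tocsr()


def frob(A, x):
    # A arrives as an argument, so nothing depends on global state.
    return A.dot(x)


class Model:
    """Bundles A with the functions that use it."""

    def __init__(self, A):
        self.A = A

    def frob(self, x):
        return self.A.dot(x)


A = load()              # only the nonzero entries are ever stored
x = np.ones(3)
y = frob(A, x)          # explicit parameter
y2 = Model(A).frob(x)   # same computation as a method
```

Returning A from load() keeps memory use the same as in the global
version: no empty or zeroed matrix is allocated first.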

These problems, and the techniques to solve them, are not
numpy-specific; if you do some looking around for information on
python and global variables, you should find much more information
than I gave here.

Good luck!
Anne


