[SciPy-Dev] Seeking help/ advice for applying functions

J P jpscipy at gmail.com
Fri Mar 12 17:18:05 EST 2010


If your data is that sparse (roughly 1 nonzero element per 100), then your
biggest computational savings would probably come from using sparse matrices.
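
A minimal sketch of the underlying idea, using plain NumPy rather than
scipy.sparse itself: store only the positions of the nonzero entries and do
work proportional to their count. The names (indicator, active, data) and the
1% density are assumptions for illustration, not taken from the original code.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100000

# Hypothetical indicator row with roughly 1% of the entries set.
indicator = rng.random(n) < 0.01
data = rng.random(n)

# Sparse representation: keep only the positions of the True entries.
active = np.flatnonzero(indicator)

# Dense approach touches all n elements; sparse approach only the active ones.
total_dense = (data * indicator).sum()
total_sparse = data[active].sum()
```

Both totals agree; the second touches only about 1% of the elements.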

On Tue, Mar 9, 2010 at 11:12 AM, Anne Archibald
<peridot.faceted at gmail.com> wrote:

> On 9 March 2010 09:59, eat <e.antero.tammi at gmail.com> wrote:
> > Robert Kern <robert.kern <at> gmail.com> writes:
> >
> >>
> >> Your example is not very clear. Can you write a less cryptic one with
> >> informative variable names and perhaps some comments about what each
> >> part is doing?
> >>
> >
> > """
> > Hi,
> >
> > I have tried to clarify my code. First part is the relevant one, the rest is
> > there just to provide some context to run some tests.
>
> The short answer is that no, there's no way to optimize what you're doing.
>
> The long answer is: when numpy and scipy are fast, they are fast
> because they avoid running Python code: if you add two arrays, there's
> only one line of Python code, and all the work is done by loops
> written in C. If your code is calling many different Python functions,
> well, since they're Python functions, to apply them at all you must
> necessarily execute Python code. There goes any potential speed
> advantage. (There may be a convenience advantage; if so, you can look
> into using np.vectorize, which is just a wrapper around a Python loop,
> but is convenient.)
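
A minimal sketch of that convenience, assuming a small object array of
functions and a shared data array; apply_to_data and apply_all are
hypothetical names. It still makes one Python-level call per function, so it
should not be expected to run faster than a list comprehension.

```python
import numpy as np

# Hypothetical data and functions, chosen only for illustration.
data = np.linspace(0.0, 1.0, 3)
functions = np.asarray([np.sin, np.cos, np.tanh], dtype=object)

def apply_to_data(f):
    """Call one function on the shared data array."""
    return f(data)

# otypes=[object] because each call returns a whole array, not a scalar.
apply_all = np.vectorize(apply_to_data, otypes=[object])

# One row per function; the loop over functions still runs in Python.
results = np.stack(list(apply_all(functions)))
```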
>
> That said, I assume you are considering numpy/scipy because you have
> arrays of thousands or more. It also seems unlikely that you actually
> have thousands of different functions (that's an awful lot of source
> code!). So if your "different" functions are actually just a handful
> (or fewer) pieces of actual code, and you are getting your thousands
> of functions by wrapping them up with parameters and local variables,
> well, now there are possibilities. Exactly what possibilities depend
> on what your functions look like - which is one reason Robert Kern
> asked you to clarify your code - but they all boil down to rearranging
> the problem so that it goes back to "few functions, much data", then
> writing the functions in such a way that you can use numpy to apply
> them to thousands or millions of data points at once.
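
A sketch of that rearrangement under stated assumptions: the "many functions"
are closures over two parameters (the amplitudes, frequencies, make_wave and
wave names are all hypothetical), so they can be replaced by one function that
takes the parameters as arrays and relies on broadcasting.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 5)                # shared input data
amplitudes = np.array([1.0, 2.0, 0.5])      # hypothetical parameters
frequencies = np.array([1.0, 3.0, 7.0])

# "Many functions" version: one Python closure per parameter pair.
def make_wave(a, w):
    return lambda t: a * np.sin(w * t)

closures = [make_wave(a, w) for a, w in zip(amplitudes, frequencies)]
looped = np.asarray([f(x) for f in closures])

# "Few functions, much data" version: one function, parameters as arrays.
def wave(a, w, t):
    # Column vectors of parameters broadcast against the row of data.
    return a[:, np.newaxis] * np.sin(w[:, np.newaxis] * t)

broadcast = wave(amplitudes, frequencies, x)    # shape (3, 5)

# A boolean indicator row then just selects which parameter rows to use.
selected = np.array([True, False, True])
subset = wave(amplitudes[selected], frequencies[selected], x)
```

Whether this applies depends on the real functions sharing a common form,
which the original code does not show.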
>
> > Also originally I should have posted this to Scipy-User list. Would it be
> > more appropriate to continue the discussion there?
>
> Probably.
>
>
> Anne
>
> > Regards,
> > eat
> > """
> >
> > import numpy as np
> >
> > ## relevant part
> > # expect indices to be sparse (~1% set) but hundreds of thousands of elements
> > # also, functions are not necessarily builtins and data may be large
> > def how1(functions, data):
> >    """ first approach to apply data to functions"""
> >    def how(indicies):
> >        return np.asarray([f(data) for f in functions[indicies]]).T
> >    return how
> >
> >
> > def how2(functions, data):
> >    """ second approach to apply data to functions"""
> >    def rr(data):
> >        """ reverse the 'roles' of data and function"""
> >        def f(g):
> >            return g(data)
> >        return f
> >    daf= rr(data)
> >    def how(indicies):
> >        return np.asarray(list(map(daf, functions[indicies]))).T
> >    return how
> >
> > # as I understand it, how1 and how2 boil down under the hood to pretty
> > # much the same code, so the differences are only syntactic, right?
> >
> > # and this is the key question: does there exist a more suitable
> > # scipy/ numpy solution for this situation?
> >
> > # perhaps some kind of special vectorization as below (as pseudo code)?
> > def how3(functions, data):
> >    """ third approach to apply data to functions"""
> >    def vecme(functions, data, indicies):
> >        return functions[indicies](data)
> >    v= np.vectorize(vecme)
> >    def how(indicies):
> >        return np.asarray(v(functions, data, indicies)).T
> >    return how
> >
> > ## end relevant part
> > # rest is just some context where hows could be applied
> > def stream(m, n):
> >    """ mimic some external stream of 0/1 indicators"""
> >    np.random.seed(123)
> >    ind= np.asarray(np.random.randint(0, 2, (m, n)), dtype= bool)
> >    for k in range(m):
> >        yield ind[k, :]
> >
> > def process(stream, how):
> >    """ consume stream"""
> >    for ind in stream:
> >        yield how(ind)
> >
> > def run(hows, n):
> >    """run the hows"""
> >    for app, how in hows.items():
> >        print('approach: %s' % app)
> >        for r in process(stream(3, n), how):
> >            print(np.round(r.squeeze(), 2))
> >
> > if __name__ == '__main__':
> >    # some data and functions only as demonstration
> >    data= np.random.random((3, 1))
> >    fncs= np.asarray([np.sin, np.cos, np.tan, np.sinh, np.cosh, np.tanh])
> >
> >    # produce equivalent results
> >    hows= {'first': how1(fncs, data),
> >           'second': how2(fncs, data)}
> > #           'third': how3(fncs, data)}
> >    run(hows, len(fncs))
> >
> >
> > _______________________________________________
> > SciPy-Dev mailing list
> > SciPy-Dev at scipy.org
> > http://mail.scipy.org/mailman/listinfo/scipy-dev
> >