[Numpy-discussion] A numpy accumulator...

Gökhan Sever gokhansever at gmail.com
Sat Oct 3 12:04:36 EDT 2009


On Sat, Oct 3, 2009 at 2:26 AM, Christopher Barker <Chris.Barker at noaa.gov> wrote:

> Hi all,
>
> This idea was inspired by a discussion at SciPy, in which we spent a
> LOT of time during the numpy tutorial talking about how to accumulate
> values in an array when you don't know how big the array needs to be
> when you start.
>
> The "standard practice" is to accumulate in a python list, then convert
> the final result into an array. This is a good idea because Python lists
> are standard, well tested, efficient, etc.
>
> However, as was pointed out in that lengthy discussion, if what you are
> accumulating is a whole bunch of numbers (ints, floats,
> whatever), or particularly if you need to accumulate a data type that
> plain python doesn't support, there is a lot of overhead involved: a
> python float type is pretty heavyweight. If performance or memory use is
> important, it might create issues. You can use an array.array, but it
> doesn't support all numpy types, particularly custom dtypes.
>
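
For reference, the two existing options mentioned above can be sketched as follows. This is only an illustration; the variable names and sample values are made up, and the "d" typecode (C double) is just one of the fixed typecodes array.array offers.

```python
import array

import numpy as np

# Standard practice: accumulate Python objects in a list, convert once at the end.
values = []
for i in range(5):
    values.append(i * 0.5)
result = np.array(values, dtype=np.float64)

# array.array stores raw C doubles compactly, but only knows a fixed set of
# typecodes, so custom numpy dtypes are out of reach.
compact = array.array("d")
for i in range(5):
    compact.append(i * 0.5)
result_from_array = np.array(compact)  # read through the buffer interface
```

Both paths end with the same float64 array; the difference is in how much memory each element costs while accumulating.
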
> I talked about this on the cython list (as someone asked how to
> accumulate in cython), and a few folks thought it would be useful, so I
> put together a prototype.
>
> What I have in mind is very simple. It would be:
>   - Only 1-d
>   - Support append() and extend() methods
>


Thanks for working on this. The append() method is very handy for me
when working with lists. It is exciting to hear that it will be ported to
ndarrays as well.

Any plans for insert()?



>   - Support indexing and slicing
>   - Support any valid numpy dtype
>     - which could even get you pseudo n-d arrays...
>   - maybe it would act like an array in other ways, I'm not so sure.
>     - ufuncs, etc.
>
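
One possible reading of the "pseudo n-d" remark is a 1-d array whose dtype carries a sub-array field, so each appended item is itself a small fixed-size vector. A minimal sketch, assuming a hypothetical field name "xyz" that does not come from the original post:

```python
import numpy as np

# Hypothetical dtype: every element holds three float64 values.
point = np.dtype([("xyz", np.float64, (3,))])

# The array itself is still 1-d ...
records = np.zeros(4, dtype=point)
records["xyz"][0] = [1.0, 2.0, 3.0]

# ... but viewing the field gives a 2-d block of numbers.
coords = records["xyz"]
```

Here records has shape (4,) while coords has shape (4, 3), so a 1-d accumulator over such a dtype would behave much like a growable (n, 3) array.
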
> It could take the place of using python lists/arrays when you really
> want a numpy array, but don't know how big it will be until you've
> filled it.
>
> The implementation I have now uses a regular numpy array as the
> "buffer". The buffer is re-sized as needed with ndarray.resize(). I've
> enclosed the class and a bunch of tests (this is the first time I've ever
> really done test-driven development, though I wouldn't say that this is
> a complete test suite).
>
> A few notes about this implementation:
>
>  * the name of the class could be better, and so could some of the
> method names.
>
>  * on further thought, I think it could handle n-d arrays, as long as
> you only accumulated along the first index.
>
>  * It could use a bunch more methods
>    - deleting part of the array
>    - math
>    - probably anything supported by array.array would be good.
>
>  * Robert pointed me to the array.array implementation to see how it
> expands the buffer as you append. It did tricks to get it to grow fast
> when the array is very small, then eventually to add about 1/16 of the
> used array size to the buffer. I imagine that this would get used
> because you were likely to have a big array, so I didn't bother and
> instead start with a buffer of 128 elements, then add 1/4 each time you
> need to expand -- these are both tweakable attributes.
>
>  * I did a little simple profiling, and discovered that it's slower
> than a python list by a factor of more than 2 (for accumulating python
> ints, anyway). With a bit of experimentation, I think that's because of
> a couple of factors:
>   - an extra function call -- the append() method needs to then do an
> assignment to the buffer
>   - Object conversion -- python lists store python objects, so the
> python int can just go right in there. With numpy, it needs to be
> converted to a C int first -- a bit of extra overhead.
>
>
>
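
The buffer scheme described in the notes above (a numpy array resized with ndarray.resize(), starting at 128 elements and growing by 1/4) could look roughly like the sketch below. It is not the class attached to the original message: the name Accumulator, the method to_array(), the helper time_appends() and all parameter names are assumptions made for illustration.

```python
import timeit

import numpy as np


class Accumulator:
    """Hypothetical 1-d growable array backed by a numpy buffer."""

    def __init__(self, dtype=float, initial_size=128, growth=0.25):
        self._buffer = np.empty(initial_size, dtype=dtype)
        self._length = 0        # number of slots actually in use
        self._growth = growth   # fraction of capacity added per expansion

    def __len__(self):
        return self._length

    def _grow(self, needed):
        capacity = self._buffer.shape[0]
        while capacity < needed:
            capacity += max(1, int(capacity * self._growth))
        # In-place resize; refcheck=False skips the reference-count check,
        # so this class never hands out views of the buffer.
        self._buffer.resize(capacity, refcheck=False)

    def append(self, value):
        if self._length == self._buffer.shape[0]:
            self._grow(self._length + 1)
        self._buffer[self._length] = value  # conversion to the dtype happens here
        self._length += 1

    def extend(self, values):
        values = np.atleast_1d(np.asarray(values, dtype=self._buffer.dtype))
        needed = self._length + values.shape[0]
        if needed > self._buffer.shape[0]:
            self._grow(needed)
        self._buffer[self._length:needed] = values
        self._length = needed

    def __getitem__(self, index):
        item = self._buffer[:self._length][index]
        # Copy slices so a later resize cannot invalidate them.
        return item.copy() if isinstance(item, np.ndarray) else item

    def to_array(self):
        return self._buffer[:self._length].copy()


def time_appends(n=10000, repeats=3):
    """Time n appends to a list and to the sketch above; returns seconds."""

    def fill_list():
        data = []
        for i in range(n):
            data.append(i)
        return np.array(data)

    def fill_accumulator():
        acc = Accumulator(dtype=np.int64)
        for i in range(n):
            acc.append(i)
        return acc.to_array()

    return {
        "list": timeit.timeit(fill_list, number=repeats),
        "accumulator": timeit.timeit(fill_accumulator, number=repeats),
    }
```

Calling time_appends() gives a rough list-versus-buffer comparison for appending Python ints; the numbers depend on the machine and the numpy version, so none are quoted here.
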
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>



-- 
Gökhan