Strange array.array performance

Maxim Khitrov mkhitrov at gmail.com
Thu Feb 19 21:58:42 EST 2009


On Thu, Feb 19, 2009 at 9:15 PM, John Machin <sjmachin at lexicon.net> wrote:
> On Feb 20, 6:53 am, Maxim Khitrov <mkhit... at gmail.com> wrote:
>> On Thu, Feb 19, 2009 at 2:35 PM, Robert Kern <robert.k... at gmail.com> wrote:
>> > On 2009-02-19 12:52, Maxim Khitrov wrote:
>>
>> >> Hello all,
>>
>> >> I'm currently writing a Python<->  MATLAB interface with ctypes and
>> >> array.array class, using which I'll need to push large amounts of data
>> >> to MATLAB.
>>
>> > Have you taken a look at mlabwrap?
>>
>> >  http://mlabwrap.sourceforge.net/
>>
>> > At the very least, you will probably want to use numpy arrays instead of
>> > array.array.
>>
>> >  http://numpy.scipy.org/
>>
>> I have, but numpy is not currently available for Python 2.6, which I
>> need for some other features, and I'm trying to keep the dependencies
>> down in any case. The mlabwrap description doesn't mention whether it
>> is thread-safe, and that's another of my requirements.
>>
>> The only feature that I'm missing with array.array is the ability to
>> quickly pre-allocate large chunks of memory. To do that right now I'm
>> using array('d', (0,) * size).
>
> It would go somewhat faster if you gave it a float instead of an int.
>
>> It would be nice if array accepted an
>> int as the second argument indicating how much memory to allocate and
>> initialize to 0.
>
> While you're waiting for that to happen, you'll have to use the
> fromstring trick, or another gimmick that is faster and is likely not
> to use an extra temporary 8 MB for a 1M-element array, as I presume
> fromstring does.
>
> [Python 2.6.1 on Windows XP SP3]
> [Processor: x86 Family 15 Model 36 Stepping 2 AuthenticAMD ~1994 Mhz]
>
> C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d',(0,)*1000000)"
> 10 loops, best of 3: 199 msec per loop
>
> C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d',(0.,)*1000000)"
> 10 loops, best of 3: 158 msec per loop
>
> C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d');x.fromstring('\0'*8*1000000)"
> 10 loops, best of 3: 36 msec per loop
>
> C:\junk>\python26\python -mtimeit -s"from array import array" "x=array('d','\0'*8*1000000)"
> 10 loops, best of 3: 35.7 msec per loop
>
> C:\junk>\python26\python -mtimeit -s"from array import array" "array('d',(0.,))*1000000"
> 10 loops, best of 3: 19.5 msec per loop

Interesting, though I'm not able to replicate that last result: the
string method is still the fastest on my machine. It also looks like
the order of the multiplication matters slightly: (8 * size * '\0') is
a little faster than ('\0' * 8 * size). Here is my test and its
output:

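---
from array import array
from timeit import repeat

# Each line times 100 constructions of a 100000-element array of doubles
# and prints the three totals returned by repeat() (in seconds).
print repeat(lambda: array('d', (0,) * 100000), number=100)        # tuple of ints
print repeat(lambda: array('d', (0.0,) * 100000), number=100)      # tuple of floats
print repeat(lambda: array('d', (0.0,)) * 100000, number=100)      # repeat a 1-element array
print repeat(lambda: array('d', '\0' * 100000 * 8), number=100)    # zero string, size first
print repeat(lambda: array('d', '\0' * 8 * 100000), number=100)    # zero string, 8 first
print repeat(lambda: array('d', 8 * 100000 * '\0'), number=100)    # int product, then string
---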

[0.91048107424534941, 0.88766983642377162, 0.88312824645684618]
[0.72164595848486179, 0.72038338197219343, 0.72346024633711981]
[0.10763947529894136, 0.1047547164728595, 0.10461521722863232]
[0.05856873793382178, 0.058508825334111947, 0.058361838698573365]
[0.057632016342657799, 0.057521392119007864, 0.057227118035289237]
[0.056006643320014149, 0.056331811311153501, 0.056187433215103333]
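For readers on Python 3, where a str initializer is no longer accepted for numeric arrays and array.fromstring was deprecated in favour of frombytes (and removed in 3.9), the comparison above might be ported roughly as follows. This is a sketch under that assumption, not a re-run of the measurements in this thread; the candidates dictionary and the SIZE, NUMBER and ITEM names are illustrative, and timings will vary by machine and interpreter.

```python
from array import array
from timeit import repeat

SIZE = 100000                    # elements per array, as in the test above
NUMBER = 100                     # constructions per timing run
ITEM = array('d').itemsize       # bytes per double (8 on common platforms)

# Candidate ways to build a zero-filled array('d') of SIZE elements.
# In Python 3 a bytes initializer is passed on to frombytes().
candidates = {
    "tuple of ints": lambda: array('d', (0,) * SIZE),
    "tuple of floats": lambda: array('d', (0.0,) * SIZE),
    "repeat 1-element array": lambda: array('d', (0.0,)) * SIZE,
    "repeated zero byte": lambda: array('d', b'\0' * ITEM * SIZE),
    "bytes(n)": lambda: array('d', bytes(ITEM * SIZE)),
}

if __name__ == "__main__":
    for name, make in candidates.items():
        # Best of 3 runs, each timing NUMBER constructions.
        best = min(repeat(make, number=NUMBER, repeat=3))
        print(f"{name:>24}: {best:.4f} s")
```

Taking the minimum of the repeats filters out some noise from other processes; the absolute numbers are not comparable with the 2009 figures above.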

The array('d', (0.0,)) * 100000 method is a good compromise between
speed and memory use: unlike the string method, it doesn't build a
temporary string of 8 * size bytes first, so maybe I'll use that
instead.
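The pre-allocation trick chosen above can be wrapped in a small helper that stands in for the "int as the second argument" constructor wished for earlier in the thread. This is a minimal sketch for Python 3; zeros is a hypothetical name, not part of the array module, and it assumes a numeric typecode (one that accepts the integer 0).

```python
from array import array


def zeros(typecode, n):
    """Return a zero-filled array of n items of the given numeric typecode.

    Hypothetical helper: it builds a one-element array and relies on
    sequence repetition, so no n-element tuple or n * itemsize byte
    string is created as an intermediate.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return array(typecode, [0]) * n
```

For example, buf = zeros('d', 1000000) gives a million doubles set to 0.0, and buf.buffer_info() returns the (address, length) pair that ctypes code can use to hand the underlying memory to C.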

- Max



More information about the Python-list mailing list