[Tutor] Acceleration of long lists

Mats Wichmann mats at wichmann.us
Thu Jul 29 23:23:48 EDT 2021


On July 29, 2021 5:06:56 PM MDT, Alan Gauld via Tutor <tutor at python.org> wrote:
>On 29/07/2021 19:02, Jan Kenin wrote:
>
>> Often I append instances of objects in list-objects. Then I realize that
>> the progress of my program slows down, when the list becomes larger and
>> larger, e.g. larger than 3000.
>
>3000 is nothing to a modern computer.
>Remember the list is not holding your objects, it is simply holding a
>reference to each object. When you append an object you are only adding
>a new reference - a few bytes.

Once upon a time, if you were adding complex objects, things did slow down because the garbage collector scanned the growing list during appends, and that got slower as the list grew (not for simple objects like ints, strings, etc., though). You could usually improve things by turning the collector off before your append loop and back on again afterwards.  I don't think that's been an issue for something like 15 years now.
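
A rough sketch of that old workaround, in case you want to see whether it still matters on your setup (here "source" is just a stand-in for whatever iterable produces your instances):

import gc

def collect_all(source):
    gc.disable()              # pause cyclic GC while the list grows
    try:
        result = []
        for obj in source:
            result.append(obj)
    finally:
        gc.enable()           # always switch it back on, even on error
    return result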
 
>Also, I believe that in the implementation details Python
>allocates memory for lists in chunks so you only actually
>create new memory when an existing chunk runs out, not
>for every append operation. (That's certainly how I would
>build it!)

In fact, CPython over-allocates: each time it has to grow the list it grabs a proportionally bigger chunk than it strictly needs, so the frequency of getting a new chunk goes down as the list gets longer.
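
A quick way to watch the over-allocation happen (exact numbers vary by CPython version and platform):

import sys

lst = []
last = sys.getsizeof(lst)
for i in range(50):
    lst.append(i)
    size = sys.getsizeof(lst)
    if size != last:
        # the allocation only grows occasionally; most appends
        # just store a pointer into the spare capacity
        print(f"len={len(lst):>3}  bytes={size}")
        last = size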

>>  How can I accelerate this? Perhaps it is
>> a question of memory. Can I initialize the list in order to do the
>> memory allocation only once?

You don't mention numpy, so I assume it's not involved - I have seen mentions of numpy arrays having this problem, and indeed working better if you preallocate (numpy.empty(length)). That might be hooey; I have not experimented with it.
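
If numpy does come into it, the usual advice is to allocate the array once and fill it rather than growing it; a minimal sketch, assuming you know the final length up front:

import numpy as np

n = 100_000

# growing element by element copies the whole array on every append - slow:
# arr = np.array([])
# for x in range(n):
#     arr = np.append(arr, x)

# preallocate once, then fill in place:
arr = np.empty(n)
for i in range(n):
    arr[i] = i * 0.5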

>No, that's a memory optimisation that's in the hands of the
>implementers and indeed may well work differently
>in different versions of Python (CPython, Jython, IronPython etc)
>
>The solution with any kind of performance problem is to measure
>to find out what is slowing things down. Have you tried using
>the profiler to see which functions are taking the time?
>
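
For reference, the standard-library profiler is easy to point at a single call; a minimal sketch, where build_list is just a stand-in for whatever slow code you want to measure:

import cProfile
import pstats

def build_list(n):
    return [object() for _ in range(n)]   # replace with your real code

cProfile.run("build_list(100_000)", "stats.out")
pstats.Stats("stats.out").sort_stats("cumulative").print_stats(10)

That prints the ten functions with the largest cumulative time, which usually points straight at the hot spot.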
>Also, what kind of slow-down do you see? fractions of a second?
>seconds? minutes? And is it only certain operations? If an operation
>is slow you probably won't notice it for a single object, but
>once you get into thousands repeating it then the delay
>becomes obvious. Again profiling is your friend. But without
>much more details (and ideally code) we can't be more specific.
>
>Also, if your code is doing embedded loops on these lists then
>that can slow things dramatically. And in Python it's easy
>to introduce loops without realizing - for example using
>'in' tests causes a loop. If you repeat the same 'in' test
>multiple times you will loop over the entire list each time.
>Also searching for items might well be faster if you use
>a dictionary rather than a loop. But without sight of your
>code we can only make generic suggestions.
>
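
To make that membership point concrete, here's a small made-up comparison - repeated 'in' tests against a list scan it every time, while a set (or dict) does a hash lookup:

import timeit

haystack_list = list(range(6000))
haystack_set = set(haystack_list)      # build once, cheap lookups afterwards
needles = range(3000)

print(timeit.timeit(lambda: [x in haystack_list for x in needles], number=100))
print(timeit.timeit(lambda: [x in haystack_set for x in needles], number=100))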

Here's a trick to try for fun - no idea if it will show a difference in your case, since there could be something else going on: assuming your objects are unique and hashable, add them to a set instead, and then at the end, if you need a list, convert it to a list.
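
Something like this, where produce_objects is just a placeholder for wherever your instances come from:

seen = set()
for obj in produce_objects():    # placeholder for your own source of objects
    seen.add(obj)                # requires the objects to be hashable
items = list(seen)               # convert at the end if you really need a list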


-- 
Sent from a mobile device with K-9 Mail. Please excuse my brevity.

