Interesting list() un-optimization

Wed Mar 6 22:20:11 EST 2013

I stumbled upon an interesting bit of trivia concerning lists and list 
comprehensions today.

We use mongoengine as a database model layer.  A mongoengine query 
returns an iterable object called a QuerySet.  The "obvious" way to 
create a list of the query results would be:

    my_objects = list(my_query_set)

and, indeed, that works.  But, then I found this code:

    my_objects = [obj for obj in my_query_set]

which seemed a bit silly.  I called over the guy who wrote it and asked 
him why he didn't just write it using list().  I was astounded when it 
turned out there's a good reason!

Apparently, list() has an "optimization" where it calls len() on its
argument to try to discover how many items it's going to put into the
list.  Presumably, list() uses this information to pre-allocate the
right amount of memory up front, without any resizing.  If len() fails
(for example, because the argument has no __len__() method), it falls
back to just iterating and resizing as needed.  Normally, this would be
a win.
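
The len() call is easy to observe.  Here is a minimal sketch, assuming
CPython; Counted and NoLen are hypothetical classes made up for this
illustration and have nothing to do with mongoengine:

```python
class Counted:
    """Hypothetical iterable that records how often its length is requested."""

    def __init__(self, items):
        self.items = items
        self.len_calls = 0

    def __iter__(self):
        return iter(self.items)

    def __len__(self):
        self.len_calls += 1
        return len(self.items)


class NoLen:
    """Hypothetical iterable with no __len__() method at all."""

    def __init__(self, items):
        self.items = items

    def __iter__(self):
        return iter(self.items)


counted = Counted([1, 2, 3])
from_counted = list(counted)           # list() asks for the length as a size hint
from_no_len = list(NoLen([1, 2, 3]))   # no length available; list() just iterates

print(counted.len_calls)   # non-zero on CPython
print(from_no_len)
```

Both calls produce the same list; the only difference is whether
__len__() gets called along the way.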

The problem is, QuerySets have a __len__() method.  Calling it is a lot
faster than iterating over the whole query set and counting the items,
but it does result in an additional database query, which is a lot
slower than the list resizing it is meant to avoid!  A list
comprehension never calls len() on the thing it iterates over, so
writing the code that way skips the extra query.
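
To make the cost concrete, here is a rough sketch that counts simulated
queries.  FakeQuerySet and queries_used are hypothetical stand-ins
invented for this example, not mongoengine's real classes, and the
"one query per iteration, one query per len()" accounting is an
assumption made for illustration:

```python
class FakeQuerySet:
    """Hypothetical stand-in for a QuerySet; counts simulated database queries."""

    def __init__(self, rows):
        self._rows = rows
        self.queries = 0

    def __iter__(self):
        self.queries += 1   # assumed: one query to fetch the rows
        return iter(self._rows)

    def __len__(self):
        self.queries += 1   # assumed: a separate count query
        return len(self._rows)


def queries_used(build):
    """Run build() on a fresh FakeQuerySet and report the queries it cost."""
    qs = FakeQuerySet(["a", "b", "c"])
    build(qs)
    return qs.queries


with_list = queries_used(list)
with_comprehension = queries_used(lambda qs: [obj for obj in qs])

print(with_list, with_comprehension)
```

On CPython, the list() version should report one more simulated query
than the comprehension, which is exactly the extra round trip described
above.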