Bug in timsort!?

Chris Angelico rosuav at gmail.com
Wed Feb 25 09:33:54 EST 2015


On Thu, Feb 26, 2015 at 12:58 AM, Sturla Molden <sturla.molden at gmail.com> wrote:
> This is awful. It is broken for arrays longer than 2**49 elements. With 8
> bytes per PyObject* pointer this is >4096 terabytes of RAM. I don't see how
> we can fix this in time.

It's even worse than that. Unless you have a list of 2**49 references
to the same few objects, you're going to need to have some actual
content for each one. The absolute best you could do is to sort
integers, which would take 32 bytes each [1]; if you're sorting
strings, they'll be 90 bytes each, so the integers are our best bet.
So add another *five* powers of two to the RAM requirements.

Also, who has that many elements to sort, and wouldn't do better to
stick 'em into an indexed database of some description? I mean, good
to know that this'll be fixed, but it really isn't likely to come up
any time soon.

ChrisA

[1] Calculated using Python 3.5 amd64 Linux, your figures may vary.
Small integers (less than 2**30) take only 28 bytes, but that's
dwarfed by the rest of them taking 32.



More information about the Python-list mailing list