Change sorting order?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Fri Jan 22 11:17:12 EST 2010


On Fri, 22 Jan 2010 09:49:32 -0500, Dave Angel wrote:

> Seems to me the other solutions I've seen so far are more complex than
> needed.  I figure you either want an unordered list, in which case you
> could use random.shuffle(), or you want a list that's sorted, but starts
> somewhere in the middle, at an arbitrary place, goes to the end, and
> wraps back to the beginning.  So use random.randint() to choose an index
> within the list, and concatenate two slices of the list, based on that
> index in reverse order.
> 
> And don't bother generating it each minute, but simply generate it each
> time you need it.
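
For concreteness, I read that suggestion as something like the sketch 
below (the names and the helper function are my own illustration, not 
code from Dave's post):

import random

def rotated_view(sorted_users):
    """Return the sorted list rotated at a random index: from the split
    point to the end, then wrapping back to the beginning."""
    if not sorted_users:
        return []
    i = random.randrange(len(sorted_users))
    return sorted_users[i:] + sorted_users[:i]

# Built fresh each time it is needed, rather than regenerated each minute.
users = sorted(["alice", "bob", "carol", "dave", "erin", "frank"])
print(rotated_view(users))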

According to the OP's stated requirements, he needs it every minute.

Personally, from a usability perspective, I think a refresh of a 
(potentially huge) list of users every minute would be horrible. I think 
the whole design needs to be rethought, but that's not my decision.


> I doubt if the cost of generating it is much
> different than the cost of checking the time.

Checking the time is quite fast -- it's a simple system call. Slicing and 
copying a list containing potentially tens or hundreds of thousands of 
names is anything but cheap!

>>> from timeit import Timer
>>> t1 = Timer('x = time.time()', 'import time')
>>> t2 = Timer('y = x[:1000] + x[1000:]', 'x = [None]*100000')
>>>
>>> t1.timeit(1000)
0.0025529861450195312
>>> t2.timeit(1000)
3.0900199413299561

Slicing is more than 1000 times slower than checking the time. If you 
think I'm being unfair -- that the OP's application will never have 
100,000 names -- try it with just 1000 names:

>>> t3 = Timer('y = x[:1000] + x[1000:]', 'x = [None]*1000')
>>> t3.timeit(1000)
0.041487932205200195

That's still roughly 16 times slower than checking the time.

That's the advantage of my suggestion: the only slicing done is of the 
tiny master list, which means copying just 27 pointers. Easy and fast, 
and it doesn't matter whether you have one user or one million: that 
operation takes exactly the same amount of time. Your suggestion of 
keeping all the users in one giant list and slicing it repeatedly 
scales terribly.
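
Roughly, the structure I have in mind looks like the sketch below (a 
sketch only -- the 27th bucket here is an arbitrary catch-all for names 
that don't begin with A-Z, and the exact bucketing is up to the OP):

import string
from collections import defaultdict

OTHER = '#'  # catch-all bucket for names that don't start with A-Z
LETTERS = list(string.ascii_uppercase) + [OTHER]  # the 27-entry master list

def group_by_initial(users):
    """Bucket users by initial letter; keep each bucket sorted."""
    groups = defaultdict(list)
    for name in users:
        initial = name[:1].upper()
        key = initial if 'A' <= initial <= 'Z' else OTHER
        groups[key].append(name)
    for bucket in groups.values():
        bucket.sort()
    return groups

def display_order(groups, start_index):
    """Yield users starting from the given letter and wrapping around.
    The only slice taken is of the 27-element master list."""
    for letter in LETTERS[start_index:] + LETTERS[:start_index]:
        for name in groups.get(letter, ()):
            yield name

users = ["alice", "Greg", "zoe", "Bob", "gina", "42cat"]
groups = group_by_initial(users)
print(list(display_order(groups, 6)))  # start the display at 'G'

Each minute you shift only the 27-element master list; the big 
per-letter buckets are never copied.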


> I guess there's a third possibility, that you don't want some of the G
> names at the beginning, and some at the end.  In that case, I'd generate
> the random index as above, then increment it till the first character of
> the item changes.  Use that index as your split point.

Another solution that doesn't scale terribly well (although not as 
badly as the previous one).

And what's with the random index thing? Why are you throwing away 
perfectly good information every time you want to shift letters? Your 
solution is something like this:

You have a book opened at the start of Chapter 11, and you now want to 
jump to the start of Chapter 12.

(1) Open the book at a random page.
(2) Is it the start of Chapter 12?
    If yes, you're done.
    If no, turn to the next page. If you reach the end of 
    the book, start again at the beginning.
(3) Go to step 2.

There are search techniques that start from an arbitrary index, but 
they don't advance one position at a time (binary search, 
hunt-and-bisect search, and the like). But why search each time at all, 
when you can keep an index to the start of each chapter and jump 
straight there in constant time?
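
If the one-big-sorted-list design is kept anyway, the "index to the 
start of each chapter" idea looks something like this sketch (my own 
illustration, assuming the list is sorted case-insensitively):

import bisect
import string

def chapter_starts(sorted_names):
    """Precompute, once, the index at which each initial letter begins.
    Assumes sorted_names is sorted case-insensitively."""
    keys = [name[:1].lower() for name in sorted_names]
    return dict((letter, bisect.bisect_left(keys, letter))
                for letter in string.ascii_lowercase)

names = sorted(["alice", "Bob", "carol", "dave", "Gina", "greg", "zoe"],
               key=str.lower)
starts = chapter_starts(names)
split = starts['g']     # constant-time jump to the first 'G' name
print(names[split])     # -> Gina

No random page and no scanning: one dictionary lookup and you're at the 
start of the chapter.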




-- 
Steven


