Balanced trees

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Mar 9 23:24:13 EDT 2014


On Sun, 09 Mar 2014 18:04:46 -0600, Ian Kelly wrote:

> On Sun, Mar 9, 2014 at 4:08 PM, Dan Stromberg <drsalists at gmail.com>
> wrote:
>> On Sun, Mar 9, 2014 at 2:43 PM, Marko Rauhamaa <marko at pacujo.net>
>> wrote:
>>> Dan Stromberg <drsalists at gmail.com>:
>>>
>>>> On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa <marko at pacujo.net>
>>>> wrote:
>>>>> There is no O(1) hash table.
>>>>
>>>> http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1
>>>
>>> Please elaborate.
>>
>> A hash table of fixed size is O(1) until you fill it so full that you
>> start getting enough collisions to make it O(n), as one bucket becomes
>> a linked list.  This is because the hash function is O(1), and looking
>> up a value in a C array is O(1).
>>
>> A more flexible hash table will not have a fixed size; instead it will
>> grow itself as needed.  This growth operation tends to be O(n), but
>> happens vanishingly infrequently as the table grows, so it's still
>> amortized O(1).
> 
> A hash table can also only ever be O(1) in the average case.

Since this discussion has apparently devolved into people trying to out-
pedant each other, I'm going to dispute that. That will depend on the 
distribution of hash collisions. With a sufficiently big table, 
sufficiently small set of possible data, a sufficiently good hash 
function, or some combination of the above, a hash table may be O(1) even 
in the worst case.

E.g. if you hash a number n to itself, so that hash(42) == 42, your data 
consists of single-byte numbers 0 through 255, and your hash table has 
256 slots, *there are no collisions* and every lookup is O(1).
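That example can be sketched in a few lines of Python. This is a toy direct-address table, not a real hash table implementation, and `ByteTable` is a made-up name for illustration: with an identity hash and one slot per possible key, a lookup is a single array index, with no probing and no collision chains.

```python
# Minimal sketch of the identity-hash example: keys are bytes 0..255,
# hash(key) == key, and the table has exactly 256 slots, so every key
# owns its own slot and every operation is a single index -- O(1) even
# in the worst case.  (ByteTable is a hypothetical name for this sketch.)
class ByteTable:
    def __init__(self):
        self._slots = [None] * 256      # one slot per possible key

    def __setitem__(self, key, value):
        self._slots[key] = value        # identity hash: no probing needed

    def __getitem__(self, key):
        value = self._slots[key]
        if value is None:
            raise KeyError(key)
        return value

table = ByteTable()
table[42] = "answer"
print(table[42])    # -> answer
```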

There are technical meanings for Big Oh notation, which can be quite 
complicated. There's Big O, Little o, Big Ω (omega), Little ω (omega), 
Big Θ (theta), although I don't think there's a Little θ. They can refer 
to the best case, worst case, average (mean) case, median case, "typical" 
case where you're making some guess of what occurs typically, and any 
other situation you care about. The algorithmic complexity can apply 
consistently to each and every run of the algorithm, or they can be 
amortized over many runs. If you're doing a technical analysis of an 
algorithm, this pedantic detail is important.
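The best-case/worst-case distinction is easy to demonstrate with a real Python dict. The sketch below (with a made-up `BadKey` class) gives every key the same hash, so all entries land in one bucket chain and lookups degrade from average O(1) toward O(n); the dict still gives correct answers, just with worst-case behaviour on every operation.

```python
# A key type whose __hash__ always collides: CPython's dict must fall back
# to comparing keys one by one within the collision chain, so lookup cost
# grows with the number of entries -- the hash table's worst case.
# (BadKey is a hypothetical name for this sketch.)
class BadKey:
    def __init__(self, n):
        self.n = n

    def __hash__(self):
        return 0                        # every key hashes identically

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.n == other.n

d = {BadKey(i): i for i in range(100)}  # each insert probes the whole chain
print(d[BadKey(99)])    # -> 99 (correct, but found by linear search)
```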

But when people are just talking informally in a hand-wavy manner, they 
usually say "O(foo)" when they actually mean "for typical data under 
typical conditions, the algorithm runs with a complexity of the order of 
foo". And that's perfectly okay, just like it's perfectly okay to 
describe the Earth as a sphere even though a pedant will call it a 
wrinkly asymmetrical oblate spheroid.




-- 
Steven D'Aprano
http://import-that.dreamwidth.org/
