[Tutor] For - if - else loop; print selective output

Oscar Benjamin oscar.j.benjamin at gmail.com
Fri Oct 26 01:57:07 CEST 2012


On 26 October 2012 00:08, Steven D'Aprano <steve at pearwood.info> wrote:
> On 26/10/12 07:16, eryksun wrote:
>>
>> On Thu, Oct 25, 2012 at 3:46 PM, Prasad, Ramit
>> <ramit.prasad at jpmorgan.com>  wrote:
>>>
>>>
>>> Do you happen to know offhand if there is a difference between
>>> `in<list>` vs. `in<tuple>` vs. `in<set>`?
>>
>>
>> The "in" comparison (__contains__ method) is equivalent for list and
>> tuple. It has to search through the sequence item by item, which makes
>> it an O(n) operation. On the other hand, a set/dict uses the hash() of
>> an object to map it to a known location in a table (performance
>> degrades if there are many collisions). On average, you can check if a
>> set/dict contains an item in constant time, i.e. O(1). The amortized
>> worst case is O(n).
>
>
> To be precise: dict lookup, deletion and insertion are amortized O(1)
> (on average, constant time) assuming there are no hash collisions.
>
> When there are collisions, they are O(k) where k is the number of
> colliding items. So in the worst case where k=n (the number of items
> in the dict), dicts perform a little worse than lists, but in the
> typical case, they are usually much, much faster.

The use of big-O notation above seems strange to me. The average value
of k is (assuming non-pathological inputs) a constant that is
independent of n (for large n). So O(k) really means average case
O(1).

> "Amortized" strictly only refers to deletion and insertion. Since
> lookups don't modify the dict, a lookup always takes the same time.
> But deletions and insertions will occasionally trigger a resize of
> the dict, which obviously takes a lot longer than the non-resizing
> cases. But if you spread the average cost of the resize over many
> insertions or deletions, you get amortized O(1) time.

You mean that the cost is only considered to be "amortised" when
averaging over the cost of the qualitatively distinct operation of
resizing the dict among the cost of the non-resizing insertions (does
deletion ever trigger a resize?). When averaging over the cases where
there are more or less hash collisions but no possibility of resizes
we refer to the cost as "average case" but not "amortised". Is that
what you mean?


Oscar


More information about the Tutor mailing list