save dictionary to a file without brackets.

Dave Angel d at davea.name
Thu Aug 9 19:38:13 EDT 2012


On 08/09/2012 06:54 PM, Andrew Cooper wrote:
> On 09/08/2012 23:26, Dave Angel wrote:
>> On 08/09/2012 06:03 PM, Andrew Cooper wrote:
>>> On 09/08/2012 22:34, Roman Vashkevich wrote:
>>>> Actually, they are different.
>>>> Put a dict.{iter}items() in an O(k^N) algorithm and make it a hundred thousand entries, and you will feel the difference.
>>>> Dict uses hashing to get a value from the dict and this is why it's O(1).
>>>>
>>> Sligtly off topic, but looking up a value in a dictionary is actually
>>> O(n) for all other entries in the dict which suffer a hash collision
>>> with the searched entry.
>>>
>>> True, a sensible choice of hash function will reduce n to 1 in common
>>> cases, but it becomes an important consideration for larger datasets.
>>>
>>> ~Andrew
>> I'm glad you're wrong for CPython's dictionaries.  The only time the
>> lookup would degenerate to O[n] would be if the hash table had only one
>> slot.  CPython sensibly increases the hash table size when it becomes
>> too small for efficiency.
>>
>>
>> Where have you seen dictionaries so poorly implemented?
>>
> Different n, which I should have made more clear.  I was using it for
> consistency with O() notation.  My statement was O(n) where n is the
> number of hash collisions.
That's a little like doing a survey, and reporting the results as
showing that 100% of the women hit their husbands, among the population
of women who hit their husbands.

In your original message, you already stated the assumption that a
proper hash algorithm would be chosen, then went on to apparently claim
that large datasets would still have an order n problem.  That last is
what I was challenging.

The rest of your message here refers to client code, not the system.
> The choice of hash algorithm (or several depending on the
> implementation) should specifically be chosen to reduce collisions to
> aid in efficient space utilisation and lookup times, but any
> implementation must allow for collisions.  There are certainly runtime
> methods of improving efficiency using amortized operations.
>
> As for poor implementations,
>
> class Foo(object):
>
>     ...
>
>     def __hash__(self):
>         return 0
>
> I seriously found that in some older code I had the misfortune of
> reading.  It didn't remain in that state for long.
>
> ~Andrew


-- 

DaveA




More information about the Python-list mailing list