Ordering python sets

Sat Oct 25 17:53:10 EDT 2008

On Sat, 25 Oct 2008 09:21:05 +0000, Steven D'Aprano wrote:

> On Sat, 25 Oct 2008 08:58:18 +0000, Lie Ryan wrote:
> 
>> <anotherrandommusing>
>> Since python is dynamic language, I think it should be possible to do
>> something like this:
>> 
>> a = list([1, 2, 3, 4, 5], implementation = 'linkedlist') b = dict({'a':
>> 'A'}, implementation = 'binarytree') c = dict({'a': 'A'},
>> implementation = 'binarytree')
> 
> Oh I hope not. I think you have mistaken "dynamic" for "chaotic".
> 
> When I see a dict, I want to know that any two dicts work the same way.

Oh no, the two dict implementation would work _exactly_ the same from the 
outside, they are transparently interchangeable. Only the performance 
characteristic differs because of the different implementation. Actually 
I got this idea from a book about algorithm and data structure, that book 
said that an abstract data type (e.g. dict, set, list) have several 
competing implementations or data structures (e.g. binary tree dict, 
hashed dict, array dict). A data's implementation and interface can be 
separated so that we can switch the data structure's implementation 
without changing the rest of the code. The book is Algorithm Design 
Manual by Skiena.

hint: read the PS below

> I don't want to have to search the entire project's source code to find
> out if it is a dict implemented as a hash table with O(1) lookups, or a
> dict implemented as a binary tree with O(log N) lookups, or a dict
> implemented as a linear array with O(N) lookups.

No, you'd only need to look at the dict's creation point (or actually 
much better at projects docs, but not everyone create good docs). The 
alternative you mentioned below, by shadowing built-in, is both a hack 
and is much more likely to be missed.

> If I wanted that sort of nightmare, I can already do it by shadowing the
> builtin:
> 
> dict = binarytree
> D = dict({'a': 'A'})  # make a binary tree

I DON'T want THAT sort of nightmare you mentioned...
And it'd be impossible to have two dictionary that have two different 
implementations.

> There is no possible good that come from this suggestion. The beauty of
> Python is that the built-in data structures (list, dict, set) are
> powerful enough for 99% of uses[1], and for the other 1%, you can easily
> and explicitly use something else.

Oh really? As far as I know, python's list is extremely bad if you're 
inserting data at the beginning of the list (e.g. lst.insert(0) requires 
the whole array be "copied" one address away). This is because python's 
list uses array data structure, making addressing (e.g. a[2]) fast, but 
insertion slow. If, on the other hand, it is implemented using binary 
tree, it'd make insertion O(log n) but addressing is a bit tricky on it.

The keyword is "tradeoffs".

> But *explicitly* is the point. There's never any time where you can do
> this:

Yes, true, explicitly IS the point. How more explicit can you be than: 
dict({foo: bar}, implementation = 'binarytree')

> type(mydict) is dict

If my memory serves right, binary tree dict and hashed table dict is both 
a dict right? (Duck Typing)
Only their implementation differs. Implementation is... well, 
"implementation detail".

> and not know exactly what performance characteristics mydict will have.

Oh... why do I need to know what the performance characteristic of mydict 
is? Unless I know what I'm doing.

> (Unless you shadow dict or type, or otherwise do something that breaks
> the rules.) You never need to ask, "Okay, it's a dict. What sort of
> dict?"

Okay, it's a dict. What sort of dict? Who the hell cares? I don't need to 
know, they all looks and behave the same (Duck Typing)... at least until 
I profile them (since profiling is a deep black magic by itself, it 
cannot be used to discredit switching implementations). 

Sometimes we need a data type to use a specific data structure that have 
some specific performance characteristic, because we know we'll be doing 
a specific operation a lot more than other operations. 

If you actually need to know what the implementation is currently being 
used, you could implement a dict.implementation property.

> If you want a binary tree, ask for a binary tree.

Yeah, you ask for binary tree EXPLICITLY:
bintreedict = dict({a:b}, implementation = 'binarytree')

this:
regularhasheddict = dict({a:b})

would have a reasonable default.

PS: I do admit I have used the wrong terms in the last post. I used the 
term "data structure", instead it should be "abstract data type", "data 
structure" is a synonym for "implementation". In this post, I hope I've 
corrected all of the usage.