Why are there no ordered dictionaries?
Bengt Richter
bokr at oz.net
Tue Nov 29 14:08:00 EST 2005
On Sun, 27 Nov 2005 12:00:23 +0100, Christoph Zwerschke <cito at online.de> wrote:
>Bengt Richter wrote:
>
>>>d.keys[:] = newkeyseq
>>
>> Do you really mean just re-ordering the keys without a corresponding reording of values??
>> That would be a weird renaming of all values. Or do you means that any key should still
>> retrieve the same value as before if used as d[key]? In which case the values must undergo
>> the same permutation as the keys. I.e., you are assuming key->value pairings remain stable
>> through any key reorderings?
>
>Since it is considered as being a dictionary in the first place, the
>key->value pairings should of course stay stable. In the usual
>implementation based on an ordinary dictionary with an additional key
>list ("sequence" in the Foord/Larosa and "_keys" in the Bejamin/Winter
>implementation), you would only set the key list, since the value list
>is generated dynamically. But if your implementation keeps internal
>values or items lists, these need to be adjusted as well.
>
>I will assume that d has is a Foord/Larosa ordered dict with "sequence"
>attribute in the following.
>
>Then, with other words,
>
>d.keys[:] = newkeyseq
>
>should do the same as:
>
>d.sequence = newkeyseq
>
>> Exactly what, though? should e.g.
>> d.keys[3] = newk3
>> mean (not a suggested implementation, just to define semantics)
>> keys = d.keys()
>> if newk3 in keys and keys.index(newk3)!=3:
>> raise ValueError,'Attempt to introduce duplicate key'
>> items = d.items()
>> items[3] = (newk3, items[3][1])
>> d.clear()
>> d.update(items)
>
>Yes, that would be the correct semantics. Of course this should not be
>the real implementation and use KeyError instead of ValueError. With
>other words,
>
>d.keys[i] = newkey
>
>sould be the same as:
>
>if d.sequence[i] != newkey:
> if newkey in d.sequence:
> raise KeyError,'Attempt to introduce duplicate key'
> else:
> d.sequence[i] = newkey
>
>> This would allow what you might call renaming in place.
>> Similarly
>> d.keys[i:j] = newkeysij
>> might have the semantics of
>> keys = d.keys()
>> outside = set(keys[:i])+set(keys[j:])
>> if outside & set(newkeysij) or len(newkeysij) != len(set(newkeysij)):
>> raise ValueError,'Attempt to introduce duplicate key(s)'
>> items = d.items()
>> items[i:j] = [(k, items[kx+i][1]) for kx,k in enumerate(newkeysij)]
>> d.clear()
>> d.update(items)
>>
>> Is this what is desired?
>
>Not quite, because it does not preserve the key->value pairings (see
>above) and it would behave strangely or raise an exception if the new
>slice is larger. The following code would do:
>
>keys = d.keys()
>outside = set(keys[:i])|set(keys[j:])
>if outside & set(newkeysij) or len(newkeysij) != len(set(newkeysij)):
> raise ValueError,'Attempt to introduce duplicate key(s)'
>items = d.items()
>items[i:j] = [(k, d.get(k, None)) for k in newkeysij]
>d.clear()
>d.update(items)
>
>(Note that there was a bug in the second line. You cannot add sets.)
Oops, thinking about adding dicts ;-)
>
>Again, this would be equivalent to:
>
>seq = d.sequence
>newseq = seq[:]
>newseq[i:j] = newkeysij
>if len(newseq) != len(set(newseq)):
> raise KeyError,'Attempt to introduce duplicate key(s)'
>for k in set(seq[i:j]) - set(newkeysij):
> del d[k]
>for k in set(newkeysij) - set(seq[i:j]):
> d[k] = None
>d.sequence = newseq
>
>>>You don't keep track of the item lists, they need to be built on every
>>>occasion.
>>
>> That depends on how you implement ;-)
>
>Ok, I was thinking of the usual implementations.
>
>> Back from holiday, so maybe I'll hack something out.
>
>Let us know when you have something to check out.
>
>Maybe Fuzzyman can make some moderate improvements to the existing
>odict.py, and you can do something more "radical". Then we have two
>"reference implementations" and can compare how they prove regarding
>performance and usability.
>
My code so far is a kludge to get functionality. Perhaps we can arrive at
a spec by doing some TDD. My current kludge passes these tests
(run by py.test which I downloaded from the pypy project site and made
work (apparanently ;-) with IIRC a .pth file to point to py where I
expanded the zip, and a cmd file to kick of python running py.test,
and I think that was all there was. As you can see, it is super easy to define
tests. If you want to try it, I think I can retrace my steps and describe
the way I set it up (for windows NT) in a few turgid run-on paragraphs ;-)
Maybe I'll get around to writing a downloading/checking/installing script
for it, with a few interactive are-you-sure?'s and file tree placement options etc.
I'm calling it a Creordict for now -- Created-order-dict.
Here are the tests so far. Do you want to add some to refine what's supposed to
happen. You may want to look at test_items, test_keys, test_values
as those are the ones that provide d.items(), d.items[i], d.items[i:j]
style accesses, with assign, eval, and del available for the latter two.
also test___getitem__ for d[i] => value normal key access vs
d[i:j] => another Creordict instance. Let me know what's missing.
I'm not sure how this list-based dict object will work out, but it's
a subclass of list, not of dict. ;-)
Some def test_usecase_xx(): ... definitions might be nice.
I am running py.test on windows NT. I din't compile anything,
I just cobbled afew things together.
----< test_creordict.py >----------------------------------------------
# test_creordict.py
# XXX assoclist.py ?? cf. scheme assv using == (eq<->is, eqv<->==, equal<->deep== ???)
# Tests for creordict.py -- a dictionary object which keeps items in creation time order.
#
# XXX boilerplate bokr AT oz DOT net
from py.test import raises
import py
from creordict import Creordict
def test_sanity():
d = Creordict()
assert d.keys() == []
assert d.values() == []
assert d.items() == []
assert d == d
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
assert repr(d) == "Creordict([('a', 0), ('b', 100), ('c', 200), ('d', 300), ('e', 400)])"
assert d.keys() == ['a', 'b', 'c', 'd', 'e']
assert d.values() == [0, 100, 200, 300, 400]
assert d.items() == [('a', 0), ('b', 100), ('c', 200), ('d', 300), ('e', 400)]
def test___init__():
d = Creordict()
assert d.keys() == []
assert d.values() == []
assert d.items() == []
assert d == d
assert d._index == {}
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
assert repr(d) == "Creordict(%r)"% d.items()
assert d.keys() == ['a', 'b', 'c', 'd', 'e']
assert d.values() == [0, 100, 200, 300, 400]
assert d.items() == [('a', 0), ('b', 100), ('c', 200), ('d', 300), ('e', 400)]
assert d._index == {'a': 0, 'c': 2, 'b': 1, 'e': 4, 'd': 3}
def test___contains__():
d = Creordict([('a', 1)])
assert 'a' in d
assert (4 in d) == False
def test___eq__():
d = Creordict([('a', 1)])
assert d == d
raises(TypeError, "d == {}")
assert d != Creordict([('b', 1)])
def test___cmp__():
d = Creordict([('a', 1)])
assert cmp(d, d) == 0
raises(TypeError, "cmp(d, {})")
def mkresult(s, **kw):
return Creordict((k, kw.get(k, k in 'abcdef' and 'abcdef'.index(k)*100)) for k in s)
def test___delitem__():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
print mkresult('abcde')
assert d == mkresult('abcde')
ditems = d.items()
print d['b']
print d.items()
print d._index
print d._index['b']
del d['b']
assert d == mkresult('acde')
del d['a']
assert d == mkresult('cde')
del d['e']
assert d == mkresult('cd')
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
del d[1:3]
assert d == mkresult('ade')
del d[:1]
assert d == mkresult('de')
del d[-1:]
assert d == mkresult('d')
def test___repr__():
assert repr(Creordict()) == 'Creordict([])'
raises(TypeError, "Creordict({1: 1})")
d = Creordict(((1, 3), (3, 2), (2, 1)))
assert repr(d) =='Creordict([(1, 3), (3, 2), (2, 1)])'
def test___setitem__():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
print mkresult('abcde')
assert d == mkresult('abcde')
d['b'] = 101
assert d == mkresult('abcde', b=101)
d['e'] = 401
assert d == mkresult('abcde', b=101, e=401)
def test___getitem__():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
assert d == mkresult('abcde')
items = d.items()
for k,v in items:
assert d[k]==v
raises(KeyError, "d['f']")
def test___getslice__():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
assert d[2:4] == mkresult('cd')
assert d[ :4] == mkresult('abcd')
assert d[2: ] == mkresult('cde')
assert d[ : ] == mkresult('abcde')
assert d[0:0] == mkresult('')
def test___add__():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
assert d+d == d
assert d + Creordict([('f', 500)]) == mkresult('abcdef')
def test_reverse():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
d.reverse()
assert d == mkresult('edcba')
def test_insert():
d = Creordict([(k,i*100) for i,k in enumerate('abc')])
d.insert(1, ('x', 101))
assert d == mkresult('axbc', x=101, b=100, c=200)
def test_get():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
assert d == mkresult('abcde')
items = d.items()
for k,v in items:
assert d.get(k)==v
assert d.get(k+'1') is None
assert d.get(k+'1', v) == v
raises(KeyError, "d['f']")
def test_copy():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
assert d == d.copy()
def test_items():
assert Creordict().items() == []
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
assert d.items() == [(k,i*100) for i,k in enumerate('abcde')]
assert d.items[:] == [(k,i*100) for i,k in enumerate('abcde')]
assert d.items[2:4] == [(k,i*100) for i,k in enumerate('abcde')][2:4]
d.items[2:4] = []
assert d == mkresult('abe')
d.items[1] = ('x', 'replaces b')
assert d == mkresult('axe', x='replaces b')
def test_keys():
assert Creordict().keys() == []
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
assert d.keys() == list('abcde')
assert d.keys[:] == list('abcde')
assert d.keys[2:4] == list('cd')
assert d.keys[1] == 'b'
assert d.keys[-1] == 'e'
d.keys[2:4] = ['x', 'y']
assert d == mkresult('abxye', x=200, y=300)
del d.keys[-1]
assert d == mkresult('abxy', x=200, y=300)
keys = d.keys()
keys[0], keys[-1] = keys[-1], keys[0] # swap end keys
d.keys[:]=keys
assert d == mkresult('ybxa', x=200, y=0, a=300) # a and y value associations swapped ;-)
def test_values():
assert Creordict().values() == []
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
assert d.values() == [i*100 for i,k in enumerate('abcde')]
assert d.values[:] == [i*100 for i,k in enumerate('abcde')]
assert d.values[2:4] == d[2:4].values()
assert d.values[1] == d.values()[1]
assert d.values[-1] == d.values()[-1]
d.values[2:4] = ['v2', 'v3']
assert d == mkresult('abcde', c='v2', d='v3')
del d.values[-1]
assert d == mkresult('abcd', c='v2', d='v3')
values = d.values()
values[0], values[-1] = values[-1], values[0] # swap end values
d.values[:]=values
assert d == mkresult('abcd', c='v2', d=0, a='v3') # a and y value associations swapped ;-)
def test_iteritems():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
itd = d.iteritems()
assert type(itd) == type(iter([]))
assert list(itd) == d.items()
def test_len():
d = Creordict()
assert len(d) == 0
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
assert len(d) == 5
def test_has_key():
d = Creordict(((1, 3), (3, 2), (2, 1)))
assert d.has_key(1) is True
assert d.has_key(4) is False
def test_iterkeys():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
itd = d.iterkeys()
assert type(itd) == type(x for x in [])
assert list(itd) == d.keys()
def test_itervalues():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
itd = d.itervalues()
assert type(itd) == type(x for x in [])
assert list(itd) == d.values()
def test_clear():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
d.clear()
assert d.items() == []
assert d.keys() == []
assert d.values() == []
def test_pop():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
assert d.pop('e') == 400
assert d.pop('b') == 100
assert d.pop('a') == 0
assert d == mkresult('cd')
def test_popitem():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
assert d.popitem() == ('e', 400)
assert d.popitem() == ('d', 300)
assert d.popitem() == ('c', 200)
assert d == mkresult('ab')
def test_setdefault():
d = Creordict([(k,i*100) for i,k in enumerate('abcde')])
assert d.setdefault('c', 'cee') == 200
assert d.setdefault('g') is None
assert d == mkresult('abcdeg', g=None)
assert d.setdefault('h', 'eightch') == 'eightch'
assert d == mkresult('abcdegh', g=None, h='eightch')
def test_update():
d = Creordict()
assert d == mkresult('')
d.update([(k,i*100) for i,k in enumerate('abcde')])
assert d == mkresult('abcde')
raises(TypeError, "d.update({'f': 500})")
d.update([('f', 500), ('b',101)])
assert d == mkresult('abcdef', b=101)
raises(ValueError, "d.update([('broken',)])")
-----------------------------------------------------------------------
Regards,
Bengt Richter
More information about the Python-list
mailing list