[Tutor] Dictionary [Jython/Red-Black Trees/bisect]

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Mon Nov 25 14:21:02 2002


> >However, if we really do want to have a sort of dictionary that does
> >return its keys() in sorted order, we may want to use something like a
> >"red-black" tree data structure.
> ...
> >    http://newcenturycomputers.net/projects/rbtree.html
>
> Aha, that was something new for me.


By the way, in the Java language, the concept of the "Map" maps pretty
closely to that of Python's dictionaries.  In fact, they're so close to
each other that, in the Java variant of Python --- Jython --- we can use
them fairly transparently:


###
###  Jython code
###
>>> import java
>>> d = java.util.HashMap()
>>> d
{}
>>> for (key, value) in [(1,1), (22,2), (333,3), (4444,4), (55555,5)]:
...     d[key] = value
...
>>> d
{333=3, 4444=4, 22=2, 55555=5, 1=1}
###



Java's "java.util.TreeMap" is an ordered map that uses red-black trees:

    http://java.sun.com/j2se/1.4/docs/api/java/util/TreeMap.html



If we use Jython, we have full access to this library:

###
>>> d2 = java.util.TreeMap()
>>> for (key, value) in [(1,1), (22,2), (333,3), (4444,4), (55555,5)]:
...     d2[key] = value
...
>>> d2
{1=1, 22=2, 333=3, 4444=4, 55555=5}
###

which is pretty neat, if not useless for the CPython users... *grin*





One other reason why these things are interesting is because they allow us
to do searches on coordinate ranges.  It's an easy thing to ask a
red-black tree to "Give me all the items where the key must be greater
than "bar", but less than "baz":


###
### Jython code
###
>>> words = open("/usr/share/dict/words").readlines()
>>> import java
>>> d = java.util.TreeMap()
>>> for w in words:
...     d[w.strip()] = w.strip()
...
>>> d2 = d.tailMap("bar").headMap("baz")
>>> words_between_bar_and_baz = d2.keySet()
>>> words_between_bar_and_baz
[bar, barb, barbarian, barbarians, barbaric, barbarism, barbarities,
barbarity, barbarous, barbarously, barbecue, barbecued, barbecues, barbed,
barbell, barbells, barber, barbital, barbiturate, barbiturates, barbs,
bard, bards, bare, bared, barefaced, barefoot, barefooted, barely,
bareness, barer, bares, barest, barflies, barfly, bargain, bargained,
bargaining, bargains, barge, barges, barging, baring, baritone, baritones,
barium, bark, barked, barker, barkers, barking, barks, barley, barn,
barns, barnstorm, barnstormed, barnstorming, barnstorms, barnyard,
barnyards, barometer, barometers, barometric, baron, baroness, baronial,
baronies, barons, barony, baroque, baroqueness, barrack, barracks,
barrage, barrages, barred, barrel, barrelled, barrelling, barrels, barren,
barrenness, barricade, barricades, barrier, barriers, barring, barringer,
barrow, bars, bartender, bartenders, barter, bartered, bartering, barters,
basal, basalt, base, baseball, baseballs, baseband, baseboard, baseboards,
based, baseless, baseline, baselines, basely, baseman, basement,
basements, baseness, baser, bases, bash, bashed, bashes, bashful,
bashfulness, bashing, basic, basically, basics, basil, basin, basing,
basins, basis, bask, basked, basket, basketball, basketballs, baskets,
basking, bass, basses, basset, bassinet, bassinets, bastard, bastards,
baste, basted, bastes, basting, bastion, bastions, bat, batch, batched,
batches, bath, bathe, bathed, bather, bathers, bathes, bathing, bathos,
bathrobe, bathrobes, bathroom, bathrooms, baths, bathtub, bathtubs, baton,
batons, bats, battalion, battalions, batted, batten, battens, batter,
battered, batteries, battering, batters, battery, batting, battle,
battled, battlefield, battlefields, battlefront, battlefronts,
battleground, battlegrounds, battlement, battlements, battler, battlers,
battles, battleship, battleships, battling, bauble, baubles, baud,
bauxite, bawdy, bawl, bawled, bawling, bawls, bay, bayed, baying, bayonet,
bayonets, bayou, bayous, bays]
###




This is not to say that this would be a particularly good way to solve a
problem like this: using standard Python lists and the 'bisect' module:

    http://www.python.org/doc/lib/module-bisect.html

would probably be a better idea for this particular problem.

###
>>> words = [w.strip() for w in open('/usr/share/dict/words').readlines()]
>>> words.sort()
>>> import bisect
>>> i, j = bisect.bisect_left(words, 'bar'), bisect.bisect_right(words, 'baz')
>>> words[i:j]
['bar', 'barb', 'barbarian', 'barbarians', 'barbaric', 'barbarism',
'barbarities', 'barbarity', 'barbarous', 'barbarously', 'barbecue',
'barbecued', 'barbecues', 'barbed', 'barbell', 'barbells', 'barber',
'barbital', 'barbiturate', 'barbiturates', 'barbs', 'bard', 'bards',
'bare', 'bared', 'barefaced', 'barefoot', 'barefooted', 'barely',
'bareness', 'barer', 'bares', 'barest', 'barflies', 'barfly', 'bargain',
'bargained', 'bargaining', 'bargains', 'barge', 'barges', 'barging',
'baring', 'baritone', 'baritones', 'barium', 'bark', 'barked', 'barker',
'barkers', 'barking', 'barks', 'barley', 'barn', 'barns', 'barnstorm',
'barnstormed', 'barnstorming', 'barnstorms', 'barnyard', 'barnyards',
'barometer', 'barometers', 'barometric', 'baron', 'baroness', 'baronial',
'baronies', 'barons', 'barony', 'baroque', 'baroqueness', 'barrack',
'barracks', 'barrage', 'barrages', 'barred', 'barrel', 'barrelled',
'barrelling', 'barrels', 'barren', 'barrenness', 'barricade',
'barricades', 'barrier', 'barriers', 'barring', 'barringer', 'barrow',
'bars', 'bartender', 'bartenders', 'barter', 'bartered', 'bartering',
'barters', 'basal', 'basalt', 'base', 'baseball', 'baseballs', 'baseband',
'baseboard', 'baseboards', 'based', 'baseless', 'baseline', 'baselines',
'basely', 'baseman', 'basement', 'basements', 'baseness', 'baser',
'bases', 'bash', 'bashed', 'bashes', 'bashful', 'bashfulness', 'bashing',
'basic', 'basically', 'basics', 'basil', 'basin', 'basing', 'basins',
'basis', 'bask', 'basked', 'basket', 'basketball', 'basketballs',
'baskets', 'basking', 'bass', 'basses', 'basset', 'bassinet', 'bassinets',
'bastard', 'bastards', 'baste', 'basted', 'bastes', 'basting', 'bastion',
'bastions', 'bat', 'batch', 'batched', 'batches', 'bath', 'bathe',
'bathed', 'bather', 'bathers', 'bathes', 'bathing', 'bathos', 'bathrobe',
'bathrobes', 'bathroom', 'bathrooms', 'baths', 'bathtub', 'bathtubs',
'baton', 'batons', 'bats', 'battalion', 'battalions', 'batted', 'batten',
'battens', 'batter', 'battered', 'batteries', 'battering', 'batters',
'battery', 'batting', 'battle', 'battled', 'battlefield', 'battlefields',
'battlefront', 'battlefronts', 'battleground', 'battlegrounds',
'battlement', 'battlements', 'battler', 'battlers', 'battles',
'battleship', 'battleships', 'battling', 'bauble', 'baubles', 'baud',
'bauxite', 'bawdy', 'bawl', 'bawled', 'bawling', 'bawls', 'bay', 'bayed',
'baying', 'bayonet', 'bayonets', 'bayou', 'bayous', 'bays']
###

But if our list of items is continuously shifting beneath our feet, it
might just be inconvenient or inefficient to keep calling sort() all the
time.  That's when red-black trees become useful: the red-black tree will
keep itself sorted.


Good luck!