grab dict keys/values without iterating ?!

rusi rustompmody at gmail.com
Wed Dec 11 10:08:16 EST 2013


On Wednesday, December 11, 2013 8:16:12 PM UTC+5:30, Roy Smith wrote:
>  rusi  wrote:

> > The classic data structure for this is the trie:
> > General idea: http://en.wikipedia.org/wiki/Trie
> > In python:
> > http://stackoverflow.com/questions/11015320/how-to-create-a-trie-in-python/

> I agree that a trie fits the problem description well.  If I were coding 
> this up in C or C++, that's probably the data structure I'd use, with 
> each node containing an array of pointers to nodes, and the array 
> indexed by the ascii value of the next character (this doesn't translate 
> well to unicode).  Lookup speed would be awesomely fast.

> The problem is, that doesn't make a whole lot of sense in Python.  The 
> cited implementation uses dicts at each level.  By the time you've done 
> that, you might as well just throw all the data into one big dict and 
> use the full search string as the key.  It would be a lot less code, and 
> probably run faster.

> Still, as an exercise in learning about a useful (and under-appreciated) 
> data structure, implementing this as a trie sounds like a wonderful idea.

I was going to say that Steven's

data = {}
for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
    data[c] = {} 

is really the first step of the trie approach

except that instead of 

key = "Aardvark"
data[key[0]][key] = "some value"

he would need 
data[key[0]][key[1:] = "some value"

The catch is keys of length 1 eg "A"
Which makes the fully general (recursively unfolded) version easier
to write, though with all those zillions of dicts probably not very efficient.

Finitizing the trie into fixed small prefixes with leaves containing standard 
dicts looks sensible (to me without and testing of either efficiency or 
readability!)

The short prefix problem however remains

Of course there are other tricks possible:
If we know that the data is case-insensitive alphabetic ASCII* we can just 
normalize the data with 

def norm(s): return [ord(c.upper()) -ord('A') for c in s]

then
>>> norm("aBcD")
[0, 1, 2, 3]

and instead of dicts we can use 26-length lists

* O me O my! Ive used the terrible A-word!
Now RUN before the trolls get us!



More information about the Python-list mailing list