grab dict keys/values without iterating ?!
rusi
rustompmody at gmail.com
Wed Dec 11 10:08:16 EST 2013
On Wednesday, December 11, 2013 8:16:12 PM UTC+5:30, Roy Smith wrote:
> rusi wrote:
> > The classic data structure for this is the trie:
> > General idea: http://en.wikipedia.org/wiki/Trie
> > In python:
> > http://stackoverflow.com/questions/11015320/how-to-create-a-trie-in-python/
> I agree that a trie fits the problem description well. If I were coding
> this up in C or C++, that's probably the data structure I'd use, with
> each node containing an array of pointers to nodes, and the array
> indexed by the ascii value of the next character (this doesn't translate
> well to unicode). Lookup speed would be awesomely fast.
> The problem is, that doesn't make a whole lot of sense in Python. The
> cited implementation uses dicts at each level. By the time you've done
> that, you might as well just throw all the data into one big dict and
> use the full search string as the key. It would be a lot less code, and
> probably run faster.
> Still, as an exercise in learning about a useful (and under-appreciated)
> data structure, implementing this as a trie sounds like a wonderful idea.
I was going to say that Steven's
data = {}
for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
data[c] = {}
is really the first step of the trie approach
except that instead of
key = "Aardvark"
data[key[0]][key] = "some value"
he would need
data[key[0]][key[1:] = "some value"
The catch is keys of length 1 eg "A"
Which makes the fully general (recursively unfolded) version easier
to write, though with all those zillions of dicts probably not very efficient.
Finitizing the trie into fixed small prefixes with leaves containing standard
dicts looks sensible (to me without and testing of either efficiency or
readability!)
The short prefix problem however remains
Of course there are other tricks possible:
If we know that the data is case-insensitive alphabetic ASCII* we can just
normalize the data with
def norm(s): return [ord(c.upper()) -ord('A') for c in s]
then
>>> norm("aBcD")
[0, 1, 2, 3]
and instead of dicts we can use 26-length lists
* O me O my! Ive used the terrible A-word!
Now RUN before the trolls get us!
More information about the Python-list
mailing list