[Python-ideas] Trie ABC?

Geoffrey Sneddon foolistbar at googlemail.com
Sun May 5 22:03:13 CEST 2013


Currently there are a large number of trie implementations for Python, 
all with slightly different APIs. It would be nice to introduce a ABC 
for Tries to attempt to unify these.

Why do people want tries in Python? Typically because they provide quick 
lookup of prefixes and keys starting with a prefix. html5lib uses a 
pluggable trie based around a pseudo-ABC for tokenizing entities, for 
example (see below for links).

It would likely make sense to introduce Trie and MutableTrie ABCs 
inheriting from Mapping and (MutableMapping, Trie) respectively.

It is suggested to add at least two mixins:

longest_prefix(prefix) returning the longest prefix of "prefix" that is 
a member of the Trie.

keys_with_prefix(prefix) (or maybe keys_starting_with, but I'm 
bike-shedding myself now!) returning an iterable of keys that start with 
with prefix. Some implementations simply override keys for this, adding 
an optional first argument of prefix.

Many implementations also include something like has_keys_with_prefix 
returning len(keys_with_prefix(prefix)) > 0.

A selection of existing trie implementations for Python:

https://github.com/kmike/datrie
https://github.com/buriy/python-chartrie
https://github.com/kmike/hat-trie
https://github.com/dhain/trie

I can attempt to summarize the various APIs these provide if there is 
any interest whatsoever (it seems like there's little use if it's 
decided against on principle!).

It would be nice to get an implementation into the stdlib if people are 
in favour of an ABC, but given there's no real singular solution that 
has been proven in the field I'm not rushing to force this. 
<http://bugs.python.org/issue9520> was a case of trying to get a trie 
implementation into the stdlib.

There's something vaguely along the lines of the above ABC that I 
implemented for html5lib in 
<https://github.com/html5lib/html5lib-python/blob/master/html5lib/trie/_base.py>, 
as well as a light-weight pure-Python implementation in 
<https://github.com/html5lib/html5lib-python/blob/master/html5lib/trie/py.py>.


/gsnedders



More information about the Python-ideas mailing list