Regular expression as dictionary key?
Andrew Dalke
dalke at dalkescientific.com
Mon Dec 3 12:44:08 EST 2001
Luke:
>Let's say I'm given a singular form of a word and want to match it with
>plurals still, but words["dog"] will obviously raise a KeyError.
>
>Without doing a linear substring search on every element in words.key(),
>is there a way to take advantage of the dict's binary tree layout
>properties (e.g. speed) like:
>
>words["dog*"] # where dog* is a Regex
There are two issues here:
  1) supporting searches for a word given either the singular
     or plural form
  2) allowing a regular expression as the dictionary key.
The first is the need, the second is the implementation. But
that implementation won't work, since in English the plural
of a given word cannot be determined using a simple regular
expression pattern:
  cat   -> cats
  dress -> dresses
  sky   -> skies
  mouse -> mice
  sheep -> sheep
  fish  -> fish   (if more than one fish of the same species)
  fish  -> fishes (if multiple species are present)
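
To make the point concrete, here is a minimal sketch (written for a current Python 3, not the Python of the original question) that tries a naive suffix pattern against the pairs above. The helper name `naive_plural_match` and the "optional s or es" pattern are assumptions chosen for illustration, not anything proposed in the thread.

```python
import re

# Singular/plural pairs taken from the list above.
PAIRS = [
    ("cat", "cats"),
    ("dress", "dresses"),
    ("sky", "skies"),
    ("mouse", "mice"),
    ("sheep", "sheep"),
    ("fish", "fishes"),
]

def naive_plural_match(singular, candidate):
    # Hypothetical helper: accept the singular itself, or the
    # singular followed by "s" or "es".
    pattern = re.escape(singular) + r"(?:s|es)?"
    return re.fullmatch(pattern, candidate) is not None

# Collect the pairs the naive pattern fails to recognise.
missed = [(s, p) for s, p in PAIRS if not naive_plural_match(s, p)]
print(missed)
```

The pattern misses sky/skies and mouse/mice, because those plurals change the stem instead of appending to it.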
The only viable regular expression pattern is an explicit
alternation of the form 'X|Y' (for example, 'mouse|mice').
But if you're going to do that, the easiest solution to
understand and optimize is one which creates a new
dictionary-like object that implements singular/plural
lookups directly, like:
def guess_a_plural(s):
    if s[-1:] in "szx":
        return s + "es"
    if s[-1:] == "y":
        return s[:-1] + "ies"
    if s[-1:] == "f":
        return s[:-1] + "ves"
    return s + "s"

class PluralDict:
    def __init__(self):
        self.singular_data = {}
        self.plural_data = {}

    def __setitem__(self, key, value):
        # A (singular, plural) tuple gives the plural explicitly;
        # a plain string has its plural guessed.
        if isinstance(key, tuple):
            singular, plural = key
        else:
            singular = key
            plural = guess_a_plural(key)
        self.singular_data[singular] = value
        self.plural_data[plural] = value

    def __getitem__(self, key):
        # Try the singular form first, then fall back to the plural.
        # An unknown word raises KeyError from the second lookup.
        if key in self.singular_data:
            return self.singular_data[key]
        return self.plural_data[key]
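
The guess is only a heuristic, which is why `__setitem__` also accepts an explicit (singular, plural) tuple. As a rough check, the sketch below runs the same guessing rules over some of the example words from earlier in this message. The `KNOWN` table is just those examples collected for illustration, and the code assumes a current Python 3.

```python
def guess_a_plural(s):
    # Same guessing rules as the function above.
    if s[-1:] in "szx":
        return s + "es"
    if s[-1:] == "y":
        return s[:-1] + "ies"
    if s[-1:] == "f":
        return s[:-1] + "ves"
    return s + "s"

# Example words from earlier in the message (illustrative only).
KNOWN = {
    "cat": "cats",
    "dress": "dresses",
    "sky": "skies",
    "mouse": "mice",
    "sheep": "sheep",
}

# Map each singular to its wrong guess, where the guess is wrong.
wrong = {s: guess_a_plural(s) for s, p in KNOWN.items()
         if guess_a_plural(s) != p}
print(wrong)
```

The regular cases come out right, but "mouse" and "sheep" are guessed as "mouses" and "sheeps", so irregular words need the tuple form.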
Assuming this code works, it can be used as in this example,
which takes the English name of some common animals and
gives the Latin (scientific) name:
>>> d = PluralDict()
>>> d["cat"] = "Felis domestica"
>>> d[("mouse", "mice")] = "Mus musculus"
>>> d["fox"] = "Vulpes fulva"
>>> d["wolf"] = "Canis lupus"
>>> print(d["cats"])
Felis domestica
>>> print(d["wolves"])
Canis lupus
>>> print(d["mice"])
Mus musculus
>>>
I suspect, though, that you really just want a way to use a
regular expression as a key, and created this singular/plural
example as justification.
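
For reference, if a regular expression as the key really is the goal, one possible shape is sketched below. Everything here is an assumption for illustration: `RegexDict` is a hypothetical class, not something from the standard library or from this thread, and the code targets a current Python 3. Note that Python's dict is a hash table, so it cannot be indexed by pattern; the lookup is a linear scan over the stored patterns, which is exactly the cost the original question hoped to avoid.

```python
import re

class RegexDict:
    # Hypothetical class: keys are regular expression strings, and a
    # lookup returns the value of the first pattern that matches the
    # whole word.
    def __init__(self):
        self._entries = []  # list of (compiled pattern, value) pairs

    def __setitem__(self, pattern, value):
        self._entries.append((re.compile(pattern), value))

    def __getitem__(self, word):
        # Linear scan: every stored pattern may have to be tried.
        for regex, value in self._entries:
            if regex.fullmatch(word):
                return value
        raise KeyError(word)

d = RegexDict()
d["mouse|mice"] = "Mus musculus"
d["wolf|wolves"] = "Canis lupus"
print(d["mice"])
```

With 'X|Y' keys like these, the patterns carry no more information than the two plain dictionaries in PluralDict, which can answer the same lookups with ordinary hashing.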
Andrew
dalke at dalkescientific.com