Regular expression as dictionary key?

Andrew Dalke dalke at dalkescientific.com
Mon Dec 3 12:44:08 EST 2001


Luke:
>Let's say I'm given a singular form of a word and want to match it with
>plurals still, but words["dog"] will obviously raise a KeyError.
>
>Without doing a linear substring search on every element in words.key(),
>is there a way to take advantage of the dict's binary tree layout
>properties (e.g. speed) like:
>
>words["dog*"]   # where dog* is a Regex

There are two issues here:
  1) supporting searches for a word given either the singular
or plural form
  2) allowing a regular expression as the dictionary key.

The first is the need, the second is the implementation.  But
that implementation won't work, since in English the plural
of a given word cannot be determined using a simple regular
expression pattern:

  cat -> cats
  dress -> dresses
  sky -> skies
  mouse -> mice
  sheep -> sheep
  fish -> fish (if more than one fish of the same species)
  fish -> fishes (if multiple species are present)
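
To make that concrete, here is a rough sketch (Python 3) that tries one
plausible "simple" pattern, the singular followed by an optional "s" or
"es", against the pairs listed above.  The helper names are made up for
this illustration, and other simple patterns would fail on other words.

```python
import re

# Singular/plural pairs taken from the list above.
PAIRS = [
    ("cat", "cats"),
    ("dress", "dresses"),
    ("sky", "skies"),
    ("mouse", "mice"),
    ("sheep", "sheep"),
    ("fish", "fishes"),
]

def naive_plural_match(singular, plural):
    # Hypothetical helper: does "singular + optional s/es" match the plural?
    pattern = re.escape(singular) + r"(?:e?s)?"
    return re.fullmatch(pattern, plural) is not None

def unmatched_pairs(pairs):
    # Collect the pairs that the naive suffix pattern cannot handle.
    return [(s, p) for s, p in pairs if not naive_plural_match(s, p)]

print(unmatched_pairs(PAIRS))
```

With this pattern the "sky" and "mouse" pairs come back unmatched, because
their plurals change the stem instead of only adding a suffix.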

The only regular expression pattern that works in general is an
explicit alternation, 'X|Y', which names both forms (for example
'mouse|mice').  But if you're going to do that, the easiest
solution to understand and optimize is one which creates a new
dictionary-like object that implements singular/plural
lookups directly, like:

def guess_a_plural(s):
  # Rough heuristic only; irregular plurals must be given explicitly.
  if s[-1:] in ("s", "z", "x"):
    return s + "es"
  if s[-1:] == "y":
    return s[:-1] + "ies"
  if s[-1:] == "f":
    return s[:-1] + "ves"
  return s + "s"

class PluralDict:
  def __init__(self):
    self.singular_data = {}
    self.plural_data = {}
  def __setitem__(self, key, value):
    # A (singular, plural) tuple gives both forms explicitly;
    # for a plain string the plural is guessed.
    if isinstance(key, tuple):
      singular, plural = key
    else:
      singular = key
      plural = guess_a_plural(key)
    self.singular_data[singular] = value
    self.plural_data[plural] = value
  def __getitem__(self, key):
    # Try the singular form first, then fall back to the plural.
    if key in self.singular_data:
      return self.singular_data[key]
    return self.plural_data[key]

Assuming this code works, it can be used as in this example,
which takes the English name of some common animals and
tells you the Latin/scientific name:

d = PluralDict()
d["cat"] = "Felis domestica"
d[ ("mouse", "mice") ] = "Mus musculus"
d["fox"] = "Vulpes fulva"
d["wolf"] = "Canis lupus"

>>> print(d["cats"])
Felis domestica
>>> print(d["wolves"])
Canis lupus
>>> print(d["mice"])
Mus musculus
>>>

I suspect, though, that you really just want a way to use a
regular expression as a key, and created this singular/plural
example as justification.
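
If that is the case, here is a minimal sketch (Python 3) of what such a
lookup might look like.  The class name and its search() method are
invented for this illustration and are not part of the standard dict.
A dict finds entries by hashing the exact key, so that layout gives a
pattern nothing to work with; the sketch simply tests every key, which
is the linear scan you wanted to avoid.  It assumes all keys are strings.

```python
import re

class RegexLookupDict(dict):
    """Hypothetical dict subclass; search() is not a standard dict method."""

    def search(self, pattern):
        # Hashing only helps with exact keys, so every key is tested in turn.
        regex = re.compile(pattern)
        return {k: v for k, v in self.items() if regex.fullmatch(k)}

words = RegexLookupDict()
words["dogs"] = 1
words["dogma"] = 2
words["cat"] = 3

# As a regex, "dog*" means "do" followed by any number of "g"s,
# so "dog.*" is used here for "anything starting with dog".
print(words.search(r"dog.*"))
```

Exact lookups such as words["cat"] still go through the normal dict
machinery; only search() pays the cost of visiting every key.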

                    Andrew
                    dalke at dalkescientific.com





