Mapping, with sequence as key, wildcard and subsequence matching

Steven D'Aprano steve at pearwood.info
Thu Jul 16 00:01:15 EDT 2015


On Thu, 16 Jul 2015 11:51 am, Ben Finney wrote:

> Howdy all,
> 
> What well-defined data type exists with the following properties:
> 
> * Mapping, key → value.
> 
> * Each key is a sequence (e.g. `tuple`) of items such as text strings.
> 
> * Items in a key may be the sentinel `ANY` value, which will match any
>   value at that position.
> 
> * A key may specify that it will match *only* sequences of the same
>   length.
> 
> * A key may specify that it will match sequences with arbitrarily many
>   additional unspecified items.


Sounds like a regular expression. Remember that computer science theoretical
regular expressions don't just match strings, they can apply to any
sequence of primitive values.

In your case, you only need two special tokens, match-one and
match-one-or-more (which may be like r'.' and r'.*').

If you can guarantee that the key items have no spaces in them, you may even
be able to use a string re. Assemble the key from the tuple like so:

- replace ONE by "."
- replace ONE-OR-MORE by ".*"
- return " ".join(keyitems)


Otherwise, you can create your own sequence type and simple regex engine
(you only need two meta-items, ONE and ONE-OR-MORE) to perform equality
tests.

If you can use a string, make it a subclass with an __eq__ method that
returns re.match(key, self). Either way, you can then use your custom class
as the keys in a mapping type.

You can't use a dict for the mapping, not unless you're smarter than me, due
to the requirement to hash the keys.

You can use a simple list of tuples [(key, value), ...] if O(N) searches are
fast enough. (Surely they'll be fast enough for a proof of concept?) That
at least is guaranteed to work: when doing a lookup, you simply look at
every key until you find (at least one) that matches. If there are no
matches by the time you hit the end of the list, you know that there cannot
be any matches!

A more complex, but potentially faster for large N, method would be to keep
the list sorted and use a binary search (bisect module). Or use a binary
tree. The hard part is that for this to work, you need a well-defined
less-than operation as well as equality, and I'm not sure how to implement
that for ONE and ONE-OR-MORE. I'm sure it can be done though.

At a guess... make ONE less than every string, and every string less than
ONE-OR-MORE, and then just use regular tuple order comparisons.


-- 
Steven




More information about the Python-list mailing list