Help: Creating condensed expressions

Eddie Corns eddie at holyrood.ed.ac.uk
Fri Mar 24 15:20:18 EST 2006


David Hirschfield <davidh at ilm.com> writes:

>Here's the problem: Given a list of item names like:

>apple1
>apple2
>apple3_SD
>formA
>formB
>formC
>kla_MM
>kla_MB
>kca_MM

>which is a subset of a much larger list of items,
>is there an efficient algorithm to create condensed forms that match 
>those items, and only those items? Such as:

>apple[12]
>apple3_SD
>form[ABC]
>kla_M[MB]
>kca_MM

>The condensed expression syntax only has [...] and * as operators. [...] 
>matches a set of individual characters, * matches any string.
>I'd be satisfied with a solution that only uses the [...] syntax, since 
>I don't think it's possible to use * without potentially matching items 
>not explicitly in the given list.

>I'm not sure what this condensed expression syntax is called (looks a 
>bit like shell name expansion syntax), and I'm not even sure there is an 
>efficient way to do what I'm asking. Any ideas would be appreciated.

If you used regular expressions instead of globs then I think what you want is
to optimize the regular expression for:

 'apple1|apple2|apple3_SD|formA|formB|formC|kla_MM|kla_MB|kca_MM'

then spit the optimised finite automaton back out.  Of course if you're just
searching Python might do a decent job of optimising it anyway.

Looking at:

  http://en.wikipedia.org/wiki/Finite_state_machine#Optimization

they make it sound so easy!.  There's also a whole bunch of tools on that
page.  Maybe there's an optimiser you can use in one of them.

Failing that I guess you build a tree and try to merge nodes where they fork.
I suspect an optimal solution would be hard but if you knew your input did
have lots of redundancy a simple algorithm might work.

Or I could be talking crap as usual.

Eddie



More information about the Python-list mailing list