Help: Creating condensed expressions
Eddie Corns
eddie at holyrood.ed.ac.uk
Fri Mar 24 15:20:18 EST 2006
David Hirschfield <davidh at ilm.com> writes:
>Here's the problem: Given a list of item names like:
>apple1
>apple2
>apple3_SD
>formA
>formB
>formC
>kla_MM
>kla_MB
>kca_MM
>which is a subset of a much larger list of items,
>is there an efficient algorithm to create condensed forms that match
>those items, and only those items? Such as:
>apple[12]
>apple3_SD
>form[ABC]
>kla_M[MB]
>kca_MM
>The condensed expression syntax only has [...] and * as operators. [...]
>matches a set of individual characters, * matches any string.
>I'd be satisfied with a solution that only uses the [...] syntax, since
>I don't think it's possible to use * without potentially matching items
>not explicitly in the given list.
>I'm not sure what this condensed expression syntax is called (looks a
>bit like shell name expansion syntax), and I'm not even sure there is an
>efficient way to do what I'm asking. Any ideas would be appreciated.
If you used regular expressions instead of globs then I think what you want is
to optimize the regular expression for:
'apple1|apple2|apple3_SD|formA|formB|formC|kla_MM|kla_MB|kca_MM'
then spit the optimised finite automaton back out. Of course if you're just
searching Python might do a decent job of optimising it anyway.
Looking at:
http://en.wikipedia.org/wiki/Finite_state_machine#Optimization
they make it sound so easy!. There's also a whole bunch of tools on that
page. Maybe there's an optimiser you can use in one of them.
Failing that I guess you build a tree and try to merge nodes where they fork.
I suspect an optimal solution would be hard but if you knew your input did
have lots of redundancy a simple algorithm might work.
Or I could be talking crap as usual.
Eddie
More information about the Python-list
mailing list