AST Manipulation and Common Subexpression Elimination
Heiko Wundram
heikowu at ceosg.de
Sat Oct 18 08:44:10 EDT 2003
Martin v. Löwis wrote:
> Robert Kern wrote:
>
>> 1. Can I do better than brute-force traversing the AST and trying to
>> match each
>> subtree to the other subtrees (using something like the match()
>> function given
>> in the parser module example)?
>
>
> Certainly: you could do hashing or sorting. For hashing, define some
> hash function for expressions, preferably recursively
Coolest way to do this (IMHO) would be the application of something like
the LZW compression algorithm.
Hmm... Guess I'm still too tired to explain this morning, so I'll just
have to give an example:
Expression: x = (n1+2)*(n1-1)*(n3+2)*(n1-1)*(n1+2)*(n3+2)*(n1-3)
Step 1
dummy1 = (n1+2)
x = dummy1
First Term isn't known yet, so assign it to dummy1 and stop processing.
Continue step two at second term.
Step 2
dummy1 = (n1+2)
dummy2 = (n1-1)
x = dummy1*dummy2
Second term isn't known yet, so assign to dummy2 and stop processing.
Continue step three at third term.
Step 3
dummy1 = (n1+2)
dummy2 = (n1-1)
dummy3 = (n3+2)
x = dummy1*dummy2*dummy3
Third term isn't known yet, so assign to dummy3 and stop processing.
Continue step four at fourth term.
Step 4
dummy1 = (n1+2)
dummy2 = (n1-1)
dummy3 = (n3+2)
dummy4 = dummy1*dummy2
dummy5 = dummy1*dummy3
dummy6 = dummy2*dummy3
dummy7 = dummy3*dummy4 # Remember that this is dummy1*dummy2*dummy3
x = dummy1*dummy2*dummy3*dummy2*dummy1*dummy3
Fourth term is known already in a "simple" dummy (dummy2). Now continue
to match "simple dummies" until you find a term that isn't already
present (n1-3). Repush the non-match onto the factor stack. This makes
the longest matching sequence dummy2*dummy1*dummy3. Write down all
possible combinations of at least length 2 of the found terms to dummy
variables (regardless of order). In these dummies, replace all
occurences of "complex dummies" with references to earlier "complex
dummies" (dummy7 = dummy1*dummy2*dummy3, when you start at the
beginning, this makes dummy3*dummy4). Continue step five at seventh term.
Step 5
dummy1 = (n1+2)
dummy2 = (n1-1)
dummy3 = (n3+2)
dummy4 = dummy1*dummy2
dummy5 = dummy1*dummy3
dummy6 = dummy2*dummy3
dummy7 = dummy3*dummy4
dummy8 = (n1-3)
x = dummy1*dummy2*dummy3*dummy2*dummy1*dummy3*dummy8
Seventh term isn't know yet, so assign to dummy8 and stop processing. As
this is the last term, stop compression and continue with cleanup.
Step 6
dummy1 = (n1+2)
dummy2 = (n1-1)
dummy3 = (n3+2)
dummy4 = dummy1*dummy2
dummy5 = dummy1*dummy3
dummy6 = dummy2*dummy3
dummy7 = dummy3*dummy4 # This is dummy1*dummy2*dummy3
dummy8 = (n1-3)
x = dummy7*dummy7*dummy8
Apply same algorithm as with step 4, replacing all occurences of
"complex dummies" found in x with references to other "complex dummies",
starting to match at the beginning.
Step 8
Finally Remove all terms that aren't directly or indirectly referenced
from x. (Remove dummy5 and dummy6).
This algorithm should create longest matching subsequences. If I
understand correctly, you want to deal with more than just
multiplication, but I guess you could modify this algorithm so that it
works on sums and multiplication, as you need it to (it would have to
know about bracket rules, which it doesn't at the moment).
If the above description isn't algorithmically clear, feel free to mail
me, and I might (if I find the time) code a reference implementation.
HTH!
Heiko.
More information about the Python-list
mailing list