parsing an Excel formula with the re module

Steve Holden steve at holdenweb.com
Tue Jan 5 21:06:01 EST 2010


Tim Chase wrote:
> vsoler wrote:
>> Hence, I need to parse Excel formulas. Can I do it by means only of re
>> (regular expressions)?
>>
>> I know that for simple formulas such as "=3*A7+5" it is indeed
>> possible. What about complex formulas that include functions,
>> sheet names and possibly other *.xls files?
> 
> Where things start getting ugly is when you have nested function calls,
> such as
> 
>   =if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14, (Min(C1:C25)+3)*18,Max(B1:B25)))
> 
> Regular expressions don't do well with nested parens (especially when
> the nesting can go to arbitrary depth, as it can here), so I'd suggest
> going for a full-blown parsing solution like pyparsing.
> 
> If you have fair control over what can be contained in the formulas and
> you know they won't contain nested parens/functions, you might be able
> to formulate some sort of "kinda, sorta, maybe parses some forms of
> formulas" regexp.
> 
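Tim's advice can be made concrete. `re` works well for the lexical layer (splitting a formula into numbers, cell references, names, operators and parentheses), and a small recursive-descent parser on top of it handles nesting of any depth. The sketch below uses only the standard library instead of pyparsing. The token set and grammar are assumptions that cover only a small subset of Excel syntax: no strings, sheet names, external workbooks, `^`, `&` or `%`. The names `tokenize`, `Parser` and `parse_formula` are hypothetical, chosen for illustration.

```python
import re

# Lexical layer: one regex, one named group per token kind. Assumed subset of
# Excel syntax only; cell references must be upper case.
TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<NUMBER>\d+(?:\.\d+)?)
      | (?P<RANGE>\$?[A-Z]+\$?\d+:\$?[A-Z]+\$?\d+)
      | (?P<CELL>\$?[A-Z]+\$?\d+)
      | (?P<NAME>[A-Za-z_][A-Za-z0-9_.]*)
      | (?P<OP><=|>=|<>|[-+*/<>=])
      | (?P<LPAREN>\() | (?P<RPAREN>\)) | (?P<COMMA>,)
    )""", re.VERBOSE)


def tokenize(formula):
    """Split a formula (leading '=' optional) into (kind, text) tokens."""
    text = formula.strip()
    if text.startswith("="):
        text = text[1:]
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at {pos}: {text[pos:]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


class Parser:
    """Recursive descent over the token list; nesting depth is unbounded."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def expect(self, kind):
        tok_kind, tok_text = self.peek()
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, got {tok_text!r}")
        self.pos += 1
        return tok_text

    def parse(self):
        node = self.comparison()
        if self.pos != len(self.tokens):
            raise SyntaxError(f"trailing input at token {self.peek()[1]!r}")
        return node

    def comparison(self):
        node = self.additive()
        kind, text = self.peek()
        if kind == "OP" and text in ("=", "<>", "<", ">", "<=", ">="):
            self.pos += 1
            node = (text, node, self.additive())
        return node

    def binary(self, ops, operand):
        # Left-associative chain: operand (op operand)*
        node = operand()
        while self.peek()[0] == "OP" and self.peek()[1] in ops:
            op = self.expect("OP")
            node = (op, node, operand())
        return node

    def additive(self):
        return self.binary(("+", "-"), self.term)

    def term(self):
        return self.binary(("*", "/"), self.primary)

    def primary(self):
        kind, text = self.peek()
        self.pos += 1  # every branch below consumes this token
        if kind in ("NUMBER", "CELL", "RANGE"):
            return (kind.lower(), text)
        if (kind, text) == ("OP", "-"):
            return ("neg", self.primary())
        if kind == "LPAREN":
            node = self.comparison()
            self.expect("RPAREN")
            return node
        if kind == "NAME" and self.peek()[0] == "LPAREN":
            self.pos += 1
            args = [] if self.peek()[0] == "RPAREN" else [self.comparison()]
            while args and self.peek()[0] == "COMMA":
                self.pos += 1
                args.append(self.comparison())
            self.expect("RPAREN")
            return ("call", text.upper(), args)
        if kind == "NAME":
            return ("name", text)  # bare identifier, e.g. a named range
        raise SyntaxError(f"unexpected token {text!r}")


def parse_formula(formula):
    return Parser(tokenize(formula)).parse()
```

Parsing `=3*A7+5` gives `('+', ('*', ('number', '3'), ('cell', 'A7')), ('number', '5'))`, and the nested `if` formula quoted above comes back as a `('call', 'IF', [...])` tree whose third argument is another `IF` call. By contrast, a single non-greedy pattern such as `\w+\((.*?)\)` applied to that formula captures `Sum(A1:A25` as the argument list of the outer `if`, because it stops at the first closing parenthesis.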
And don't forget about named ranges, which can reference cells without
using anything but a plain identifier ...
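Steve's point about named ranges is easy to demonstrate. A pattern for A1-style references finds `A7` in `=3*A7+5` but finds nothing in `=3*Revenue+5`, where `Revenue` is a defined name. Resolving that name requires the workbook's table of defined names, which no regex over the formula text can supply. The sketch below assumes such a table is available as a plain dict; `referenced_cells`, the name `Revenue` and its `$B$2:$B$4` target are hypothetical, and sheet-qualified or external references are not handled.

```python
import re

# A1-style references such as A7, $B$2 or either end of A1:A25. Assumes upper
# case and no sheet or workbook prefixes.
CELL_RE = re.compile(r"(?<![A-Za-z_$])\$?[A-Z]{1,3}\$?[0-9]+(?![A-Za-z0-9_(])")

# Bare identifiers not immediately followed by "(" (so not function calls).
# A defined name such as Revenue matches, and so does a plain cell reference
# like A7, which the name lookup below simply ignores.
IDENT_RE = re.compile(r"(?<![A-Za-z0-9_$])[A-Za-z_][A-Za-z0-9_.]*(?![A-Za-z0-9_.(])")


def referenced_cells(formula, defined_names):
    """Return the cell references a formula mentions, expanding defined names.

    defined_names is a hypothetical {NAME: "reference"} dict standing in for
    the workbook's defined-names table. Range ends are returned as-is, not
    expanded to every cell in between.
    """
    refs = set(CELL_RE.findall(formula))
    for ident in IDENT_RE.findall(formula):
        target = defined_names.get(ident.upper())
        if target is not None:
            refs.update(CELL_RE.findall(target))
    return refs


# Hypothetical workbook in which the name Revenue refers to $B$2:$B$4.
names = {"REVENUE": "$B$2:$B$4"}
print(referenced_cells("=3*A7+5", names))       # cell reference found directly
print(referenced_cells("=3*Revenue+5", {}))     # regex alone finds nothing
print(referenced_cells("=3*Revenue+5", names))  # resolved via the name table
```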

regards
 Steve