multi split function taking delimiter list
Paul McGuire
ptmcg at austin.rr.com
Thu Nov 16 16:33:16 EST 2006
On Nov 14, 5:41 pm, "Sam Pointon" <free.condime... at gmail.com> wrote:
> On Nov 14, 7:56 pm, "martins... at gmail.com" <martins... at gmail.com>
> wrote:
>
> > Hi, I'm looking for something like:
>
> > multi_split( 'a:=b+c' , [':=','+'] )
>
> > returning:
> > ['a', ':=', 'b', '+', 'c']
>
> > whats the python way to achieve this, preferably without regexp?
>
> pyparsing <http://pyparsing.wikispaces.com/> is quite a cool package
> for doing this sort of thing.
Thanks for mentioning pyparsing, Sam!
This is a good example of using pyparsing for just basic tokenizing,
and it will do a nice job of splitting up the tokens, whether there is
whitespace or not.
For instance, if you were tokenizing using the string split() method,
you would get nice results from "a := b + c", but not so good from "a:=
b+ c". Using Sam Pointon's simple pyparsing expression, you can split
up the arithmetic using the symbol expressions, and the whitespace is
pretty much ignored.
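To make the whitespace point concrete, here is a minimal sketch in plain Python, using neither regexp nor pyparsing. The multi_split function is a hypothetical helper written only for illustration. It is not part of pyparsing or the standard library, and it is not Sam's expression. It assumes that operands never contain a delimiter, and it strips the whitespace around each token. It is written for Python 3, so it uses the print() function rather than the print statement.

```python
def multi_split(text, delimiters):
    """Split text on any delimiter, keeping the delimiters as tokens.

    Hypothetical helper for illustration only. Whitespace around each
    token is dropped; longer delimiters are tried first so that ':='
    would win over a shorter ':' if both were given.
    """
    # Ignore empty delimiters (they would never advance the scan).
    ordered = sorted((d for d in delimiters if d), key=len, reverse=True)
    tokens = []
    current = ""  # characters of the operand collected so far
    i = 0
    while i < len(text):
        for delim in ordered:
            if text.startswith(delim, i):
                if current.strip():
                    tokens.append(current.strip())
                tokens.append(delim)
                current = ""
                i += len(delim)
                break
        else:
            # No delimiter starts here, so this character is operand text.
            current += text[i]
            i += 1
    if current.strip():
        tokens.append(current.strip())
    return tokens


# str.split() only works when every token is surrounded by whitespace.
print("a := b + c".split())   # ['a', ':=', 'b', '+', 'c']
print("a:= b+ c".split())     # ['a:=', 'b+', 'c']

# The delimiter-aware scan gives the same tokens either way.
print(multi_split("a:= b+ c", [":=", "+"]))  # ['a', ':=', 'b', '+', 'c']
```

A scan like this only tokenizes. It knows nothing about precedence or grouping, which is where a parsing library starts to pay off.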
But pyparsing can be used for more than just tokenizing. Here is a
slightly longer pyparsing example, using a new pyparsing helper method
called operatorPrecedence, which can shortcut the definition of
operator-separated expressions with () grouping. Note how this not
only tokenizes the expression, but also identifies the implicit groups
based on operator precedence. Finally, pyparsing allows you to label
the parsed results - in this case, you can reference the left-hand and
right-hand sides of your assignment statement using the attribute names
"lhs" and "rhs". This can be really handy for complicated grammars.
-- Paul
from pyparsing import *
number = Word(nums)
variable = Word(alphas)
operand = number | variable
arithexpr = operatorPrecedence( operand,
    [("!", 1, opAssoc.LEFT),            # factorial
     ("^", 2, opAssoc.RIGHT),           # exponentiation
     (oneOf('+ -'), 1, opAssoc.RIGHT),  # leading sign
     (oneOf('* /'), 2, opAssoc.LEFT),   # multiplication
     (oneOf('+ -'), 2, opAssoc.LEFT),]  # addition
    )
assignment = (variable.setResultsName("lhs") +
              ":=" +
              arithexpr.setResultsName("rhs"))
test = ["a:= b+c",
        "a := b + -c",
        "y := M*X + B",
        "e := m * c^2",]
for t in test:
    tokens = assignment.parseString(t)
    print tokens.asList()
    print tokens.lhs, "<-", tokens.rhs
    print
Prints:
['a', ':=', ['b', '+', 'c']]
a <- ['b', '+', 'c']
['a', ':=', ['b', '+', ['-', 'c']]]
a <- ['b', '+', ['-', 'c']]
['y', ':=', [['M', '*', 'X'], '+', 'B']]
y <- [['M', '*', 'X'], '+', 'B']
['e', ':=', ['m', '*', ['c', '^', '2']]]
e <- ['m', '*', ['c', '^', '2']]
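For readers curious how grouping by operator precedence can arise, here is a rough, hand-written precedence-climbing sketch in Python 3. It is not how pyparsing implements operatorPrecedence. It is a different and much smaller technique, shown only to illustrate the idea. All names (tokenize, parse_expr, parse_operand, parse_assignment) are hypothetical, factorial is left out for brevity, the tokenizer uses a regexp for convenience, and the labelled results are imitated with a plain dict. It always nests binary operations pairwise, which need not match how pyparsing groups a longer chain such as a+b+c.

```python
import re

# One token per match: ':=', a name, a number, or a single operator/paren.
TOKEN_RE = re.compile(r"\s*(:=|[A-Za-z]+|\d+|[-+*/^()])")

# binary operator -> (precedence, is_right_associative)
BINARY = {"+": (1, False), "-": (1, False),
          "*": (2, False), "/": (2, False),
          "^": (4, True)}
UNARY_PRECEDENCE = 3  # leading sign: tighter than * and /, looser than ^


def tokenize(text):
    """Split text into tokens, ignoring whitespace."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_operand(tokens, pos):
    """Parse a name, a number, a signed operand, or a parenthesized group."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[pos]
    if tok in ("+", "-"):
        operand, pos = parse_expr(tokens, pos + 1, UNARY_PRECEDENCE)
        return [tok, operand], pos
    if tok == "(":
        tree, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("missing ')'")
        return tree, pos + 1
    if tok.isalnum():
        return tok, pos + 1
    raise ValueError("unexpected token %r" % tok)


def parse_expr(tokens, pos, min_prec=1):
    """Precedence climbing: return (nested_list, next_position)."""
    left, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] in BINARY:
        op = tokens[pos]
        prec, right_assoc = BINARY[op]
        if prec < min_prec:
            break
        # Right-associative operators may recurse at the same level.
        next_min = prec if right_assoc else prec + 1
        right, pos = parse_expr(tokens, pos + 1, next_min)
        left = [left, op, right]
    return left, pos


def parse_assignment(text):
    """Parse '<variable> := <expression>' into {'lhs': ..., 'rhs': ...}."""
    tokens = tokenize(text)
    if len(tokens) < 3 or not tokens[0].isalpha() or tokens[1] != ":=":
        raise ValueError("expected '<variable> := <expression>'")
    rhs, pos = parse_expr(tokens, 2)
    if pos != len(tokens):
        raise ValueError("unexpected token %r" % tokens[pos])
    return {"lhs": tokens[0], "rhs": rhs}


for t in ["a:= b+c", "a := b + -c", "y := M*X + B", "e := m * c^2"]:
    result = parse_assignment(t)
    print(result["lhs"], "<-", result["rhs"])
```

For the four test strings this produces the same nesting as the pyparsing output above, with numbers kept as strings. The pyparsing version gets there from a declarative operator table, with no hand-written recursion.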