multi split function taking delimiter list

Paul McGuire ptmcg at austin.rr.com
Thu Nov 16 16:33:16 EST 2006


On Nov 14, 5:41 pm, "Sam Pointon" <free.condime... at gmail.com> wrote:
> On Nov 14, 7:56 pm, "martins... at gmail.com" <martins... at gmail.com>
> wrote:
>
> > Hi, I'm looking for something like:
>
> > multi_split( 'a:=b+c' , [':=','+'] )
>
> > returning:
> > ['a', ':=', 'b', '+', 'c']
>
> > What's the Python way to achieve this, preferably without regexp?
>
> pyparsing <http://pyparsing.wikispaces.com/> is quite a cool package
> for doing this sort of thing.

Thanks for mentioning pyparsing, Sam!

This is a good example of using pyparsing for just basic tokenizing,
and it will do a nice job of splitting up the tokens, whether there is
whitespace or not.

For instance, if you were tokenizing using the string split() method,
you would get nice results from "a := b + c", but not so good from "a:=
b+ c".  Using Sam Pointon's simple pyparsing expression, you can split
up the arithmetic using the symbol expressions, and the whitespace is
pretty much ignored.
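
To make the split() problem concrete, here is a small sketch using only the
standard library. The multi_split function below is a hypothetical
hand-written scanner, not pyparsing and not Sam's expression (which is not
quoted in this message); it is only meant to show what "whitespace is
ignored" looks like in the resulting token list.

```python
# Plain whitespace splitting only works when every token is space-separated.
print("a := b + c".split())   # ['a', ':=', 'b', '+', 'c']
print("a:= b+ c".split())     # ['a:=', 'b+', 'c']


def multi_split(text, delimiters):
    """Split text on any delimiter, keeping the delimiters as tokens.

    Hypothetical helper for illustration; whitespace around operands is dropped.
    """
    # Ignore empty delimiters and try longer ones first, so ':=' would win
    # over a shorter delimiter such as ':' if both were given.
    delimiters = sorted((d for d in delimiters if d), key=len, reverse=True)
    tokens, current, i = [], "", 0
    while i < len(text):
        for d in delimiters:
            if text.startswith(d, i):
                if current.strip():
                    tokens.append(current.strip())
                tokens.append(d)
                current = ""
                i += len(d)
                break
        else:
            # No delimiter starts here, so this character belongs to an operand.
            current += text[i]
            i += 1
    if current.strip():
        tokens.append(current.strip())
    return tokens


print(multi_split("a:= b+ c", [":=", "+"]))   # ['a', ':=', 'b', '+', 'c']
```

This gives the flat token list from the original question regardless of
spacing, but it knows nothing about operator precedence or grouping.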

But pyparsing can be used for more than just tokenizing.  Here is a
slightly longer pyparsing example, using a new pyparsing helper method
called operatorPrecedence, which can shortcut the definition of
operator-separated expressions with () grouping.  Note how this not
only tokenizes the expression, but also identifies the implicit groups
based on operator precedence.  Finally, pyparsing allows you to label
the parsed results - in this case, you can reference the left-hand and
right-hand sides of your assignment statement using the attribute names
"lhs" and "rhs".  This can be really handy for complicated grammars.

-- Paul


from pyparsing import *

number = Word(nums)
variable = Word(alphas)
operand = number | variable

arithexpr = operatorPrecedence( operand,
    [("!", 1, opAssoc.LEFT),      # factorial
     ("^", 2, opAssoc.RIGHT),     # exponentiation
     (oneOf('+ -'), 1, opAssoc.RIGHT),  # leading sign
     (oneOf('* /'), 2, opAssoc.LEFT),   # multiplication
     (oneOf('+ -'), 2, opAssoc.LEFT),]  # addition
    )

assignment = (variable.setResultsName("lhs") +
                ":=" +
                arithexpr.setResultsName("rhs"))

test = ["a:= b+c",
        "a := b + -c",
        "y := M*X + B",
        "e := m * c^2",]

for t in test:
    tokens = assignment.parseString(t)
    print(tokens.asList())                # full nested token list
    print(tokens.lhs, "<-", tokens.rhs)   # access by results name
    print()

Prints:
['a', ':=', ['b', '+', 'c']]
a <- ['b', '+', 'c']

['a', ':=', ['b', '+', ['-', 'c']]]
a <- ['b', '+', ['-', 'c']]

['y', ':=', [['M', '*', 'X'], '+', 'B']]
y <- [['M', '*', 'X'], '+', 'B']

['e', ':=', ['m', '*', ['c', '^', '2']]]
e <- ['m', '*', ['c', '^', '2']]
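
One reason the precedence grouping is useful: the nested lists can be walked
recursively without any further parsing. The following is a minimal sketch,
not part of pyparsing. The evaluate function and the variable values are
made up for illustration, and it assumes the list shapes shown in the output
above (a leading sign as [sign, operand], and binary operations as
[operand, op, operand, ...]). It does not handle the "!" operator.

```python
import operator

# Binary operators used in the grammar, mapped to Python functions.
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "^": operator.pow,
}


def evaluate(node, env):
    """Evaluate a nested token list such as ['m', '*', ['c', '^', '2']]."""
    if isinstance(node, str):
        # Leaf: a numeric literal, or a variable name looked up in env.
        return float(node) if node.isdigit() else env[node]
    if len(node) == 2:
        # Leading sign: ['-', operand] or ['+', operand].
        sign, operand = node
        value = evaluate(operand, env)
        return -value if sign == "-" else value
    # Binary chain: [operand, op, operand, ...], folded left to right.
    result = evaluate(node[0], env)
    for i in range(1, len(node), 2):
        result = BINARY_OPS[node[i]](result, evaluate(node[i + 1], env))
    return result


rhs = ['m', '*', ['c', '^', '2']]
print(evaluate(rhs, {"m": 2.0, "c": 3.0}))   # 18.0
```

With a real parse you would pass tokens.rhs.asList() instead of the
hand-typed list.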



