Splitting with Regular Expressions

Paul McGuire ptmcg at austin.rr.com
Thu Mar 17 10:34:45 EST 2005


A pyparsing example may be less mysterious.  You can define a word to be
any group of alphas, or you can define a word to be groups of alphas
joined by '.'s.  scanString is a generator that scans the input string
for matches and, for each one, yields the matching token list along with
the start and end locations of the match within the input string.
(Because the tokens come back as a list, we have to peel off element 0
of each match to get the word.)  See the sample code below.

-- Paul
(get pyparsing at http://pyparsing.sourceforge.net)


from pyparsing import Word, alphas, delimitedList

test = 'This+(that)= a.string!!!  This... is .just.a sentence.'

word = Word(alphas)

# scanString yields (tokens, start, end); keep only the first token of each match
print([wd[0] for wd, s, e in word.scanString(test)])

# prints ['This', 'that', 'a', 'string', 'This', 'is', 'just', 'a',
#         'sentence']

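# a word is now one or more groups of alphas joined by '.'; combine=True
# returns each match as a single string
word = delimitedList(Word(alphas), delim=".", combine=True)
print([wd[0] for wd, s, e in word.scanString(test)])

# prints ['This', 'that', 'a.string', 'This', 'is', 'just.a',
#         'sentence']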
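
For comparison with the subject of this thread, here is a rough sketch of
the dotted-word case using only the standard re module.  The pattern and
the names dotted_word and matches are my own illustration, not anything
from pyparsing, and [A-Za-z] is assumed to be a fair stand-in for alphas.
finditer plays roughly the role of scanString here, since each match
object carries its text plus its start and end offsets.

```python
import re

test = 'This+(that)= a.string!!!  This... is .just.a sentence.'

# Assumption: a "word" is a run of ASCII letters, optionally followed by
# more runs of letters that are each joined on by a single '.'.
dotted_word = re.compile(r"[A-Za-z]+(?:\.[A-Za-z]+)*")

# Collect (text, start, end) for every match, similar in spirit to the
# (tokens, start, end) triples that scanString yields.
matches = [(m.group(), m.start(), m.end()) for m in dotted_word.finditer(test)]

words = [text for text, start, end in matches]
print(words)

# prints ['This', 'that', 'a.string', 'This', 'is', 'just.a', 'sentence']
```

The start and end offsets can be used to slice the original string, so
test[start:end] gives back the matched text for each entry in matches.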



