splitting delimited strings

Wed Jun 15 21:18:27 EDT 2005

Mark -

Let me weigh in with a pyparsing entry to your puzzle.  It wont be
blazingly fast, but at least it will give you another data point in
your comparison of approaches.  Note that the parser can do the
string-to-int conversion for you during the parsing pass.

If @rv@ and @pv@ are record type markers, then you can use pyparsing to
create more of a parser than just a simple tokenizer, and parse out the
individual record fields into result attributes.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

test1 = "@hello@@world@@foo at bar"
test2 = """@rv@ 2 @db.locks@ @//depot/hello.txt@ @mh@ @mh@ 1 1 44
@pv@ 0 @db.changex@ 44 44 @mh@ @mh@ 1118875308 0 @ :@@: :@@@@: @"""

from pyparsing import *

AT = Literal("@")
atQuotedString = AT.suppress() + Combine(OneOrMore((~AT + SkipTo(AT)) |

                               (AT +
AT).setParseAction(replaceWith("@")) )) + AT.suppress()

# extract any @-quoted strings
for test in (test1,test2):
    for toks,s,e in atQuotedString.scanString(test):
        print toks
    print

# parse all tokens (assume either a positive integer or @-quoted
string)
def makeInt(s,l,toks):
    return int(toks[0])
entry = OneOrMore( Word(nums).setParseAction(makeInt) | atQuotedString
)

for t in test2.split("\n"):
    print entry.parseString(t)

Prints out:

['hello at world@foo']

['rv']
['db.locks']
['//depot/hello.txt']
['mh']
['mh']
['pv']
['db.changex']
['mh']
['mh']
[':@: :@@: ']

['rv', 2, 'db.locks', '//depot/hello.txt', 'mh', 'mh', 1, 1, 44]
['pv', 0, 'db.changex', 44, 44, 'mh', 'mh', 1118875308, 0, ':@: :@@: ']