Escaping commas within parens in CSV parsing?

Paul McGuire ptmcg at austin.rr.com
Thu Jun 30 23:30:54 EDT 2005


Well, this doesn't have the terseness of an re solution, but it
shouldn't be hard to follow.
-- Paul

#~ This is a very crude first pass.  It does not handle nested
#~ ()'s, nor ()'s inside quotes.  But if your data does not
#~ stray too far from the example, this will probably do the job.

#~ Download pyparsing at http://pyparsing.sourceforge.net.
import pyparsing as pp

test = "AAA, BBB , CCC (some text, right here), DDD"

COMMA = pp.Literal(",")
LPAREN = pp.Literal("(")
RPAREN = pp.Literal(")")
parenthesizedText = LPAREN + pp.SkipTo(RPAREN) + RPAREN

#~ Printable ASCII (including space), minus the comma and parentheses.
nonCommaChars = "".join( chr(c) for c in range(32,127)
                         if chr(c) not in ",()" )
nonCommaText = pp.Word(nonCommaChars)

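#~ An entry is a run of plain text and/or parenthesized groups.
#~ Combine joins the pieces back into one string (adjacent=False
#~ allows whitespace between them); the parse action trims the result.
commaListEntry = pp.Combine(
    pp.OneOrMore( parenthesizedText | nonCommaText ), adjacent=False )
commaListEntry.setParseAction( lambda s,l,t: t[0].strip() )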

csvList = pp.delimitedList( commaListEntry )
print(csvList.parseString(test))
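
The comments at the top note that nested ()'s are not handled. For comparison, here is a minimal sketch of one way to cope with nesting in plain Python, without pyparsing, by counting paren depth and splitting only at depth zero. This is not the pyparsing approach shown above; the function name split_outside_parens and the second test string are made up for illustration, and it still ignores ()'s inside quotes.

```python
def split_outside_parens(text, sep=","):
    """Split text on sep, ignoring separators nested inside ( ... )."""
    fields = []
    current = []
    depth = 0  # number of unclosed "(" seen so far
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        if ch == sep and depth == 0:
            # Top-level separator: finish the current field.
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    fields.append("".join(current).strip())
    return fields

test = "AAA, BBB , CCC (some text, right here), DDD"
print(split_outside_parens(test))

# Hypothetical nested input, not from the original question.
nested = "AAA, f(a, g(b, c)), DDD"
print(split_outside_parens(nested))
```

An unbalanced ")" is treated as ordinary text here, and an unclosed "(" swallows the rest of the line into one field; whether that is acceptable depends on the data.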



