Escaping commas within parens in CSV parsing?

Skip Montanaro skip at pobox.com
Thu Jun 30 22:59:10 EDT 2005


    Ramon> I am trying to use the csv module to parse a column of values
    Ramon> containing comma-delimited values with unusual escaping:

    Ramon> AAA, BBB, CCC (some text, right here), DDD

    Ramon> I want this to come back as:

    Ramon> ["AAA", "BBB", "CCC (some text, right here)", "DDD"]

Alas, there's no "escaping" at all in the line above.  I see no obvious way
to distinguish one comma from another in this example.  If you mean the fact
that the comma you want to retain is in parens, that's not escaping.  Escape
characters are consumed by the parser and don't appear in the output, whereas
the parens in your example do.
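
To make the distinction concrete, here is a small sketch (Python 3 syntax) of
what the csv module does with the line as posted, with real quoting, and with
a real escape character.  The quoted and backslash-escaped variants are
hypothetical rewrites of Ramon's line, and parse is just a helper name made up
for the illustration:

```python
import csv

def parse(line, **kwargs):
    """Parse a single line with csv.reader and return its list of fields."""
    return next(csv.reader([line], skipinitialspace=True, **kwargs))

# The line as posted: every comma is a delimiter, parens mean nothing to csv.
plain = parse('AAA, BBB, CCC (some text, right here), DDD')

# Hypothetical quoted variant: the quotes protect the comma and are consumed.
quoted = parse('AAA, BBB, "CCC (some text, right here)", DDD')

# Hypothetical escaped variant: the backslash protects the comma and is consumed.
escaped = parse('AAA, BBB, CCC (some text\\, right here), DDD', escapechar='\\')

print(plain)
print(quoted)
print(escaped)
```

The first call splits the parenthesized text into two fields, giving five in
all; the other two return the four fields Ramon is after, with neither the
quotes nor the backslash left in the output.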

    Ramon> I can probably hack this with regular expressions but I thought
    Ramon> I'd check to see if anyone had any quick suggestions for how to
    Ramon> do this elegantly first.

I see nothing obvious unless you truly mean that the beginning of each field
is all caps.  In that case you could wrap a file object and quote the fields
before the csv module sees them:

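    import re

    class FunnyWrapper:
        """untested"""
        def __init__(self, f):
            self.f = f

        def __iter__(self):
            return self

        def next(self):  # Python 2 iterator protocol
            # strip the line ending so the closing quote doesn't land after it
            line = self.f.next().rstrip("\r\n")
            # a comma followed by a capital starts a new field; the space after
            # the comma is dropped so fields come back as "BBB", not " BBB"
            return '"' + re.sub(r', *([A-Z]+)', r'","\1', line) + '"'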

and use it like so:

    import csv

    reader = csv.reader(FunnyWrapper(open("somefile.csv", "rb")))
    for row in reader:
        print row
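
On Python 3 the same idea might look like the sketch below.  It assumes the
all-caps rule holds, uses a generator in place of the wrapper class
(quote_fields is a made-up name), and drops the space after each comma so the
fields match the output Ramon asked for.  A real file would be opened in text
mode with newline="" as the csv documentation recommends; io.StringIO stands
in for somefile.csv here so the snippet is self-contained:

```python
import csv
import io
import re

def quote_fields(lines):
    """Yield each line with double quotes wrapped around its fields."""
    for line in lines:
        # Drop the line ending so the closing quote ends the last field.
        line = line.rstrip("\r\n")
        # A comma followed by a capital letter is treated as a field boundary.
        yield '"' + re.sub(r', *([A-Z]+)', r'","\1', line) + '"'

# Stand-in for open("somefile.csv", newline="").
sample = io.StringIO("AAA, BBB, CCC (some text, right here), DDD\n")
rows = list(csv.reader(quote_fields(sample)))
for row in rows:
    print(row)
```

Note that the pattern only checks that a field starts with a capital letter,
so a retained comma followed by a capitalized word would still be split.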

(I'm not sure what the ramifications are of iterating over a file opened in
binary mode.)
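
If the all-caps assumption does not hold, a different technique is to skip the
csv module and split only on commas that sit outside parentheses.  The sketch
below is that swapped-in approach, not the wrapper above; split_outside_parens
is a hypothetical helper, and it assumes the parens are balanced and that the
data uses no quoting:

```python
def split_outside_parens(line, delimiter=","):
    """Split line on delimiter, ignoring delimiters nested inside parentheses."""
    fields = []
    current = []
    depth = 0  # number of unclosed "(" seen so far
    for ch in line:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        if ch == delimiter and depth == 0:
            # A top-level delimiter ends the current field.
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    # Whatever is left after the last delimiter is the final field.
    fields.append("".join(current).strip())
    return fields

result = split_outside_parens("AAA, BBB, CCC (some text, right here), DDD")
print(result)
```

Tracking nesting depth rather than matching a pattern keeps nested parens
working too, at the cost of giving up the csv module's quoting rules.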

Skip


