parsing CSV files with quotes

Warren Postma embed at geocities.com
Thu Mar 30 11:50:27 EST 2000


Suppose I have a CSV file where line 1 holds the column names and lines 2..n
hold comma-separated values, with every string field quoted like this:

ID, NAME, AGE
1, "Postma, Warren", 30
2, "Twain, Shania",  31
3, "Nelson, Willy",  57
4, "Austin, \"Stone Cold\" Steve", 34

So the obvious thing I tried was splitting on commas:

>>> import string
>>> print string.splitfields("4, \"Austin, \\\"Stone Cold\\\" Steve, 34",",")
['4', ' "Austin', ' \\"Stone Cold\\" Steve', ' 34']

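Hmm. Interesting. The comma inside the quotes splits the name in two. So I
tried it with no separator, which splits on whitespace instead: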

>>> print string.splitfields(r'4, "Austin, \"Stone Cold\" Steve", 34')
['4,', '"Austin,', '\\"Stone', 'Cold\\"', 'Steve",', '34']

I'm getting close, I can feel it!

The Rules:

1. All integer and other non-string fields are output as ASCII.
2. String fields have quotes. Commas are allowed inside the quotes.
3. Quotes inside quotes are escaped by a backslash.
4. Backslashes are themselves escaped by a backslash.
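
The four rules above are small enough to handle with a character-by-character
scan. What follows is only a minimal sketch, assuming a current Python 3
interpreter rather than the string module used in this post; the names
split_quoted_line and _finish are made up for illustration, and every field
comes back as a string (converting AGE to an integer is left to the caller).

```python
def _finish(chars, was_quoted):
    """Join the collected characters; trim whitespace on unquoted fields only."""
    text = "".join(chars)
    return text if was_quoted else text.strip()


def split_quoted_line(line):
    """Split one line on commas, honouring "..." quoting and backslash escapes."""
    fields = []
    current = []          # characters of the field being built
    in_quotes = False     # True while between an opening and closing quote
    was_quoted = False    # True once the current field has had an opening quote
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == "\\" and i + 1 < len(line):
                # Rules 3 and 4: a backslash makes the next character literal.
                current.append(line[i + 1])
                i += 2
                continue
            if ch == '"':
                in_quotes = False
            else:
                current.append(ch)
        elif ch == '"':
            # Opening quote: drop any whitespace collected before it.
            current = []
            in_quotes = True
            was_quoted = True
        elif ch == ",":
            fields.append(_finish(current, was_quoted))
            current, was_quoted = [], False
        elif not was_quoted:
            # Text after a closing quote (normally just spaces) is ignored.
            current.append(ch)
        i += 1
    if in_quotes:
        raise ValueError("unterminated quoted field: %r" % line)
    fields.append(_finish(current, was_quoted))
    return fields


print(split_quoted_line(r'4, "Austin, \"Stone Cold\" Steve", 34'))
# ['4', 'Austin, "Stone Cold" Steve', '34']
```

The scan keeps one flag for "inside quotes", so a comma only ends a field when
that flag is off, which is exactly what a plain split cannot know.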

Is this complex enough that I basically need the "parser" module of Python?

The problem is that I'm scared of it. Does anyone have any parser tutorials,
HOWTOs, or links?
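
A full parser is probably more than this format needs. As one sketch, and
assuming a Python 3 interpreter whose standard library includes the csv module
(an assumption about the reader's environment, not something used in this
post), the reader can be told about the backslash escapes and the space after
each comma. The helper name parse_rows is hypothetical.

```python
import csv


def parse_rows(lines):
    """Parse an iterable of text lines into lists of string fields."""
    reader = csv.reader(
        lines,
        escapechar="\\",        # rules 3 and 4: backslash escapes the next character
        doublequote=False,      # quotes are escaped with a backslash, not doubled
        skipinitialspace=True,  # ignore the spaces that follow each comma
    )
    return list(reader)


sample = [
    'ID, NAME, AGE',
    '1, "Postma, Warren", 30',
    r'4, "Austin, \"Stone Cold\" Steve", 34',
]
for row in parse_rows(sample):
    print(row)
```

The same function should accept an open file object in place of the list,
since csv.reader takes any iterable that yields lines.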

Or is this beast solvable by judicious use of regular expressions?
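
For what it is worth, a regular expression can do it if it matches one field
at a time instead of trying to split on commas. This is a sketch under the
same rules, assuming Python 3 and the standard re module; the pattern FIELD
and the function split_with_regex are illustrative names, not an established
recipe.

```python
import re

# One field: optional spaces, then either a quoted string (with backslash
# escapes) or a run of characters containing no comma or quote, then the
# separator that ends the field (a comma, or the end of the line).
FIELD = re.compile(r'''
    \s*
    (?:
        "(?P<quoted>(?:[^"\\]|\\.)*)"
      |
        (?P<bare>[^,"]*)
    )
    \s*
    (?P<sep>,|$)
''', re.VERBOSE)


def split_with_regex(line):
    """Split one line into string fields by matching FIELD repeatedly."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)
        if m is None:
            raise ValueError("cannot parse field at offset %d: %r" % (pos, line))
        if m.group("quoted") is not None:
            # Undo the escapes: backslash + character becomes that character.
            fields.append(re.sub(r"\\(.)", r"\1", m.group("quoted")))
        else:
            fields.append(m.group("bare").strip())
        pos = m.end()
        if m.group("sep") == "":
            # The field ended at the end of the line, not at a comma.
            return fields


print(split_with_regex(r'4, "Austin, \"Stone Cold\" Steve", 34'))
```

The key piece is the alternation inside the quotes: either a character that is
neither a quote nor a backslash, or a backslash followed by any one character.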

While I'm taking up bandwidth, I'll ask another silly question:

Is there a "compressed dbShelve" out there anywhere? In this case I just
want to store lists and dictionaries of built-in Python types, in
compressed form, in a BSD database. Has anyone heard of something like this?
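
One way such a thing could be put together is to pickle each value, compress
the pickle, and store the result under a string key. The sketch below assumes
Python 3 and uses the standard dbm module as a stand-in for a BSD database
(an assumption; the backend dbm picks depends on the installation). The class
name CompressedShelf is hypothetical and is not an existing library.

```python
import dbm
import pickle
import zlib


class CompressedShelf:
    """Dictionary-like store: values are pickled, then zlib-compressed."""

    def __init__(self, path, flag="c"):
        # flag "c" opens the database for read/write, creating it if needed.
        self._db = dbm.open(path, flag)

    def __setitem__(self, key, value):
        self._db[key.encode("utf-8")] = zlib.compress(pickle.dumps(value))

    def __getitem__(self, key):
        # A missing key raises KeyError from the underlying database.
        return pickle.loads(zlib.decompress(self._db[key.encode("utf-8")]))

    def __delitem__(self, key):
        del self._db[key.encode("utf-8")]

    def close(self):
        self._db.close()
```

Usage would look like shelf = CompressedShelf("people"), then
shelf["ages"] = {"Postma, Warren": 30}, and shelf.close() when done.
Compression only pays off for values that are reasonably large or repetitive;
tiny values can come out bigger than they went in.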

Warren




