csv.reader has trouble with comma inside quotes inside brackets

Terry Reedy tjreedy at udel.edu
Tue Jun 9 17:27:13 EDT 2009


Bret wrote:
> i have a csv file like so:
> row1,field1,[field2][text in field2 "quote, quote"],field3,field
> row2,field1,[field2]text in field2 "quote, quote",field3,field
> 
> using csv.reader to read the file, the first row is broken into two
> fields:
> [field2][text in field2 "quote
> and
>  quote"
> 
> while the second row is read correctly with:
> [field2]text in field2 "quote, quote"
> being one field.
> 
> any ideas how to make csv.reader work correctly for the first case?
> the problem is the comma inside the quote inside the brackets, ie:
> [","]

When posting, give the Python version, the minimal code that shows the 
problem, and the actual output.  Cut and paste the latter two.  Reports 
are less credible otherwise.

Using 3.1rc1

txt = [
'''row1,field1,[field2][text in field2 "quote, quote"],field3,field''',
'''row2,field1,[field2] text in field2 "quote, quote", field3,field''',
'''row2,field1, field2  text in field2 "quote, quote", field3,field''',
]
import csv
for row in csv.reader(txt): print(len(row),row)

produces

6 ['row1', 'field1', '[field2][text in field2 "quote', ' quote"]', 'field3', 'field']
6 ['row2', 'field1', '[field2] text in field2 "quote', ' quote"', ' field3', 'field']
6 ['row2', 'field1', ' field2  text in field2 "quote', ' quote"', ' field3', 'field']

In 3.1 at least, the presence or absence of brackets is irrelevant, as I 
expected it to be.  For double quotes to protect the comma delimiter, 
the *entire field* must be quoted, not just part of it.
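
To illustrate the whole-field rule, here is a minimal sketch using a 
hypothetical variant of the first row (not the poster's actual data): the 
third field is wrapped in double quotes and the quotes embedded in it are 
doubled, which the default dialect reads back as single quotes.

```python
import csv

# Hypothetical rewrite of row1: the whole third field is enclosed in double
# quotes, and each embedded double quote is doubled ("" stands for one ").
line = 'row1,field1,"[field2][text in field2 ""quote, quote""]",field3,field'

row = next(csv.reader([line]))
print(len(row), row)
# 5 ['row1', 'field1', '[field2][text in field2 "quote, quote"]', 'field3', 'field']
```

This only helps if whatever produces the file can be changed to quote 
fields that way.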

If you want to escape the delimiter without quoting entire fields, use 
an escape char and change the dialect.  For example

txt = [
'''row1,field1,[field2][text in field2 "quote`, quote"],field3,field''',
'''row2,field1,[field2] text in field2 "quote`, quote", field3,field''',
'''row2,field1, field2  text in field2 "quote`, quote", field3,field''',
]
import csv
for row in csv.reader(txt, quoting=csv.QUOTE_NONE, escapechar='`'):
    print(len(row), row)

produces what you desire

5 ['row1', 'field1', '[field2][text in field2 "quote, quote"]', 'field3', 'field']
5 ['row2', 'field1', '[field2] text in field2 "quote, quote"', ' field3', 'field']
5 ['row2', 'field1', ' field2  text in field2 "quote, quote"', ' field3', 'field']
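
If the same settings are needed in several places, they could be 
registered once as a named dialect instead of being repeated as keyword 
arguments.  A minimal sketch, assuming the dialect name 'backtick_escaped' 
(a made-up name for illustration):

```python
import csv

# 'backtick_escaped' is a hypothetical name; the settings mirror the
# keyword arguments used above (no quoting, backtick as escape character).
csv.register_dialect('backtick_escaped',
                     quoting=csv.QUOTE_NONE, escapechar='`')

txt = [
    '''row1,field1,[field2][text in field2 "quote`, quote"],field3,field''',
    '''row2,field1,[field2] text in field2 "quote`, quote", field3,field''',
]

# Read with the registered dialect; the escaped comma stays inside the field.
rows = list(csv.reader(txt, dialect='backtick_escaped'))
for row in rows:
    print(len(row), row)
```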


Terry Jan Reedy



