[Csv] bugs in parsing csv?

sjmachin at lexicon.net sjmachin at lexicon.net
Sat Jan 22 00:06:57 CET 2005


I came across this example in the online version of "Programming in Lua" by Roberto 
Ierusalimschy:

>>> weird = '"hello "" hello", "",""\r\n'

This is not IMHO a correctly formed CSV string: the space after the first comma means 
the second field does not start with a quote. It would not be produced by csv.writer.

However csv.reader accepts it without complaint:
>>> import csv
>>> rdr = csv.reader([weird])
>>> weird2 = rdr.next()
>>> weird2
['hello " hello', ' ""', '']

>>> wtr = csv.writer(file('weird2.csv', 'wb'))
>>> wtr.writerow(weird2)
>>> del wtr
>>> file('weird2.csv', 'rb').read()
'"hello "" hello"," """"",\r\n'
# correctly quoted.
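
For anyone repeating the round trip above without a scratch file, here is a minimal sketch. It assumes Python 3 (so next(...) replaces .next()) and uses an in-memory io.StringIO buffer instead of weird2.csv. The names row, buf and rewritten are mine, not from the session above.

```python
import csv
import io

weird = '"hello "" hello", "",""\r\n'

# Parse the single line; csv.reader accepts any iterable of strings.
row = next(csv.reader([weird]))

# Write the parsed row back out to an in-memory buffer.
# newline="" keeps the writer's "\r\n" line terminator untouched.
buf = io.StringIO(newline="")
csv.writer(buf).writerow(row)
rewritten = buf.getvalue()

# Re-reading the rewritten text should give back the same fields.
reparsed = next(csv.reader([rewritten]))

print(row)
print(repr(rewritten))
print(reparsed == row)
```

The point is only that the writer's output re-parses to the same row, whereas the original string is not something the writer would have emitted.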

Here are some more examples:

>>> csv.reader([' "\r\n']).next()
[' "']
>>> csv.reader([' ""\r\n']).next()
[' ""']
>>> csv.reader(['x ""\r\n']).next()
['x ""']
>>> csv.reader(['x "\r\n']).next()
['x "']

Looks like we don't give a damn about quotes if the field doesn't start with one: they 
are passed through as literal characters. In the real world this result might be OK for 
a field like 'Pat O"Brien', but it does indicate that the data source is probably _NOT_ 
quoting at all.

However, a not-infrequent mistake made by people generating what they call CSV files is 
to wrap quotes around some or all fields without doubling any pre-existing quotes:

>>> csv.reader(['"Pat O"Brien"\r\n']).next()
['Pat OBrien"']
# <<<=== aarrbejaysus!!! The quote after "O" is silently dropped and the trailing one kept.
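
One possible way to get a complaint for this case is the strict formatting parameter. This is a sketch only, assuming Python 3 and a csv module that supports strict (current releases do; the version used in the session above may not). The names bad, lenient and rejected are illustrative.

```python
import csv

bad = '"Pat O"Brien"\r\n'

# Default (lenient) parsing: the inner quote is swallowed without any error.
lenient = next(csv.reader([bad]))
print(lenient)

# strict=True asks the parser to raise csv.Error on bad CSV input instead.
rejected = False
try:
    next(csv.reader([bad], strict=True))
except csv.Error as exc:
    rejected = True
    print("rejected:", exc)
```

Note that strict only helps once a field has started with a quote. It does not object to stray quotes inside a field that began unquoted.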

Further examples of where the data source needs head alignment and csv.reader 
doesn't complain, giving an unfortunate result:

>>> csv.reader(['spot",the",mistake"\r\n']).next()
['spot"', 'the"', 'mistake"']
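
Since the reader takes quotes in an unquoted field literally, catching input like this needs a separate check on the raw line. Below is a rough sketch of one, assuming Python 3. looks_well_quoted is a hypothetical helper of mine, not part of the csv module. It assumes one record per line (no embedded newlines in quoted fields) and the strict convention that a quote may only open a field, close it, or appear doubled inside it.

```python
def looks_well_quoted(line, delimiter=",", quotechar='"'):
    """Return True if quotes only wrap whole fields or are doubled inside them."""
    text = line.rstrip("\r\n")
    i, n = 0, len(text)
    while True:
        if i < n and text[i] == quotechar:
            # Quoted field: scan to the closing quote, skipping doubled quotes.
            i += 1
            while True:
                if i >= n:
                    return False  # line ended inside a quoted field
                if text[i] == quotechar:
                    if i + 1 < n and text[i + 1] == quotechar:
                        i += 2  # doubled quote, still inside the field
                        continue
                    i += 1  # closing quote
                    break
                i += 1
            # A closing quote must be followed by a delimiter or end of line.
            if i < n and text[i] != delimiter:
                return False
        else:
            # Unquoted field: any quote character here is suspicious.
            while i < n and text[i] != delimiter:
                if text[i] == quotechar:
                    return False
                i += 1
        if i >= n:
            return True
        i += 1  # step over the delimiter


for sample in ['spot",the",mistake"\r\n', '"Pat O"Brien"\r\n', '"a ""b"" c",d,\r\n']:
    print(repr(sample), looks_well_quoted(sample))
```

This is deliberately fussier than csv.reader: it would also reject a legitimate unquoted field like 'Pat O"Brien', so treat a False result as "go look at the data source", not as proof of an error.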

>>> csv.reader(['"attempt", "at", "pretty", "formatting"\r\n']).next()
['attempt', ' "at"', ' "pretty"', ' "formatting"']
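
For the "pretty formatting" case specifically, the skipinitialspace formatting parameter tells the reader to ignore spaces immediately after a delimiter, so the following quote is seen at the start of the field. A small sketch, assuming Python 3 (the names pretty, default_row and trimmed_row are mine):

```python
import csv

pretty = '"attempt", "at", "pretty", "formatting"\r\n'

# Default: the leading space makes each later field unquoted, quotes and all.
default_row = next(csv.reader([pretty]))

# skipinitialspace=True: spaces right after the delimiter are skipped,
# so the quotes open a quoted field as the data source intended.
trimmed_row = next(csv.reader([pretty], skipinitialspace=True))

print(default_row)
print(trimmed_row)
```

This has to be requested by the caller; the reader gives no hint that it might be needed.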

