Suggestions for workaround in CSV bug

Simmons, Stephen simmons1 at anz.com
Mon Jan 23 17:33:39 EST 2006


Hi,

I've come across a bug in the csv module where csv.reader() raises an
exception if a field in the input contains '\r'. The example code and
output below show a test case where csv.reader() cannot read back a
list of rows written by csv.writer().

I believe this is a known bug that may have been fixed for Python 2.5.
However, I'm after suggestions for workarounds in Python 2.4.2.

This is part of a project where I'm storing large tables from 
mainframe systems as CSVs for subsequent data cleansing and 
post-processing. Some tables have 300 columns and tens of millions 
of rows. The mainframe data fields are poorly documented, so I
don't know at the time of writing the CSV whether a '\r'
is part of a binary field and so must be retained, 
or is a random byte in an uninitialised field and so 
can safely be deleted. Therefore I'd prefer to make as few changes
as possible, to avoid corrupting the data.

Any suggestions for how to proceed are most welcome!

Thanks in advance,

Stephen Simmons


#======================================================
# Bug in Python 2.4.2's csv module
# Stephen Simmons, mail at stevesimmons.com, 24 Jan 2006

import csv

s = [ ['a'], ['\r'], ['b'] ]
name = 'c:/temp/test2.csv'

print 'Writing CSV file containing %s' % repr(s)
f = file(name, 'wb')
csv.writer(f).writerows(s)
f.close()

print 'CSV file is %s' % repr(file(name, 'rb').read())

print 'Now reading back as CSV...'
for r in csv.reader(file(name, 'rb')):
    print 'Read row containing %s' % repr(r)


# Output is
"""In [29]: run csv_error.py
Writing CSV file containing [['a'], ['\r'], ['b']]
CSV file is 'a\r\n"\r"\r\nb\r\n'
Now reading back as CSV...
Read row containing ['a']
---------------------------------------------------------------------------
_csv.Error                                   Traceback (most recent call last)


c:\temp\csv_error.py
     14 print 'CSV file is %s' % repr(file(name, 'rb').read())
     15
     16 print 'Now reading back as CSV...'
---> 17 for r in csv.reader(file(name, 'rb')):
     18     print 'Read row containing %s' % repr(r)

Error: newline inside string
WARNING: Failure executing file: <csv_error.py>

"""



