Dealing with \r in CSV fields in Python2.4

MRAB python at mrabarnett.plus.com
Wed Sep 4 11:31:06 EDT 2013


On 04/09/2013 16:04, Tim Chase wrote:
> I've got some old 2.4 code (requires an external lib that hasn't been
> upgraded) that needs to process a CSV file where some of the values
> contain \r characters.  It appears that in more recent versions (just
> tested in 2.7; docs suggest this was changed in 2.5), Python does the
> Right Thing™ and just creates values in the row containing that \r.
> However, in 2.4, the csv module chokes on it with
>
>    _csv.Error: newline inside string
>
> as demoed by the example code at the bottom of this email.  What's the
> best way to deal with this?  At the moment, I'm just using something
> like
>
>    def unCR(f):
>      for line in f:
>        yield line.replace('\r', '')
>
>    f = file('input.csv', 'rb')
>    for row in csv.reader(unCR(f)):
>      code_to_process(row)
>
> but this throws away data that I'd really prefer to keep if possible.
>
> I know 2.4 isn't exactly popular, and in an ideal world, I'd just
> upgrade to a later 2.x version that does what I need.  Any old-time
> 2.4 pythonistas have sage advice for me?
>
[snip]
You could try replacing the '\r' with another character that doesn't
appear anywhere in the data and then change it back afterwards.

import csv

MARKER = '\x01'

def cr_to_marker(f):
     for line in f:
         yield line.replace('\r', MARKER)

def marker_to_cr(item):
     return item.replace(MARKER, '\r')

f = file('out.txt', 'rb')
r = csv.reader(cr_to_marker(f))
for i, row in enumerate(r): # the csv module never sees a raw '\r' now
     row = [marker_to_cr(item) for item in row]
     print repr(row)
f.close()
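
The marker only works if it really is absent from the data. A rough
sketch of one way to check that first, assuming the file is small
enough to read into memory in one go; pick_marker and its default
candidate list are made-up names for illustration, not part of the csv
module:

```python
def pick_marker(text, candidates=None):
    """Return the first candidate character that does not occur in text."""
    if candidates is None:
        # Assumed default: control characters other than tab, LF and CR.
        candidates = [chr(n) for n in range(1, 32) if n not in (9, 10, 13)]
    for ch in candidates:
        if ch not in text:
            return ch
    raise ValueError('no unused marker character found')
```

You would call it on the raw file contents (e.g. the result of
f.read()) and use the returned character as MARKER before parsing.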

Which OS are you using? On Windows the lines (rows) end with '\r\n',
and because the file is opened in binary mode that '\r' is still
present, so the last item of each row will end with '\r', which you'll
need to strip off. (That would be a problem only if the last item of a
row could legitimately end with '\r' itself.)
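
One possible way to handle that, as a sketch only: set the line
terminator aside before swapping in the marker, so the '\r' of a
'\r\n' ending is never treated as data. The names
cr_to_marker_keep_eol and read_rows are hypothetical, and MARKER is
again assumed not to occur in the data. The caveat above still
applies: a '\r' directly before the '\n' is always taken as part of
the line ending, which also covers a '\r\n' embedded in a quoted
field.

```python
import csv

MARKER = '\x01'  # assumed never to occur in the data

def cr_to_marker_keep_eol(lines):
    for line in lines:
        # Split off the terminator so its '\r' is not turned into a marker.
        if line.endswith('\r\n'):
            body, eol = line[:-2], '\n'
        elif line.endswith('\n'):
            body, eol = line[:-1], '\n'
        else:
            body, eol = line, ''
        yield body.replace('\r', MARKER) + eol

def read_rows(lines):
    # Parse the masked lines, then restore '\r' inside each field.
    for row in csv.reader(cr_to_marker_keep_eol(lines)):
        yield [item.replace(MARKER, '\r') for item in row]
```

read_rows accepts any iterable of lines, so an open file object can be
passed in directly.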



