classification
Title: csv module cannot handle embedded \r
Type: behavior
Stage: resolved
Components: Extension Modules, Library (Lib)
Versions: Python 2.6
process
Status: closed
Resolution: out of date
Dependencies: 1465014
Superseder:
Assigned To: andrewmcnamara
Nosy List: ajaksu2, andrewmcnamara, gnbond, goodger, r.david.murray, rhettinger
Priority: normal
Keywords:

Created on 2004-06-07 04:46 by gnbond, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
tcsv.py gnbond, 2004-06-07 04:47
Messages (8)
msg21057 - Author: Gregory Bond (gnbond) Date: 2004-06-07 04:46
CSV module cannot handle the case of embedded \r (i.e.
carriage return) in a field.

As far as I can see, this is hard-coded into the _csv.c
file and cannot be fixed with Dialect changes.
msg21058 - Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2004-06-07 05:02
Skip, does this coincide with your planned switchover to
universal newlines?
msg21059 - Author: Andrew McNamara (andrewmcnamara) * (Python committer) Date: 2004-06-07 05:32
I suspect this restriction (CR appearing within a quoted 
field) is a historical accident and can be safely removed. 
msg21060 - Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2004-06-07 11:25
It certainly intersects with it somehow.  ;-)  If nothing else, it
will serve as a useful test case.
msg21061 - Author: Andrew McNamara (andrewmcnamara) * (Python committer) Date: 2005-01-13 11:34
If you're interested, I've just checked in a change to the CVS head for 
Python 2.5 that may, at least partially, fix this problem (if you try it, let me 
know how it goes).
msg21062 - Author: David Goodger (goodger) (Python committer) Date: 2006-04-05 15:35
I just filed a bug (http://www.python.org/sf/1465014) that
seems to be related to this. Revision 38290 on
Modules/_csv.c includes the addition of this code:

    else if (c == '\n' || c == '\r') {
        self->state = EAT_CRNL;
        break;
    }

(and similar). This seems to be eating (deleting) control
chars, but newlines used to be significant. 

Embedded line breaks are allowed, according to RFC 4180
(http://www.ietf.org/rfc/rfc4180.txt). And according to the
Wikipedia entry
(http://en.wikipedia.org/wiki/Comma-separated_values), "a
line break within an element must be preserved."
msg82052 - Author: Daniel Diniz (ajaksu2) * (Python triager) Date: 2009-02-14 13:56
IIUC, I get the correct behavior:

trunk-py$ ./python ~/Desktop/tcsv.py
['fld1', 'fld2', 'fld3 ', 'fld4']
['fld1', 'fld2', 'fld3 \r', 'fld4']

trunk-py$ cat ~/Desktop/tcsv.py
#! /usr/local/bin/python

import csv

d = 'fld1,fld2,"fld3 ",fld4\r\n'
d2 = 'fld1,fld2,"fld3 \r'
d3 = '",fld4\r\n'

r = csv.reader([d, d2, d3], dialect="excel")
for f in r:
    print(f)
msg106189 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-05-20 20:55
At some point I added test_roundtrip_quoteed_newlines to the csv unit tests, and it passes both on trunk and py3k.  I believe if there was a bug here it has been fixed.  I just backported the test to 2.6 in r81382, and it passes there as well.  Closing as out of date.

Heh, I just noticed that the method name is misspelled :(
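
For anyone landing here later, the kind of round-trip check described above can be sketched roughly as follows. This is not the test from the csv unit tests; the helper name `roundtrip` and the sample rows are made up for illustration, and it assumes Python 3, where `io.StringIO(newline="")` passes `\r` and `\n` through untranslated.

```python
import csv
import io


def roundtrip(rows):
    """Write rows with csv.writer, then parse them back with csv.reader."""
    # newline="" disables newline translation, so any \r or \n inside a
    # field reaches the csv module unchanged.
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return list(csv.reader(buf))


# Illustrative rows (hypothetical, not copied from the test suite):
# fields containing an embedded CR, LF, and CRLF.
rows = [["fld1", "fld3 \r", "fld4"], ["a\nb", "b"], ["c", "x\r\nd"]]

# The writer quotes fields that contain line-break characters, and the
# reader keeps those characters when it parses the quoted field.
assert roundtrip(rows) == rows
```

If the assertion holds, the embedded line breaks survived both the writer and the reader, which matches the behaviour reported in this thread for trunk and py3k.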
History
Date User Action Args
2022-04-11 14:56:04 admin set github: 40357
2010-05-20 20:55:26 r.david.murray set status: open -> closed
nosy: + r.david.murray
messages: + msg106189
resolution: out of date
stage: test needed -> resolved
2010-05-20 20:27:03 skip.montanaro set nosy: - skip.montanaro
2009-02-14 13:56:28 ajaksu2 set versions: + Python 2.6
nosy: + ajaksu2
messages: + msg82052
dependencies: + CSV regression in 2.5a1: multi-line cells
components: + Extension Modules
type: behavior
stage: test needed
2004-06-07 04:46:56 gnbond create