[Python-Dev] [Python-checkins] cpython: #15927: Fix csv.reader parsing of escaped \r\n with quoting off.

Kristján Valur Jónsson kristjan at ccpgames.com
Wed Mar 20 04:16:53 CET 2013


The compiler complains about this line in Modules/_csv.c:
    if (c == '\n' | c=='\r') {

That is the bitwise | operator. Perhaps you wanted the logical || operator?

-----Original Message-----
From: Python-checkins [mailto:python-checkins-bounces+kristjan=ccpgames.com at python.org] On Behalf Of r.david.murray
Sent: 19 March 2013 19:42
To: python-checkins at python.org
Subject: [Python-checkins] cpython: #15927: Fix csv.reader parsing of escaped \r\n with quoting off.

http://hg.python.org/cpython/rev/940748853712
changeset:   82815:940748853712
parent:      82811:684b75600fa9
user:        R David Murray <rdmurray at bitdance.com>
date:        Tue Mar 19 22:41:47 2013 -0400
summary:
  #15927: Fix csv.reader parsing of escaped \r\n with quoting off.

This fix means that such values are correctly roundtripped, since csv.writer already does the correct escaping.

Patch by Michael Johnson.
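
The roundtrip described above can be sketched as follows. This is only an illustration modelled on the test added in this changeset, not part of the patch: the helper name roundtrip is hypothetical, and the result assumes an interpreter that already includes this fix.

```python
import csv
from tempfile import TemporaryFile


def roundtrip(rows):
    """Write rows with quoting off, read them back, and return what was read.

    Hypothetical helper for illustration; it is not part of the csv module.
    """
    options = dict(quoting=csv.QUOTE_NONE, escapechar="\\")
    # newline="" keeps the file layer from translating \r and \n itself,
    # so the csv module sees exactly what the writer produced.
    with TemporaryFile("w+", newline="", encoding="utf-8") as fileobj:
        csv.writer(fileobj, **options).writerows(rows)
        fileobj.seek(0)
        return list(csv.reader(fileobj, **options))


# Same data as the new test: embedded \n and \r\n inside unquoted fields.
rows = [["a\nb", "b"], ["c", "x\r\nd"]]
print(roundtrip(rows) == rows)
```

Without the reader fix, the escaped line endings would not be expected to come back as part of the field, so the comparison is the interesting part of the sketch.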

files:
  Lib/test/test_csv.py |   9 +++++++++
  Misc/ACKS            |   1 +
  Misc/NEWS            |   3 +++
  Modules/_csv.c       |  13 ++++++++++++-
  4 files changed, 25 insertions(+), 1 deletions(-)


diff --git a/Lib/test/test_csv.py b/Lib/test/test_csv.py
--- a/Lib/test/test_csv.py
+++ b/Lib/test/test_csv.py
@@ -308,6 +308,15 @@
             for i, row in enumerate(csv.reader(fileobj)):
                 self.assertEqual(row, rows[i])
 
+    def test_roundtrip_escaped_unquoted_newlines(self):
+        with TemporaryFile("w+", newline='') as fileobj:
+            writer = csv.writer(fileobj,quoting=csv.QUOTE_NONE,escapechar="\\")
+            rows = [['a\nb','b'],['c','x\r\nd']]
+            writer.writerows(rows)
+            fileobj.seek(0)
+            for i, row in enumerate(csv.reader(fileobj,quoting=csv.QUOTE_NONE,escapechar="\\")):
+                self.assertEqual(row,rows[i])
+
 class TestDialectRegistry(unittest.TestCase):
     def test_registry_badargs(self):
         self.assertRaises(TypeError, csv.list_dialects, None)
diff --git a/Misc/ACKS b/Misc/ACKS
--- a/Misc/ACKS
+++ b/Misc/ACKS
@@ -591,6 +591,7 @@
 Fredrik Johansson
 Gregory K. Johnson
 Kent Johnson
+Michael Johnson
 Simon Johnston
 Matt Joiner
 Thomas Jollans
diff --git a/Misc/NEWS b/Misc/NEWS
--- a/Misc/NEWS
+++ b/Misc/NEWS
@@ -289,6 +289,9 @@
 Library
 -------
 
+- Issue #15927: csv now correctly parses escaped newlines and carriage
+  returns when parsing with quoting turned off.
+
 - Issue #17467: add readline and readlines support to mock_open in
   unittest.mock.
 
diff --git a/Modules/_csv.c b/Modules/_csv.c
--- a/Modules/_csv.c
+++ b/Modules/_csv.c
@@ -51,7 +51,7 @@
 typedef enum {
     START_RECORD, START_FIELD, ESCAPED_CHAR, IN_FIELD,
     IN_QUOTED_FIELD, ESCAPE_IN_QUOTED_FIELD, QUOTE_IN_QUOTED_FIELD,
-    EAT_CRNL
+    EAT_CRNL,AFTER_ESCAPED_CRNL
 } ParserState;
 
 typedef enum {
@@ -644,6 +644,12 @@
         break;
 
     case ESCAPED_CHAR:
+        if (c == '\n' | c=='\r') {
+            if (parse_add_char(self, c) < 0)
+                return -1;
+            self->state = AFTER_ESCAPED_CRNL;
+            break;
+        }
         if (c == '\0')
             c = '\n';
         if (parse_add_char(self, c) < 0)
@@ -651,6 +657,11 @@
         self->state = IN_FIELD;
         break;
 
+    case AFTER_ESCAPED_CRNL:
+        if (c == '\0')
+            break;
+        /*fallthru*/
+
     case IN_FIELD:
         /* in unquoted field */
         if (c == '\n' || c == '\r' || c == '\0') {

--
Repository URL: http://hg.python.org/cpython

