[ python-Bugs-1072404 ] Bugs in _csv module - lineterminator

SourceForge.net noreply at sourceforge.net
Tue Jan 18 12:25:47 CET 2005


Bugs item #1072404, was opened at 2004-11-24 12:00
Message generated for change (Comment added) made by fresh
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1072404&group_id=5470

Category: Python Library
Group: None
>Status: Open
>Resolution: None
>Priority: 1
Submitted By: Chris Withers (fresh)
Assigned to: Andrew McNamara (andrewmcnamara)
Summary: Bugs in _csv module - lineterminator

Initial Comment:
On trying to parse a '\r' terminated csv generated on a
Mac, I get a "newline inside string" error from the csv
module.

Two things sprang to mind having read:
http://cvs.sourceforge.net/viewcvs.py/python/python/dist/src/Modules/_csv.c?rev=1.15&view=markup
...for a bit.

1. The Dialect's lineterminator doesn't appear to be
used when parsing a CSV. This feels like a bug to me,
'cos I could specify the terminator if
Reader_iternext(ReaderObj *self) used it :-S

2. The processing in Reader_iternext(ReaderObj *self)
assumes that a '\r' will be followed by '\0' for Macs,
'\n' for Windows, and anything else is an error.

but:

>>> c = open('var\data\metadata.csv').read()
>>> c[:100]
'BENEFIT,,Subjects relating to all benefits,AB
\rBENEFIT,PARTNERDIED,Bereavement

Should I be expecting to see a '\0' there?

Anyway, the real bug seems to be the reader's ignorance
of the lineterminator. However, even if my analysis is
off the mark, the problem still exists :-S

cheers,

Chris

----------------------------------------------------------------------

>Comment By: Chris Withers (fresh)
Date: 2005-01-18 11:25

Message:
Logged In: YES 
user_id=24723

I don't think it's fair to close this as a rejection.
The documentation implies that you can control what line
terminator this module uses, which currently isn't the case.

I'm not saying this is a high priority issue, just that it
shouldn't be rejected in case some day someone (maybe even
me ;-) wants to have a go at fixing it...

----------------------------------------------------------------------

Comment By: Andrew McNamara (andrewmcnamara)
Date: 2005-01-13 04:14

Message:
Logged In: YES 
user_id=698599

The reader expects to be supplied an iterator that returns lines - in this 
case, the file iterator has not recognised \r as end-of-line and has read the 
whole file in and yielded that as a "line". If you use universal-newline mode 
on your source file, you should have more luck.
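
The universal-newline suggestion above can be sketched as follows. This is a minimal illustration, not code from the tracker: the sample data and temporary file are made up, and it assumes Python 3, where passing newline=None to open() enables universal-newline translation (in the Python of this thread's era the equivalent was an explicit universal-newline open mode).

```python
import csv
import os
import tempfile

# Hypothetical sample: two records terminated by a bare '\r' (classic Mac style).
fd, cr_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write(b"a,b,c\r1,2,3\r")

# newline=None turns on universal-newline translation, so the file
# iterator yields one '\n'-terminated line per record instead of
# handing the whole file to the reader as a single "line".
with open(cr_path, "r", newline=None) as f:
    universal_rows = list(csv.reader(f))

os.remove(cr_path)
print(universal_rows)
```

Note that this only helps when the input is a real file; it does not make the reader honour the Dialect's lineterminator.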

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2004-11-25 04:23

Message:
Logged In: YES 
user_id=44345

This is a known problem.  See the April archives of the csv
mailing list:

http://manatee.mojam.com/pipermail/csv/2004-April/thread.html

Solutions are welcome.  I suspect any solution will involve either
discarding PyIter_Next altogether or further subdividing what it
returns.

A couple things to note in the way of workarounds:

1. Reader_iternext() defers to PyIter_Next() to grab the next line,
so there's really no opportunity to interject the lineterminator into
the operation with the current code.  This means reading from
StringIO objects that use \r lineterminators will always fail.

2. If you have a real file as input and open it in universal newline
mode you will get the correct behavior.
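
For in-memory data, where there is no file to open in universal newline mode, one possible workaround is to split the text into lines before handing it to the reader. The sketch below is an assumption-laden illustration rather than a fix proposed in this thread: read_cr_terminated is a hypothetical helper name and the sample data is made up. It relies on csv.reader accepting any iterable of strings, and on str.splitlines() treating '\r', '\n' and '\r\n' as line boundaries.

```python
import csv


def read_cr_terminated(text):
    """Parse CSV text whose records may end in '\r', '\n' or '\r\n'.

    Hypothetical helper: the text is split into lines up front, so the
    reader never sees a bare '\r' inside what it considers one line.
    Caveat: quoted fields containing embedded newlines would be split
    too, so this only suits data without multi-line fields.
    """
    return list(csv.reader(text.splitlines()))


sample = "a,b,c\r1,2,3\r"
split_rows = read_cr_terminated(sample)
print(split_rows)
```

This sidesteps the reader's line handling entirely, so the Dialect's lineterminator is still not consulted.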


----------------------------------------------------------------------
