[Csv] Thoughts about a patch

Magnus Lie Hetland magnus at hetland.org
Mon Mar 15 08:44:33 CET 2004


Andrew McNamara <andrewm at object-craft.com.au>:
>
> >I guess I just haven't understood the code well enough yet, but in the
> >parsing code there are comparisons of the type
> >
> >  if (c == '\n')
> >
> >I suppose the newlines are normalized versions of lineterminator? In
> >other words, no matter what the line terminator is, it is safe to
> >pretend that it has been changed to '\n' in the parsing case
> >statement? Or? (I mean, I've tried to use lineterminator='|' and that
> >worked just nicely, but I don't see the use of lineterminator in the
> >case statement anywhere.)
> 
> One thing to bear in mind is the history of the CSV module - it
> dates back to Python 1.5 times, when Python didn't have universal
> newline support.

I see. Even so -- I don't see how universal newline support is needed
for this...?

> If I remember correctly, lineterminator is only used when generating CSV
> output, not when parsing input. On input, the value of lineterminator
> is ignored, and \r and \n are hard-coded.

Oh -- how unfortunate :]

Is this documented in the PEP/standard docs? I've just browsed them,
but couldn't find the distinction between parameters that affect
reading and those affecting writing. To quote the PEP:

  "In addition to the dialect argument, both the reader and writer
   constructors take several specific formatting parameters, specified
   as keyword parameters."

One of the parameters listed under this (which, then, applies to the
reader) is:

  "lineterminator specifies the character sequence which should
   terminate rows."

It seems highly natural to me that reader and writer should be
completely symmetrical here -- i.e. you should *definitely* be able to
read back your own output, using the same Dialect (IMO).

(I do see something hinting at this problem in item 5 of the issue
list, though.)

I guess I had my eyes crossed when I did my experiment with
lineterminator set to '|' -- I thought it worked when reading, but
you're right -- it doesn't.
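
For the record, the asymmetry is easy to reproduce. The following is
only a sketch, assuming a present-day Python 3 interpreter and its
standard csv module (io.StringIO stands in for a real file, and the
sample rows are made up for illustration):

```python
import csv
import io

rows = [["a", "b"], ["c", "d"]]

# Writing: the writer honours lineterminator.
buf = io.StringIO()
csv.writer(buf, lineterminator="|").writerows(rows)
written = buf.getvalue()
print(repr(written))  # 'a,b|c,d|'

# Reading: the same parameter is accepted but does not split the rows,
# so '|' ends up as ordinary field data.
read_back = list(csv.reader(io.StringIO(written), lineterminator="|"))
print(read_back)
print(read_back == rows)  # False: the output does not round-trip
```

The writer emits '|' between rows, but the reader still only treats
'\r' and '\n' as row boundaries, so everything comes back as a single
row.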

In other words, a potential patch should probably also add support for
parsing arbitrary line terminators -- or?

It could, of course, be that I should simply write the parsing code
into my own projects in Python. It just seems a shame not to use the
csv module when it exists. It seems to sit on the brink of generality,
just a tad biased toward the Microsoft dialect (which was, I gather,
part of the original design goals).
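
A stopgap in pure Python might look something like the sketch below.
The helper names split_records and read_rows are hypothetical (they
are not part of the csv module), and the approach is deliberately
naive: it splits the text on the terminator before handing the pieces
to csv.reader, so it assumes the terminator never occurs inside a
quoted field.

```python
import csv
import io


def split_records(text, lineterminator):
    """Split raw text into records on an arbitrary terminator.

    Hypothetical helper. Assumes the terminator never appears inside
    a quoted field; a real patch would have to handle that case in
    the parser itself.
    """
    chunks = text.split(lineterminator)
    # A trailing terminator leaves one empty chunk at the end; drop it.
    if chunks and chunks[-1] == "":
        chunks.pop()
    return chunks


def read_rows(text, lineterminator="|", **fmtparams):
    """Parse text whose rows end with lineterminator (hypothetical)."""
    # csv.reader accepts any iterable of strings, one record per item.
    return list(csv.reader(split_records(text, lineterminator), **fmtparams))


rows = [["a", "b"], ["c", "d"]]
buf = io.StringIO()
csv.writer(buf, lineterminator="|").writerows(rows)

round_trip = read_rows(buf.getvalue(), lineterminator="|")
print(round_trip == rows)  # True for this simple input
```

This at least lets simple output be read back with the terminator it
was written with, but it is a workaround rather than the symmetrical
reader/writer behaviour argued for above.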

-- 
Magnus Lie Hetland           "The mind is not a vessel to be filled,
http://hetland.org            but a fire to be lighted."  [Plutarch]

