[Csv] trial zip/tar packages of csv module available

Thu Feb 13 23:16:51 CET 2003

On Thu, 13 Feb 2003 13:00:36 -0600, Skip Montanaro <skip at pobox.com> wrote:

>
> If you are interested in reading or writing CSV files from Python and you
> have Python 2.2 or 2.3 available, please take a moment to download,
> extract and install either or both of the following URLs:
>
> http://manatee.mojam.com/~skip/csv.tar.gz
> http://manatee.mojam.com/~skip/csv.zip
>
> The goal is to get this package into Python 2.3, though we've tried to keep
> it working under 2.2.  It uses iterators, so I don't know if it will work
> with anything before 2.2.  The package has been built on Linux and Mac OS X
> at this point.  I think it's been built on Windows though I'm not positive.
> There shouldn't be anything terribly platform-dependent there.
>

Good news first, whinges at the end of the message :-)

===
Compiles & installs OK out-of-the-box with Python 2.2, Windows 2000, BCC32 
(Borland 5.5 freebie command-line compiler) -- thanks to revision 1.30 :-)
===
C:\csv\test>python test_csv.py
*** skipping leakage tests ***
........................................................
----------------------------------------------------------------------
Ran 56 tests in 0.030s

OK
===
Slurped through a 150 MB CSV file at a reasonable speed, with no memory
leak detectable by the primitive method of watching the Task Manager
memory graph.
===

Doco:

"""0.1.1 Module Contents
The csv module defines the following functions.
reader(iterable[, dialect="excel"] [, fmtparam])
Return a reader object which will iterate over lines in the given csvfile."""

Huh? What "given csvfile"?
The docs need to define carefully what iterable.next() is expected to
deliver: a line, with or without a trailing newline? A string of one or
more bytes that may contain embedded line separators, either as true
separators or as (quoted) data? (E.g. the iterable could be a generator
that uses, say, read(16384).) I have noticed on the csv mailing list some
muttering along the lines of "the iterable's underlying file must have
been opened in binary mode"!? Que?

This might necessitate a FAQ entry:
>>> cr = csv.reader("iterable is string!")
>>> [x for x in cr]
[['i'], ['t'], ['e'], ['r'], ['a'], ['b'], ['l'], ['e'], [' '], ['i'], ['s'], [' '], ['s'], ['t'], ['r'], ['i'], ['n'], ['g'], ['!']]
>>>
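The FAQ answer would presumably be that a string is itself iterable, one
character at a time, so each character becomes a one-field line. A sketch
of the usual ways round it, assuming Python 3's csv module and io.StringIO:

```python
import csv
import io

text = "iterable is string!"

# A str iterates character by character, so every character becomes a
# one-field "line" (the behaviour shown in the session above).
per_char = list(csv.reader(text))

# Wrapping the string in a list hands the reader a single line instead.
one_line = list(csv.reader([text]))
print(one_line)  # [['iterable is string!']]

# For multi-line text, io.StringIO yields one line at a time; newline=""
# keeps line endings untouched so the csv module can deal with them,
# including a newline inside a quoted field.
multi = list(csv.reader(io.StringIO('a,b\r\n"c\nd",e\r\n', newline="")))
print(multi)
```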

===

Does the reader detect any errors at all? E.g. I expected some complaint 
here, instead of silently doing nothing:
>>> import csv
>>> cr = csv.reader(['f1,"unterminated quoted field,f3'])
>>> for x in cr: print x
...
>>> cr = csv.reader(['f1,"terminated quoted field",f3'])
>>> for x in cr: print x
...
['f1', 'terminated quoted field', 'f3']
>>> cr = csv.reader(['f1,"unterminated quoted field,f3\n'])
>>> for x in cr: print x
...
>>>
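For what it's worth, the csv module as it later shipped in Python 3 has a
strict dialect parameter aimed at this: with strict=True, an unterminated
quoted field at end of input raises csv.Error, whereas by default the
reader is lenient and does not complain. A minimal sketch; parse_rows() is
a hypothetical helper:

```python
import csv


def parse_rows(lines):
    # Hypothetical helper: strict=True asks the reader to raise csv.Error
    # on malformed input instead of quietly accepting it.
    return list(csv.reader(lines, strict=True))


good = parse_rows(['f1,"terminated quoted field",f3'])
print(good)

problem = None
try:
    parse_rows(['f1,"unterminated quoted field,f3'])
except csv.Error as exc:
    problem = str(exc)
print("rejected:", problem)
```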
===

Judging by the fact that _csv.c passes '\0' around as an end-of-line
signal, the module is not 8-bit-clean. This should at least be documented,
if not fixed (fixing it looks like a bit of a rewrite). Strange behaviour
on embedded '\0' characters may worry not only pedants but also folk who
receive data files created by J. Random Boofhead III and friends.
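Until then, a caller could refuse such input up front rather than rely on
whatever the parser does with it. A sketch, assuming Python 3;
reject_nul() is a hypothetical wrapper, and since the reader's own handling
of NUL may differ between Python versions, the check deliberately does not
depend on it:

```python
import csv

NUL = "\0"


def reject_nul(lines):
    # Hypothetical guard: pass lines through unchanged, but stop with a
    # clear error at the first line containing an embedded NUL character.
    for lineno, line in enumerate(lines, start=1):
        if NUL in line:
            raise ValueError(
                f"line {lineno} contains NUL at offset {line.index(NUL)}"
            )
        yield line


clean = list(csv.reader(reject_nul(["a,b", "c,d"])))
print(clean)

message = None
try:
    list(csv.reader(reject_nul(["a,b", "c\0,d"])))
except ValueError as exc:
    # The generator's ValueError propagates out through csv.reader.
    message = str(exc)
print(message)  # line 2 contains NUL at offset 1
```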

===
Cheers,
John