[Csv] trial zip/tar packages of csv module available

Fri Feb 14 11:30:31 CET 2003

On Fri, 14 Feb 2003 17:11:30 +1100, Andrew McNamara
<andrewm at object-craft.com.au> wrote:

>> Slurped through a 150Mb CSV file at a reasonable speed without any 
>> memory leak that could be detected by the primitive method of watching 
>> the Task Manager memory graph.
>
> I've been using a --enable-pydebug version of python while working on the
> _csv module, and have been watching the reference counts fairly 
> carefully.

Yes, I'd gathered that from various asides in messages on this list. I was 
just being a little ironical about my own primitive way of checking.

>
>> """0.1.1 Module Contents
>> The csv module defines the following functions.
>> reader(iterable[, dialect="excel" ] [, fmtparam])
>> Return a reader object which will iterate over lines in the given 
>> csvfile."""
>>
>> Huh? What "given csvfile"?
>> Need to define carefully what iterable.next() is expected to deliver; a 
>> line, with or without a trailing newline?
>
> In the docstring, I changed this to:
>
> The "iterable" argument can be any object that returns a line
> of input for each iteration, such as a file object or a list.  The
> optional "dialect" parameter is discussed below.  The function
> also accepts optional keyword arguments which override settings
> provided by the dialect.
> The returned object is an iterator.  Each iteration returns a row
> of the CSV file (which can span multiple input lines):

There is not necessarily a file involved --- say "returns a row of CSV 
data"
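
To illustrate the "no file involved" point, here is a minimal sketch using the csv module as it ships in current Python 3, which may differ in detail from the trial package under discussion. The iterable is a plain list of strings, and one quoted field spans two of them; the sample data is made up for illustration.

```python
import csv

# The iterable is a plain list of strings, so no file is involved.
# The quoted field opens in the first string and closes in the second.
lines = ['a,"b\n', 'c",d\n']

rows = list(csv.reader(lines))

# Two input "lines" produce a single row of CSV data.
print(rows)  # [['a', 'b\nc', 'd']]
```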

>
> Do you think this is clearer?

Frankly, no. You've dropped the "given csvfile" (almost), but you haven't 
said whether a "line" is expected to be terminated, and if so with what: 
(a) \n irrespective of platform (b) platform's native terminator (c) \r or 
\r\n or \n (don't care which).

My guess is that if the "line" is terminated by \r or \r\n or \n, you'll 
ignore the terminator, and if it's not terminated at all, then there's 
nothing to ignore, and happiness prevails. Am I correct?
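
A quick way to check that guess, sketched against the csv module in current Python 3 rather than the trial package itself: feed the reader lines that end in each of the three terminators, plus one with no terminator at all. The sample lines are invented for the test.

```python
import csv

# One line per terminator style, plus an unterminated final line.
lines = ["a,b\n", "c,d\r\n", "e,f\r", "g,h"]

rows = list(csv.reader(lines))

# The trailing terminator, whichever it is, does not end up in the data.
print(rows)  # [['a', 'b'], ['c', 'd'], ['e', 'f'], ['g', 'h']]
```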

>
> The reader will cope with a file opened binary or not - it *should*
> do the right thing in either case.

The reader doesn't know what the iterable is iterating over. The behaviour 
should be defined in terms of what the reader expects iterable.next() to 
deliver.

>
>> This might necessitate a FAQ entry:
>> >>> cr = csv.reader("iterable is string!")
>> >>> [x for x in cr]
>> [['i'], ['t'], ['e'], ['r'], ['a'], ['b'], ['l'], ['e'], [' '], ['i'],
>> ['s'], [' '], ['s'], ['t'], ['r'], ['i'], ['n'], ['g'], ['!']]
>
> I don't think there is ever a case where you would want the input
> iterable to be a string - I could probably just raise an exception if
> it is?

You certainly wouldn't want the behaviour demonstrated above. However the 
punter may get confused and go cr = csv.reader(file("raboof.csv").read())
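
For anyone who does end up holding the whole file as one string, a minimal sketch of the trap and two ways around it, written against the csv module in current Python 3. The variable names and sample text are illustrative only.

```python
import csv
import io

text = "a,b\nc,d\n"

# Passing the string itself: iterating a string yields one character at a
# time, so every character is parsed as if it were a separate line.
wrong = list(csv.reader(text))

# Two ways to turn the string into an iterable of lines first.
by_splitlines = list(csv.reader(text.splitlines()))
by_stringio = list(csv.reader(io.StringIO(text)))

print(by_splitlines)  # [['a', 'b'], ['c', 'd']]
```

Either form gives the reader what it expects from each iteration: a whole line rather than a single character.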

>
>> Judging by the fact that in _csv.c '\0' is passed around as a
>> line-ending signal, it's not 8-bit-clean. This fact should be at least
>> documented, if not fixed (which looks like a bit of a rewrite). Strange 
>> behaviour on embedded '\0' may worry not only pedants but also folk who 
>> are recipients of data files created by J. Random Boofhead III and 
>> friends.
>
> Yep - Skip - can you doco the fact that the input should not contain null
> characters or be unicode strings?
>
> Null characters in the input will be treated as newlines, if I remember
> correctly.

Docoing that would be useful as well.
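
Until the behaviour is documented, a caller who cannot trust the input could guard against it before the reader sees the data. The sketch below is only one possible approach: reject_nul is a hypothetical helper, not part of the csv module, and it assumes that refusing the input is preferable to having a null character silently treated as something else. How the module itself handles null characters has varied between versions, so the helper deliberately does not rely on that.

```python
import csv


def reject_nul(lines):
    """Yield each line unchanged; raise ValueError on an embedded NUL.

    Hypothetical helper, not part of the csv module.
    """
    for number, line in enumerate(lines, start=1):
        if "\0" in line:
            raise ValueError("NUL character in input line %d" % number)
        yield line


# Clean input passes straight through to the reader.
rows = list(csv.reader(reject_nul(["a,b\n", "c,d\n"])))
print(rows)  # [['a', 'b'], ['c', 'd']]
```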

Cheers,
John

--