CSV ignores lineterminator

Jeffrey Barish jeffbarish at starband.net
Mon Apr 5 17:01:32 EDT 2004


Skip Montanaro wrote:

>>>>>> "Jeffrey" == Jeffrey Barish <jeffbarish at starband.net> writes:
> 
>     Jeffrey> With
>     Jeffrey> input_data = ['word1\tword2;word3\tword4;',
>     Jeffrey> 'word5\tword6;word7\tword8;']
> 
>     Jeffrey> and
> 
>     Jeffrey> delimiter = '\t'
>     Jeffrey> lineterminator = ';'
> 
>     Jeffrey> shouldn't csv.reader(input_data, dialect='mydialect') return
> 
>     Jeffrey> ['word1', 'word2']
> 
>     Jeffrey> as the first row?  I find that it doesn't matter how I set
>     Jeffrey> lineterminator, csv always terminates at the end of the
>     Jeffrey> line returned by the iterable object passed as its first
>     Jeffrey> argument (input_data, in this case).  I must be missing
>     Jeffrey> something basic here.
> 
>     Jeffrey> I may be confused about the interaction between what
>     Jeffrey> iterable object defines as the next row and what csv.reader
>     Jeffrey> defines as the next row.
> 
> Perhaps.  Think of input_data as the conceptual result of
> f.read().split(lineterminator) (though without loss of the line
> terminator):
> 
>     >>> input_data = ['word1\tword2;', 'word3\tword4;',
>     ...               'word5\tword6;', 'word7\tword8;']
>     >>> import csv
>     >>> class d(csv.excel):
>     ...   delimiter='\t'
>     ...   lineterminator=';'
>     ...
>     >>> rdr = csv.reader(input_data, dialect=d)
>     >>> rdr.next()
>     ['word1', 'word2;']
>     >>> rdr.next()
>     ['word3', 'word4;']
>     >>> rdr.next()
>     ['word5', 'word6;']
>     >>> rdr.next()
>     ['word7', 'word8;']
> 
> Skip
> 
Thanks for your reply.

Yes, I expect your example to work because each entry in the list
defines a "line" regardless of csv:

for line in input_data:
        print line

word1   word2;
word3   word4;
word5   word6;
word7   word8;

In fact, if you leave out the semicolons, you get exactly the same list
(sans semicolons, of course), so csv is terminating lines even though
the lineterminator is not there.
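
To make that comparison concrete, here is a minimal sketch in Python 3
syntax (the rest of this thread uses the older rdr.next() and print
statement forms).  The class name TabDialect and the two-element lists
are made up for illustration; the dialect settings are the ones quoted
above.

```python
import csv

class TabDialect(csv.excel):
    # Hypothetical dialect name; same settings as in the quoted example.
    delimiter = '\t'
    lineterminator = ';'

with_semis = ['word1\tword2;', 'word3\tword4;']
without_semis = ['word1\tword2', 'word3\tword4']

# Each list element becomes exactly one row in both cases.
rows_with = list(csv.reader(with_semis, dialect=TabDialect))
rows_without = list(csv.reader(without_semis, dialect=TabDialect))

print(rows_with)
print(rows_without)
```

Both calls return two rows.  The only difference is that the semicolon
stays attached to the last field when it is present in the input.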

In my example, the first "line" of the list is
'word1\tword2;word3\tword4'.  What I expect to happen is that the list
passes the entire line to csv.reader(), which then splits off
'word1\tword2' as the first line by virtue of the presence of the
lineterminator ';' and then continues to scan what remains.  As there
is no line terminator after 'word3\tword4', I expect csv.reader() to
pull in the next "line" of the list in its search for another
lineterminator, which it finds after word6, so the next line from csv
would be 'word3\tword4word5\tword6'.  I don't see any evidence that csv
is terminating lines based on lineterminator rather than the
termination used by the iterator object.
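
The behaviour I expected can be approximated by splitting on the
terminator before csv.reader sees the data.  What follows is only a
sketch under stated assumptions, in Python 3 syntax: split_records is a
hypothetical helper, not part of the csv module, and it assumes the
terminator never appears inside a quoted field.

```python
import csv

def split_records(chunks, terminator):
    """Yield terminator-delimited records, carrying any partial
    record over from one chunk to the next."""
    pending = ''
    for chunk in chunks:
        pending += chunk
        # Everything before the last terminator is complete; the tail
        # (possibly empty) waits for the next chunk.
        *complete, pending = pending.split(terminator)
        for record in complete:
            yield record
    if pending:
        # Trailing data with no final terminator.
        yield pending

input_data = ['word1\tword2;word3\tword4;',
              'word5\tword6;word7\tword8;']

rows = list(csv.reader(split_records(input_data, ';'), delimiter='\t'))
for row in rows:
    print(row)

# A record that spans two chunks is joined, as described above.
spanning = list(csv.reader(split_records(['a\tb;c\td', 'e\tf;'], ';'),
                           delimiter='\t'))
```

Here csv.reader is given only the delimiter; record boundaries are
decided entirely by the helper, so lineterminator does not enter into
it.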
-- 
Jeffrey Barish




