mishandling of embedded NULs (was: Re: [Csv] trial zip/tar packages of csv module available)

John Machin sjmachin at lexicon.net
Fri Feb 14 23:48:33 CET 2003


[John Machin]
>>> Judging by the fact that in _csv.c '\0' is passed around as a line- 
>>> ending signal, it's not 8-bit-clean. This fact should be at least 
>>> documented, if not fixed (which looks like a bit of a rewrite). Strange 
>>> behaviour on embedded '\0' may worry not only pedants but also folk who 
>>> are recipients of data files created by J. Random Boofhead III and 
>>> friends.

[Andrew McNamara]
>> Yep - Skip - can you doco the fact that the input should not contain
>> null characters or be unicode strings?
>>
>> Null characters in the input will be treated as newlines, if I remember
>> correctly.

[John Machin]
> Docoing that would be useful as well.

[and it's me again:]

Actually, it doesn't treat a NUL quite like a newline: it silently throws
away everything from the NUL to the end of that input string, with no
warning; see below.

>>> import csv
>>> guff = ["aaa\0bbb", "x\0\0y"]
>>> [x for x in csv.reader(guff)]
[['aaa'], ['x']]
>>> guff2 = ["aaa\nbbb", "x\n\ny"]
>>> [x for x in csv.reader(guff2)]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
_csv.Error: newline inside string
>>>
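
Until this is documented or fixed, a caller can guard against the silent
data loss by checking each input string before the reader sees it. What
follows is only a sketch of that idea, not anything in the csv module:
reject_nul_lines and read_rows are hypothetical names, and raising
ValueError is just one possible policy (stripping or replacing the NULs
would be another).

```python
import csv


def reject_nul_lines(lines):
    """Yield each input string unchanged; raise ValueError on an embedded NUL.

    Hypothetical helper, not part of the csv module.
    """
    for lineno, line in enumerate(lines, start=1):
        if "\0" in line:
            raise ValueError("embedded NUL in input line %d" % lineno)
        yield line


def read_rows(lines):
    """Parse lines with csv.reader, refusing any input that contains a NUL."""
    return list(csv.reader(reject_nul_lines(lines)))
```

With this wrapper, clean input parses as usual, while something like
read_rows(["aaa\0bbb"]) raises ValueError instead of quietly losing "bbb".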
 

