Odd csv column-name truncation with only one column

Hans Mulder hansmu at xs4all.nl
Fri Jul 20 12:59:24 EDT 2012


On 19/07/12 23:10:04, Dennis Lee Bieber wrote:
> On Thu, 19 Jul 2012 13:01:37 -0500, Tim Chase
> <python.list at tim.thechases.com> declaimed the following in
> gmane.comp.python.general:
> 
>>  It just seems unfortunate that the sniffer would ever consider
>> [a-zA-Z0-9] as a valid delimiter.

+1

> 	I'd suspect the sniffer logic does not do any special casing
> -- any /byte value/ is a candidate for the delimiter.

The sniffer prefers [',', '\t', ';', ' ', ':'] (in that order).
If none of those is found, it goes to the other extreme and considers
all characters equally likely.
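
For anyone who wants to see that preference list, here is a minimal sketch. It assumes the CPython implementation of the csv module, where the list is kept in an undocumented `preferred` attribute on the Sniffer instance; the sample string is made up for illustration.

```python
import csv

sniffer = csv.Sniffer()

# Undocumented CPython implementation detail: the tie-break list.
preferred = sniffer.preferred
print(preferred)

# Made-up sample in which both ',' and ';' occur once on every line,
# so the sniffer has to fall back on its preference order.
sample = "a,b;c\n1,2;3\n"
dialect = sniffer.sniff(sample)
print(repr(dialect.delimiter))
```

Since `preferred` is not part of the documented API, I would not rely on it in production code.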

> This would allow for usage of some old ASCII control characters --
> things like  x1F (unit separator)

If the sniffer excludes [a-zA-Z0-9] (or all alphanumerics) as
potential delimiters, then control characters such as "\x1F" are
still possible.
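
Until the sniffer does that itself, one possible workaround is to pass the documented `delimiters` argument to `sniff()`, so that only the characters you list are considered. A rough sketch; the helper name `safe_delimiter`, the candidate string and the sample data are all my own inventions for illustration:

```python
import csv

# Hypothetical candidate set: the usual suspects plus the unit separator.
CANDIDATES = ",\t;:\x1f"

def safe_delimiter(sample, candidates=CANDIDATES):
    """Return the sniffed delimiter, or None if no candidate fits."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Raised when the sniffer cannot settle on any allowed delimiter,
        # which is what one would hope for with a single-column file.
        return None

multi = "name,age\nalice,30\nbob,25\n"
single = "Email\nfoo@example.com\nbar@example.com\n"

print(repr(safe_delimiter(multi)))
print(repr(safe_delimiter(single)))
```

Treating `None` as "one column, no delimiter" is then up to the caller.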

> {Next is to rig the sniffer to identify x1F for fields, and x1E
> for records <G>}

The sniffer will always guess '\r\n' as the line terminator.
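
A quick way to check this, assuming the CPython implementation; the sample deliberately uses bare '\n' line endings:

```python
import csv

# Made-up sample with Unix line endings only.
sample = "a,b\n1,2\n3,4\n"
dialect = csv.Sniffer().sniff(sample)

# The sniffed dialect still reports '\r\n', whatever the sample used.
print(repr(dialect.lineterminator))
```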

That should not stop you from creating a dialect with '\x1E' as
the line terminator.  Just don't expect the sniffer to recognize
that dialect.
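
Something along these lines should work as a sketch. The dialect name "ascii_sep" and the sample rows are made up. Note that, as far as I know, csv.reader ignores the dialect's lineterminator and only recognizes '\r' and '\n', so the sketch splits the records by hand before handing them to the reader.

```python
import csv
import io

# Hypothetical dialect: unit separator between fields,
# record separator between records.
csv.register_dialect("ascii_sep", delimiter="\x1f", lineterminator="\x1e")

# Writing: the writer does honour the dialect's lineterminator.
buf = io.StringIO()
writer = csv.writer(buf, dialect="ascii_sep")
writer.writerows([["a", "b"], ["1", "2"]])
data = buf.getvalue()

# Reading: split on '\x1e' ourselves, then let the reader
# split each record into fields on '\x1f'.
records = [rec for rec in data.split("\x1e") if rec]
rows = list(csv.reader(records, dialect="ascii_sep"))
print(rows)
```

Splitting by hand like this assumes that no field contains '\x1e' itself.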

-- HansM




