[Csv] Sniffer empty delimiter

John Machin sjmachin at lexicon.net
Thu Dec 29 08:25:28 CET 2005


skip at pobox.com wrote:
>     Python 2.4.2 (#67, Sep 28 2005, 12:41:11) [MSC v.1310 32 bit (Intel)] on win32
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> import csv
>     >>> d = csv.Sniffer().sniff('a|b|c|d|e', ['\t', ','])
>     >>> d.delimiter
>     ''
>     >>> d = csv.Sniffer().sniff('a|b|c|d|e')
>     >>> d.delimiter
>     'a'
> 
> Both of these seem wrong to me at some level.  I tend to agree with you that
> if delimiter detection fails it should raise an exception, certainly if the
> delimiters argument defines a set of characters from which the actual
> delimiter must be chosen (does it?). 

I've got no idea what the delimiters argument is for. That's why I 
suggested it be documented. Contrary to your recollection, I am *not* 
the author of any part of the csv module.


> The second has to be considered a bug
> doesn't it?

Yes. I regard the notion of an alphanumeric character being a delimiter 
as utterly preposterous.
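
Until the sniffer itself enforces such a rule, a caller can apply it
after the fact. The following is only a minimal sketch: the names
check_delimiter and sniff_strict are hypothetical helpers, not part of
the csv module, and "alphanumeric" is taken here to mean whatever
str.isalnum() accepts.

```python
import csv

def check_delimiter(delim):
    """Hypothetical helper: reject an empty or alphanumeric delimiter."""
    if not delim or delim.isalnum():
        raise csv.Error("implausible delimiter: %r" % (delim,))
    return delim

def sniff_strict(sample, delimiters=None):
    """Hypothetical wrapper: sniff a dialect, then validate its delimiter."""
    dialect = csv.Sniffer().sniff(sample, delimiters)
    check_delimiter(dialect.delimiter)
    return dialect
```

With this wrapper both results in the session quoted above ('' and 'a')
would surface as csv.Error instead of being handed back silently.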


> 
>     John> (1) IMHO it should *NEVER* return an alphabetic or numeric
>     John>     character as the delimiter.
> 
> Probably a good rule of thumb.
> 
>     John> (2) If there is insufficient sample to determine the dialect's
>     John>     attributes, then it shouldn't pluck them out of the air, with
>     John>     no indication to the caller that there might be a problem. IOW
>     John>     I don't like the "remedies" of "return standard delimiter" and
>     John>     "return first delimiter". It should raise csv.Error; the
>     John>     discerning caller can then take appropriate action.
> 
> If I have a csv file that happens to only have one column and I'm using the
> sniffer (presumably because I have an app that processes somewhat arbitrary
> csv files) I'd hate for it to fail in that one case.  For that case maybe we
> can define an optional default arg that is a single character.  Failing all
> other tests, the default is returned.

Optional default arg *plus* an exception? Holy redundancy, Batman!

Caller can do this:

try:
    d = csv.Sniffer().sniff(sample)
except csv.Error:
    d = my_default_dialect
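
Wrapped up as a function, the same idea might look like the sketch
below. The name sniff_or_default and the choice of csv.excel as the
fallback are assumptions made for illustration. The extra check for an
empty delimiter is there because the 2.4.2 session quoted above returned
'' without raising; whether a given sample raises csv.Error at all
depends on the Python version.

```python
import csv

def sniff_or_default(sample, default=csv.excel, delimiters=None):
    """Hypothetical helper: return the sniffed dialect, or `default` on failure."""
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters)
    except csv.Error:
        # The sniffer reported that it could not work out the dialect.
        return default
    if not dialect.delimiter:
        # Some versions return an empty delimiter instead of raising.
        return default
    return dialect
```

This covers the one-column case without a new argument to sniff(): the
caller decides what the fallback is, and the exception stays available
to callers who would rather stop.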

> 
>     John> (3) Some documentation on how the 2nd arg is used would be a good
>     John>     idea, as would be an explanation of the relationship with the
>     John>     undocumented "preferred" attribute:
> 
> Agreed.  I seem to recall you're the author.  Got some text? <wink>

Not so. In fact I'd not even used the sniffer before today.
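
In the absence of documentation, the role of the "preferred" attribute
from point (3) can at least be probed by experiment. The sketch below
assumes that the attribute is consulted at sniff time as a tie-breaker
when more than one candidate character is equally consistent across the
sample. That is an assumption drawn from reading the implementation,
not a documented guarantee, and it may not hold for every Python
version.

```python
import csv

# ',' and ';' each occur exactly once on every line, so neither
# candidate is more consistent than the other.
sample = 'a,b;c\n1,2;3\n'

default_sniffer = csv.Sniffer()

custom_sniffer = csv.Sniffer()
# Assumption: reordering this undocumented list changes the tie-break.
custom_sniffer.preferred = [';', ',']

default_choice = default_sniffer.sniff(sample).delimiter
custom_choice = custom_sniffer.sniff(sample).delimiter
print(repr(default_choice), repr(custom_choice))
```

If the two choices differ, "preferred" is acting as a priority list
among otherwise equal candidates, which is the sort of thing the
documentation ought to say outright.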

> 
>     >>> csv.Sniffer().preferred
>     [',', '\t', ';', ' ', ':']
> 
>     John> (4) Too late to change now, but having a class with no args to its
>     John>     constructor and only one other method has a whiff of some
>     John>     other language :-)
> 
> It's not too late to add an optional preferred arg to the constructor.


Maybe it's not even too late to get some feedback from the actual users and 
to spec out the sniffer a bit more rigorously and then ensure it meets 
that spec.

Cheers,
John


