csv.Sniffer - delete in Python 3.0?

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Wed Mar 19 19:19:32 EDT 2008


On Wed, 19 Mar 2008 12:44:05 -0300, <skip at pobox.com> wrote:

> The csv module contains a Sniffer class which is supposed to deduce the
> delimiter and quote character as well as the presence or absence of a
> header in a sample taken from the start of a purported CSV file.  I no
> longer remember who wrote it, and I've never been a big fan of it.  It
> determines the delimiter based almost solely on character frequencies.
> It doesn't consider what the actual structure of a CSV file is or that
> delimiters and quote characters are almost always taken from the set of
> punctuation or whitespace characters.  Consequently, it can cause some
> occasional head-scratching:
>
>     >>> sample = """\
>     ... abc8def
>     ... def8ghi
>     ... ghi8jkl
>     ... """
>     >>> import csv
>     >>> d = csv.Sniffer().sniff(sample)
>     >>> d.delimiter
>     '8'
>     >>> sample = """\
>     ... a8bcdef
>     ... ab8cdef
>     ... abc8def
>     ... abcd8ef
>     ... """
>     >>> d = csv.Sniffer().sniff(sample)
>     >>> d.delimiter
>     'f'
>
> It's not clear to me that people use letters or digits very often as
> delimiters.  Both samples above probably represent data from
> single-column files, not double-column files with '8' or 'f' as the
> delimiter.

I've seen an 'X' used as a field separator, but in that case all the
values were numeric.

> I would be happy to get rid of it in 3.0, but I'm also aware that some
> people use it.  I'd like feedback from the Python community about this.
> If I removed it is there someone out there who wants it badly enough to
> maintain it in PyPI?

The Sniffer.sniff() method already has a "delimiters" parameter; passing
string.punctuation seems reasonable in case one wants to restrict the
possible delimiter set. I think Sniffer is a useful class, but it can't do
magic; perhaps a few lines in the docs stating its limitations would be
enough.
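
A minimal sketch of that restriction, in case it helps. The helper name
sniff_punctuation_delimiter is made up for illustration; only
csv.Sniffer().sniff(), its "delimiters" argument, csv.Error and
string.punctuation come from the standard library:

```python
import csv
import string


def sniff_punctuation_delimiter(sample):
    """Return the sniffed delimiter, or None if no punctuation
    character qualifies.

    Hypothetical helper: it only restricts the candidate set via the
    existing "delimiters" argument of Sniffer.sniff().
    """
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=string.punctuation)
    except csv.Error:
        # sniff() raises csv.Error when it cannot determine a delimiter
        return None
    return dialect.delimiter


# The first sample from the original message: '8' is no longer a candidate
single_column = "abc8def\ndef8ghi\nghi8jkl\n"

# An ordinary comma-separated sample for comparison
comma_separated = "a,b,c\n1,2,3\n4,5,6\n"

print(sniff_punctuation_delimiter(single_column))
print(sniff_punctuation_delimiter(comma_separated))
```

With the candidate set limited this way, the '8' sample should yield no
delimiter at all instead of a digit, which the caller can treat as "probably
a single-column file". Tab- or space-separated files would need those
characters added to the set.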

-- 
Gabriel Genellina



