csv.Sniffer - delete in Python 3.0?
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Wed Mar 19 19:19:32 EDT 2008
On Wed, 19 Mar 2008 12:44:05 -0300, <skip at pobox.com> wrote:
> The csv module contains a Sniffer class which is supposed to deduce the
> delimiter and quote character as well as the presence or absence of a
> header in a sample taken from the start of a purported CSV file. I no
> longer remember who wrote it, and I've never been a big fan of it. It
> determines the delimiter based almost solely on character frequencies.
> It doesn't consider what the actual structure of a CSV file is or that
> delimiters and quote characters are almost always taken from the set of
> punctuation or whitespace characters. Consequently, it can cause some
> occasional head-scratching:
>
> >>> sample = """\
> ... abc8def
> ... def8ghi
> ... ghi8jkl
> ... """
> >>> import csv
> >>> d = csv.Sniffer().sniff(sample)
> >>> d.delimiter
> '8'
> >>> sample = """\
> ... a8bcdef
> ... ab8cdef
> ... abc8def
> ... abcd8ef
> ... """
> >>> d = csv.Sniffer().sniff(sample)
> >>> d.delimiter
> 'f'
>
> It's not clear to me that people use letters or digits very often as
> delimiters. Both samples above probably represent data from
> single-column files, not double-column files with '8' or 'f' as the
> delimiter.
I've seen an 'X' used as a field separator, but in that case all the
values were purely numeric.
> I would be happy to get rid of it in 3.0, but I'm also aware that some
> people use it. I'd like feedback from the Python community about this.
> If I removed it is there someone out there who wants it badly enough to
> maintain it in PyPI?
Sniffer.sniff() already accepts a "delimiters" parameter; passing
string.punctuation seems reasonable if one wants to restrict the set of
candidate delimiters. I think Sniffer is a useful class, but it can't do
magic; perhaps a few lines in the docs stating its limitations would be
enough.
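
A minimal sketch of that restriction, for illustration only. The helper
name sniff_punctuation_delimiter and both sample strings are made up
here; the only library calls used are csv.Sniffer().sniff() with its
delimiters argument and the csv.Error it raises when it cannot decide.

```python
import csv
import string


def sniff_punctuation_delimiter(sample):
    """Return the sniffed delimiter, or None if no punctuation character fits.

    Hypothetical helper: it only restricts the candidates that
    Sniffer.sniff() may pick from to string.punctuation.
    """
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=string.punctuation)
    except csv.Error:
        # Sniffer raises csv.Error when it cannot settle on a delimiter.
        return None
    return dialect.delimiter


# Illustrative samples: one ordinary comma-separated, one like the
# '8' example quoted above.
comma_sample = "a,b,c\n1,2,3\n4,5,6\n"
digit_sample = "abc8def\ndef8ghi\nghi8jkl\n"

print(sniff_punctuation_delimiter(comma_sample))
print(sniff_punctuation_delimiter(digit_sample))
```

With the candidates limited this way, the '8' sample should come back
with no delimiter instead of a digit, which the caller can treat as
"probably a single-column file". Whitespace delimiters such as tab
would need to be added to the string by hand.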
--
Gabriel Genellina