Odd csv column-name truncation with only one column

Hans Mulder hansmu at xs4all.nl
Thu Jul 19 09:52:12 EDT 2012


On 19/07/12 13:21:58, Tim Chase wrote:
> tim@laptop:~/tmp$ python
> Python 2.6.6 (r266:84292, Dec 26 2010, 22:31:48)
> [GCC 4.4.5] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import csv
>>>> from cStringIO import StringIO
>>>> s = StringIO('Email\nfoo@example.com\nbar@example.org\n')
>>>> s.seek(0)
>>>> d = csv.Sniffer().sniff(s.read())
>>>> s.seek(0)
>>>> r = csv.DictReader(s, dialect=d)
>>>> r.fieldnames
> ['Emai', '']
> 
> I get the same results using Python 3.1.3 (also readily available on
> Debian Stable), as well as working directly on a file rather than a
> StringIO.
> 
> Any reason I'm getting ['Emai', ''] (note the missing ell) instead
> of ['Email'] as my resulting fieldnames?  Did I miss something in
> the docs?

The sniffer tries to guess the column separator.  If none of the
usual suspects seems to work, it tries to find a character that
occurs with the same frequency in every row.  In your sample,
the letter 'l' occurs exactly once on each line, so it is the
most plausible separator, or so the Sniffer thinks.
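To make that concrete, here is a rough sketch of the "same frequency in
every row" idea. It is not the actual csv.Sniffer code, only a simplified
illustration; the helper name consistent_chars is made up for this
example, and the sample is the data from the quoted session with the
addresses written with '@'. It assumes Python 3.

```python
from collections import Counter

def consistent_chars(text):
    """Return {char: count} for characters that occur the same nonzero
    number of times on every non-empty line of text.

    Hypothetical helper: a simplified stand-in for the frequency
    heuristic described above, not the real Sniffer implementation.
    """
    lines = [line for line in text.splitlines() if line]
    if not lines:
        return {}
    # Only characters present on the first line can be candidates.
    first = Counter(lines[0])
    result = {}
    for ch, n in first.items():
        if all(line.count(ch) == n for line in lines[1:]):
            result[ch] = n
    return result

sample = 'Email\nfoo@example.com\nbar@example.org\n'
candidates = consistent_chars(sample)
print(candidates)  # {'l': 1}
```

In this sample 'l' is the only character that appears exactly once on
every line, which is consistent with the header being split into 'Emai'
and ''.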

Perhaps it should be documented that the Sniffer doesn't work
on single-column data.

If you really need to read a one-column csv file, you'll have
to find some other way to produce a Dialect object.  Perhaps the
predefined 'csv.excel' dialect matches your data.  If not, the
easiest way might be to manually define a csv.Dialect subclass.
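
As a minimal sketch of both options, assuming Python 3 (io.StringIO in
place of cStringIO) and the sample data from the quoted session with the
addresses written with '@': the first reader skips the Sniffer and uses
the predefined csv.excel dialect, the second uses a hand-written
csv.Dialect subclass. The class name SingleColumn and its attribute
values are illustrative choices, not something the csv module prescribes.

```python
import csv
from io import StringIO

SAMPLE = 'Email\nfoo@example.com\nbar@example.org\n'

# Option 1: bypass the Sniffer and use the predefined excel dialect.
excel_reader = csv.DictReader(StringIO(SAMPLE, newline=''), dialect=csv.excel)
excel_fieldnames = excel_reader.fieldnames
excel_rows = list(excel_reader)

# Option 2: define a dialect by hand (hypothetical name and settings).
class SingleColumn(csv.Dialect):
    delimiter = ','            # never appears in the data, so one column
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = '\r\n'    # used by writers; readers accept \n too
    quoting = csv.QUOTE_MINIMAL

custom_reader = csv.DictReader(StringIO(SAMPLE, newline=''), dialect=SingleColumn)
custom_fieldnames = custom_reader.fieldnames
custom_rows = list(custom_reader)

print(excel_fieldnames)
print(custom_rows)
```

Because the delimiter is fixed rather than guessed, neither reader splits
the header on the letter 'l'.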

Hope this helps,

-- HansM



