Odd csv column-name truncation with only one column

Peter Otten __peter__ at web.de
Thu Jul 19 07:49:53 EDT 2012


Tim Chase wrote:

> tim at laptop:~/tmp$ python
> Python 2.6.6 (r266:84292, Dec 26 2010, 22:31:48)
> [GCC 4.4.5] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>>>> import csv
>>>> from cStringIO import StringIO
>>>> s = StringIO('Email\nfoo@example.com\nbar@example.org\n')
>>>> s.seek(0)
>>>> d = csv.Sniffer().sniff(s.read())
>>>> s.seek(0)
>>>> r = csv.DictReader(s, dialect=d)
>>>> r.fieldnames
> ['Emai', '']
> 
> I get the same results using Python 3.1.3 (also readily available on
> Debian Stable), as well as working directly on a file rather than a
> StringIO.
> 
> Any reason I'm getting ['Emai', ''] (note the missing ell) instead
> of ['Email'] as my resulting fieldnames?  Did I miss something in
> the docs?

Judging from 

>>> import csv
>>> sniffer = csv.Sniffer()
>>> sniffer.sniff("abc").delimiter
'c'
>>> sniffer.sniff("abc\naba").delimiter
'b'
>>> sniffer.sniff("abc\naba\nxyz").delimiter
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/csv.py", line 184, in sniff
    raise Error, "Could not determine delimiter"
_csv.Error: Could not determine delimiter
>>> sniffer.sniff("abc\n"*10 + "xyz").delimiter
'c'
>>> sniffer.sniff("abc\n"*9 + "xyz").delimiter
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/csv.py", line 184, in sniff
    raise Error, "Could not determine delimiter"
_csv.Error: Could not determine delimiter

the Sniffer heuristic takes a character that occurs on all of the first 10
lines to be the delimiter. In your sample "l" is such a character (once per
line), hence ['Emai', '']. There are of course examples where that doesn't
make sense to a human observer...
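
A rough sketch of both the effect and one possible workaround, written for
Python 3 and assuming the addresses in your sample contained "@" where the
archive shows " at ". The helper names consistent_chars() and dict_reader()
are made up for this post, and consistent_chars() is a deliberately
simplified stand-in for what the Sniffer really does, not its actual
algorithm. The workaround relies on the optional `delimiters` argument of
Sniffer.sniff() and falls back to the default "excel" dialect when none of
the plausible delimiters is found:

```python
import csv
import io

# Assumed reconstruction of the sample from the original post.
SAMPLE = "Email\nfoo@example.com\nbar@example.org\n"


def consistent_chars(sample):
    """Return the characters that occur the same nonzero number of times
    on every non-empty line (a simplified view of the heuristic)."""
    lines = [line for line in sample.split("\n") if line]
    if not lines:
        return []
    return sorted(
        ch for ch in set(lines[0])
        if len({line.count(ch) for line in lines}) == 1
    )


def dict_reader(stream, delimiters=",;\t|"):
    """Build a DictReader, letting the Sniffer choose only among
    `delimiters`; use csv.excel if it cannot decide."""
    sample = stream.read()
    stream.seek(0)
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=delimiters)
    except csv.Error:
        # Typically a single-column file: no candidate delimiter present.
        dialect = csv.excel
    return csv.DictReader(stream, dialect=dialect)


print(consistent_chars(SAMPLE))                      # ['l']
print(dict_reader(io.StringIO(SAMPLE)).fieldnames)   # ['Email']
```

Which characters to allow as delimiters is a judgment call that depends on
the files you expect; the four used here are only a guess.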
