[issue2078] CSV Sniffer does not function properly on single column .csv files

Skip Montanaro report at bugs.python.org
Fri Mar 28 13:28:46 CET 2008


Skip Montanaro <skip at pobox.com> added the comment:

    Jean-Philippe> You're right, it does seem that using f.read(1024) to
    Jean-Philippe> feed the sniffer works OK in my case and allows me to
    Jean-Philippe> instantiate the DictReader correctly...  Why that is I'm
    Jean-Philippe> not sure though...

It works entirely based on character frequencies.  The more characters you
feed it the better it should be at guessing the correct delimiter.  In
particular, it pays attention to the frequency of the possible delimiters
per line and assumes the number of columns is the same for each line.
(Well, there's one place where it does use some knowledge of the structure
of a csv file, so my earlier assertion was incorrect.)  If you only feed it
one line it can't really use that frequency-per-line information.
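To make the per-line idea concrete, here is a deliberately simplified sketch.
The function name consistent_candidates is hypothetical, and this is not the
csv module's actual algorithm, which also works with frequency modes, a
consistency threshold and a list of preferred delimiters.  It keeps only the
characters that appear the same, nonzero number of times on every line:

```python
from collections import Counter


def consistent_candidates(sample):
    """Hypothetical helper: return the set of characters that occur the same,
    nonzero number of times on every non-empty line of sample.

    Simplified illustration only; csv.Sniffer's real heuristic differs.
    """
    lines = [line for line in sample.splitlines() if line]
    if not lines:
        return set()
    counts = [Counter(line) for line in lines]
    first = counts[0]
    # A Counter returns 0 for missing characters, so a character absent
    # from any later line fails the equality test.
    return {ch for ch, n in first.items()
            if all(c[ch] == n for c in counts[1:])}


header_only = "name|age|city"
with_data = "name|age|city\nalice|30|paris\nbob|25|rome\n"

print(len(consistent_candidates(header_only)))  # 10
print(consistent_candidates(with_data))         # {'|'}
```

With only the header line every character in it passes the test, so there
is nothing to choose between; once data lines are added, only the delimiter
survives.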

    Jean-Philippe> I was submitting the first line as I thought it was the
    Jean-Philippe> right sample to provide the sniffer for it to sniff the
    Jean-Philippe> correct dialect regardless of the file format and file
    Jean-Philippe> content.

That's a good guess, but not quite spot on in this case.  In particular, the
character frequencies in the first line tend to be much different than the
other lines because it is usually a row of column headers, while the remainder
of the file (though not always ;-) is a table of numbers.
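A minimal sketch of the approach Jean-Philippe describes: sniff a multi-line
chunk of the file rather than just the header row, then rewind before
building the DictReader.  The semicolon-delimited data is invented for
illustration, and io.StringIO stands in for a real file (open a real one
with newline="" for the csv module):

```python
import csv
import io

# Stand-in for an opened file; the contents are made up for this example.
f = io.StringIO("name;age;city\nalice;30;paris\nbob;25;rome\n")

# Feed the sniffer several lines (here, up to 1024 characters), not just
# the header row, so it can compare delimiter counts across lines.
dialect = csv.Sniffer().sniff(f.read(1024))
f.seek(0)  # rewind so DictReader starts again at the header row

rows = list(csv.DictReader(f, dialect=dialect))
print(dialect.delimiter)  # ;
print(rows[0]["city"])    # paris
```
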

Skip

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue2078>
__________________________________


More information about the Python-bugs-list mailing list