[issue2078] CSV Sniffer does not function properly on single column .csv files
Skip Montanaro
report at bugs.python.org
Fri Mar 28 13:28:46 CET 2008
Skip Montanaro <skip at pobox.com> added the comment:
Jean-Philippe> You're right, it does seem that using f.read(1024) to
Jean-Philippe> feed the sniffer works OK in my case and allows me to
Jean-Philippe> instantiate the DictReader correctly... Why that is I'm
Jean-Philippe> not sure though...
It works entirely based on character frequencies. The more characters you
feed it the better it should be at guessing the correct delimiter. In
particular, it pays attention to the frequency of the possible delimiters
per line and assumes the number of columns is the same for each line.
(Well, there's one place where it does use some knowledge of the structure
of a csv file, so my earlier assertion was incorrect.) If you only feed it
one line it can't really use that frequency-per-line information.
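To make the frequency-per-line idea concrete, here is a minimal sketch. It is
not the Sniffer's actual code, just a simplified illustration of the same
idea; the helper names (per_line_counts, consistent_candidates), the candidate
set and the sample data are all made up for this example.

```python
def per_line_counts(sample, candidates=",;\t|"):
    """Count each candidate delimiter on every non-empty line of the sample."""
    lines = [line for line in sample.splitlines() if line]
    return {ch: [line.count(ch) for line in lines] for ch in candidates}


def consistent_candidates(sample, candidates=",;\t|"):
    """Keep candidates that occur the same nonzero number of times per line."""
    counts = per_line_counts(sample, candidates)
    return [
        ch
        for ch, per_line in counts.items()
        if per_line and per_line[0] > 0 and len(set(per_line)) == 1
    ]


# Hypothetical data: the header happens to contain a comma.
one_line = "name;note, free text"
several_lines = (
    "name;note, free text\n"
    "Jane;likes tea, coffee, milk\n"
    "Rick;none\n"
)

print(consistent_candidates(one_line))       # [',', ';']  (ambiguous)
print(consistent_candidates(several_lines))  # [';']
```

With a single line, every candidate that appears at all looks "consistent", so
there is nothing to tell ',' and ';' apart. With several lines, only ';' keeps
the same count on each line.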
Jean-Philippe> I was submitting the first line as I thought it was the
Jean-Philippe> right sample to provide the sniffer for it to sniff the
Jean-Philippe> correct dialect regardless of the file format and file
Jean-Philippe> content.
That's a good guess, but not quite spot on in this case. In particular, the
character frequencies in the first line tend to be quite different from those
in the other lines, because the first line is usually a row of column headers
while the remainder of the file is (though not always ;-) a table of numbers.
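In practice that means handing the sniffer a chunk that spans several lines,
then rewinding before reading. A minimal sketch, assuming a semicolon-separated
file; the contents are invented and io.StringIO stands in for a real open
file:

```python
import csv
import io

# Hypothetical file contents; io.StringIO stands in for open("some.csv").
f = io.StringIO(
    "name;age;city\n"
    "alice;30;paris\n"
    "bob;25;rome\n"
    "carol;41;oslo\n"
)

sample = f.read(1024)  # a chunk covering several lines, not just the header
f.seek(0)              # rewind so the reader starts at the first row
dialect = csv.Sniffer().sniff(sample)

rows = list(csv.DictReader(f, dialect=dialect))
print(dialect.delimiter)
print(rows[0])
```

The 1024 is just the figure used earlier in this thread; a larger chunk gives
the sniffer more lines to compare.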
Skip
__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue2078>
__________________________________