csv.Sniffer: wrong detection of the end of line delimiter

Laurent Laporte laurentlaporte at yahoo.com
Wed Dec 28 05:48:03 EST 2005


hello,

I'm using the standard csv module under Python 2.3 / 2.4 to read a CSV
file. The file is opened in binary mode, so the original end-of-line
terminators are preserved.
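(For readers on Python 3: there, the csv documentation recommends opening the file with newline='' rather than in binary mode. That keeps line terminators untranslated, which matches the situation described above. The following is a minimal sketch. The temporary file and the sample contents are assumptions made only for the illustration.)

```python
import os
import tempfile

# Write a small file with '\r' line endings, without any translation.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    f.write("a,b,c\r1,2,3\r")

# newline='' returns the line endings exactly as stored in the file.
with open(path, newline="") as f:
    raw = f.read()

# The default (universal newlines) translates '\r' and '\r\n' to '\n'.
with open(path) as f:
    translated = f.read()

os.remove(path)

print(repr(raw))         # 'a,b,c\r1,2,3\r'
print(repr(translated))  # 'a,b,c\n1,2,3\n'
```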

It appears that csv.Sniffer forces the line terminator to be
'\r\n'. That is fine for files created under Windows, but wrong for
files created under Linux or Macintosh.
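(The forced terminator can be observed on a modern Python 3 interpreter. The csv reader ignores lineterminator and always recognizes either '\r' or '\n' as the end of a line. The visible effect therefore appears when data is written back with the sniffed dialect. The following is a small sketch using an assumed '\n'-terminated sample.)

```python
import csv
import io

sample = "a,b,c\n1,2,3\n4,5,6\n"    # Unix-style '\n' line endings
dialect = csv.Sniffer().sniff(sample)
print(repr(dialect.delimiter))       # ','
print(repr(dialect.lineterminator))  # '\r\n', not '\n'

# Writing with the sniffed dialect emits '\r\n' despite the '\n' input.
out = io.StringIO()
csv.writer(out, dialect).writerow(["x", "y"])
print(repr(out.getvalue()))          # 'x,y\r\n'
```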

There is also a related, potential bug in the _guess_delimiter()
method. Its first line of code splits the data incorrectly:
    data = filter(None, data.split('\n'))
It ignores the real line terminator of the data!
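(To see why this split is a problem, compare it with str.splitlines(). That method recognizes '\n', '\r' and '\r\n' (among other line boundaries). The following is a minimal Python 3 sketch with assumed sample strings. In Python 3, filter() returns an iterator, which is why the code wraps it in list().)

```python
mac_sample = "a,b,c\r1,2,3\r4,5,6\r"  # '\r' terminators
win_sample = "a,b,c\r\n1,2,3\r\n"     # '\r\n' terminators

# The split performed by _guess_delimiter():
mac_lines = list(filter(None, mac_sample.split('\n')))
win_lines = list(filter(None, win_sample.split('\n')))
print(len(mac_lines))  # 1: the whole sample is seen as a single line
print(win_lines)       # ['a,b,c\r', '1,2,3\r']: every line keeps a stray '\r'

# str.splitlines() handles all three conventions:
print(mac_sample.splitlines())  # ['a,b,c', '1,2,3', '4,5,6']
print(win_sample.splitlines())  # ['a,b,c', '1,2,3']
```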

Here is a patch (not a perfect one):
# ------- begin of patch -------
import csv


class PatchedSniffer(csv.Sniffer):
    """csv.Sniffer that detects the real line terminator of the sample."""

    def sniff(self, p_data, p_delimiters=None):
        t_dialect = csv.Sniffer.sniff(self, p_data, p_delimiters)
        # The base class always sets lineterminator to '\r\n';
        # replace it with the terminator actually found in the sample.
        t_dialect.lineterminator = self._guessLineTerminator(p_data)
        return t_dialect

    def _guessLineTerminator(self, p_data):
        # Test '\r\n' first: a '\r\n' sample also contains '\n' and '\r'.
        for t_lineTerminator in ['\r\n', '\n', '\r']:
            if t_lineTerminator in p_data:
                return t_lineTerminator
        return '\r\n'  # no terminator found: Windows default (Excel)

    def _formatDataForGuess(self, p_data):
        # The base _guess_delimiter() splits on '\n' only, so convert
        # the sample's own terminator to '\n' before delegating to it.
        t_lineTerminator = self._guessLineTerminator(p_data)
        return '\n'.join(p_data.split(t_lineTerminator))

    def _guess_delimiter(self, p_data, p_delimiters):
        t_data = self._formatDataForGuess(p_data)

        (t_delimiter, t_skipInitialSpace) = \
            csv.Sniffer._guess_delimiter(self, t_data, p_delimiters)

        # Fall back to tab if no delimiter was found but tabs are present.
        if t_delimiter == '' and '\t' in p_data:
            t_delimiter = '\t'

        return (t_delimiter, t_skipInitialSpace)
# ------- end of patch -------
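(The following is a possible alternative, sketched for Python 3. It does not override the private _guess_delimiter() method. Instead, it converts every line ending in the sample to '\n' before sniffing, then stores the detected terminator on the returned dialect. The names detect_line_terminator and sniff_dialect are hypothetical helpers, not part of the csv module.)

```python
import csv
import io


def detect_line_terminator(sample):
    # Check CR LF before LF and CR, as the patch above does;
    # fall back to the csv module's default when nothing is found.
    for terminator in ('\r\n', '\n', '\r'):
        if terminator in sample:
            return terminator
    return '\r\n'


def sniff_dialect(sample, delimiters=None):
    # Hypothetical helper: sniff on '\n'-normalized text,
    # then record the sample's real line terminator.
    terminator = detect_line_terminator(sample)
    normalized = sample.replace('\r\n', '\n').replace('\r', '\n')
    dialect = csv.Sniffer().sniff(normalized, delimiters)
    # sniff() builds a new Dialect subclass on every call, so this
    # assignment does not leak into other sniffed dialects.
    dialect.lineterminator = terminator
    return dialect


if __name__ == "__main__":
    for sample in ("a,b,c\n1,2,3\n", "a,b,c\r1,2,3\r", "a,b,c\r\n1,2,3\r\n"):
        dialect = sniff_dialect(sample)
        out = io.StringIO()
        csv.writer(out, dialect).writerow(["x", "y"])
        print(repr(dialect.delimiter), repr(dialect.lineterminator),
              repr(out.getvalue()))
```

Like the patch, this sketch records only the first terminator found in the order '\r\n', '\n', '\r', so a file with mixed line endings is not handled specially.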

Bye.
------- Laurent.