[issue24787] csv.Sniffer guesses "M" instead of \t or , as the delimiter

Peter Otten report at bugs.python.org
Sat Aug 8 09:49:08 CEST 2015


Peter Otten added the comment:

Have you considered writing your own little sniffer? Getting it right for your actual data is usually easier to achieve than a general solution.

The following simplistic sniffer should work with your samples:

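import csv
import operator

def make_dialect(delimiter):
    # Build an excel-style dialect that differs only in its delimiter.
    class Dialect(csv.excel):
        pass
    Dialect.delimiter = delimiter
    return Dialect

def sniff(sample):
    # Pick the candidate delimiter that occurs most often in the sample.
    count, delimiter = max(
        ((sample.count(delim), delim) for delim in ",\t|;"),
        key=operator.itemgetter(0))
    if count == 0:
        # None of the candidates occur; fall back to a single space.
        if " " in sample:
            delimiter = " "
        else:
            raise csv.Error("Could not determine delimiter")
    return make_dialect(delimiter)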
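
A minimal usage sketch, assuming Python 3. It repeats the two functions above so that it runs on its own; read_rows, sample_size and the sample strings are hypothetical names and made-up data for illustration, not the files attached to the issue:

```python
import csv
import io
import operator


def make_dialect(delimiter):
    # Same helper as above: an excel dialect with a custom delimiter.
    class Dialect(csv.excel):
        pass
    Dialect.delimiter = delimiter
    return Dialect


def sniff(sample):
    # Same logic as above: the most frequent candidate wins,
    # with a single space as the fallback.
    count, delimiter = max(
        ((sample.count(delim), delim) for delim in ",\t|;"),
        key=operator.itemgetter(0))
    if count == 0:
        if " " in sample:
            delimiter = " "
        else:
            raise csv.Error("Could not determine delimiter")
    return make_dialect(delimiter)


def read_rows(text, sample_size=1024):
    # Hypothetical helper: sniff the first sample_size characters,
    # then parse the whole text with the resulting dialect.
    dialect = sniff(text[:sample_size])
    return list(csv.reader(io.StringIO(text), dialect=dialect))


# Made-up samples, not the data from the issue.
tab_text = "Name\tMass\nMars\t0.107\n"
comma_text = "Name,Mass\nMars,0.107\n"

print(read_rows(tab_text))
print(read_rows(comma_text))
```

Note that the count covers the whole sample, including quoted fields, so a tab-separated file with many commas inside its values could still be misdetected. That is the trade-off of keeping the sniffer this simple.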

Tiago, if you want to follow that path, we should take the discussion to the general Python mailing list.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue24787>
_______________________________________

