[PEP305] Python 2.3: a small change request in CSV module

Skip Montanaro skip at pobox.com
Thu May 15 16:15:19 EDT 2003


I'm replying on c.l.py, but note that for future reference this thread
belongs on csv at mail.mojam.com (on the cc: list).

    Bernard> The CSV module only allows a single character as delimiter,
    Bernard> which does not (easily) allow one to write generic code that
    Bernard> would not be at the mercy of whatever the current locale is of
    Bernard> the user who sends you a csv file. Fortunately the Sniffer
    Bernard> class is provided for guessing the most likely delimiter, and
    Bernard> seems to work fine, from my limited tests.

I'll leave Dave and Andrew to comment on the possibility of admitting a
multiple-character delimiter string, as that will affect their C code.

    Bernard> There's an error in the documentation of Sniffer().sniff(x),
    Bernard> though: its x argument is documented as a file object, whereas
    Bernard> the code actually expects a sample buffer. 

Thanks, I'll fix the docs.  They didn't quite catch up to the last-minute
changes I made to the code.

    Bernard> I feel though, that this unfortunately forces one to write more
    Bernard> code than is really needed, typically in the following form:

    Bernard>     sample = file( 'data.csv' ).read( 8192 )
    Bernard>     dialect = csv.Sniffer().sniff( sample )
    Bernard>     infile = file( 'data.csv' )
    Bernard>     for fields in csv.reader( infile, dialect ):
    Bernard>         # do something with fields

    Bernard> That's a tad ugly, having to open the same file twice in
    Bernard> particular.

I recognize the issue you raise.  As originally written, the Sniffer class
also took a file-like object, but it relied on being able to rewind the
stream.  That would, for example, prevent you from feeding sys.stdin to the
sniffer.  I also felt the decision to rewind the stream belonged with the
caller, so I changed it to accept a small data sample instead.
You can avoid multiple opens by rewinding the stream yourself (in the common
case where the stream can be rewound):

    import csv

    infile = open('data.csv')
    sample = infile.read(8192)   # a small sample is enough for sniffing
    infile.seek(0)               # rewind so the reader starts at the top
    dialect = csv.Sniffer().sniff(sample)
    for fields in csv.reader(infile, dialect):
        pass  # do something with fields
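
For a stream that cannot be rewound (sys.stdin, for example), one possible
approach is to keep the sample and replay it ahead of the rest of the stream.
The following is only a sketch, assuming a current Python 3 text stream; the
helper name sniffed_reader and its sample_size parameter are made up for
illustration and are not part of the csv module.

```python
import csv
import io
import itertools

def sniffed_reader(stream, sample_size=8192):
    """Return a csv.reader over a stream that cannot be rewound."""
    # Read the sample, then finish the current line so the sample
    # ends on a line boundary.
    sample = stream.read(sample_size) + stream.readline()
    dialect = csv.Sniffer().sniff(sample)
    # Replay the buffered sample first, then continue with whatever
    # is still unread in the stream.
    lines = itertools.chain(io.StringIO(sample, newline=""), stream)
    return csv.reader(lines, dialect)
```

Something like sniffed_reader(sys.stdin) would then be used in place of
csv.reader(infile, dialect) in the loop above.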

Note that after the sniffer does its thing you should check that it returned
reasonable values.
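
What counts as "reasonable" depends on your data.  As a rough sketch, one
could check that the guessed delimiter is one you expect and that the sample
splits into rows of equal width.  The function name sniff_checked and the
default set of allowed delimiters are assumptions for illustration, and the
sample is assumed to end on a line boundary.

```python
import csv

def sniff_checked(sample, allowed_delimiters=",;\t|"):
    """Sniff a dialect and reject guesses that look implausible."""
    # Sniffer.sniff raises csv.Error if it cannot determine a delimiter.
    dialect = csv.Sniffer().sniff(sample)
    if dialect.delimiter not in allowed_delimiters:
        raise ValueError("unexpected delimiter: %r" % dialect.delimiter)
    # Every non-empty sampled row should have the same number of
    # fields, and more than one field.
    rows = csv.reader(sample.splitlines(), dialect)
    widths = {len(row) for row in rows if row}
    if len(widths) != 1 or widths.pop() < 2:
        raise ValueError("inconsistent field counts in sample")
    return dialect
```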

    Bernard> (2)
    Bernard>     for fields in csv.reader( infile, dialect='sniff' ):
    Bernard>         # do something with fields

Do you mean to imply that the csv.reader object should call the sniffer
implicitly and use the values it returns?  That's an interesting idea, but
the sniffer isn't guaranteed to guess right.
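
In user code, something close to that behaviour can be approximated with a
small wrapper.  This is a sketch only: sniffing_reader is a hypothetical
helper, not a csv API.  It assumes a seekable file, narrows the guess with
the optional delimiters argument of sniff(), and falls back to the "excel"
dialect when the sniffer gives up.

```python
import csv

def sniffing_reader(infile, sample_size=8192, delimiters=",;\t",
                    fallback="excel"):
    """Hypothetical helper: sniff a seekable file, else use a fallback."""
    sample = infile.read(sample_size)
    infile.seek(0)  # rewind so the reader sees the whole file
    try:
        # Only the listed delimiters are considered as candidates.
        dialect = csv.Sniffer().sniff(sample, delimiters)
    except csv.Error:
        # The sniffer could not decide; use the fallback dialect name.
        dialect = fallback
    return csv.reader(infile, dialect)
```

The fallback keeps the failure case explicit, which an implicit
dialect='sniff' would have to decide on the caller's behalf.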

Skip




