[PEP305] Python 2.3: a small change request in CSV module

Thu May 15 13:34:30 EDT 2003

I may be a bit late to the ball with the beta already out, but I'd like
to request a little change/addition to the otherwise very neat new CSV
module. The field separator Microsoft Excel uses depends on the user's
locale (Windows Control Panel, Regional Settings, List separator). I for
one very often see either a comma (the default for the csv module) or a
semicolon being used.

The CSV module only allows a single character as the delimiter, which
makes it hard to write generic code that is not at the mercy of the
locale of whoever sends you a CSV file. Fortunately, the Sniffer class
is provided for guessing the most likely delimiter, and it seems to work
fine in my limited tests.
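
To make the problem and the Sniffer's role concrete, here is a small
sketch written against current Python 3 rather than 2.3. The sample
strings are made up for illustration, and restricting the candidate
delimiters to "," and ";" is an assumption of the example, not something
the module requires.

```python
import csv
import io

# Hypothetical sample data, as two differently configured Excel
# installations might export the same sheet.
comma_export = "name,age,city\r\nalice,30,paris\r\nbob,25,rome\r\n"
semicolon_export = "name;age;city\r\nalice;30;paris\r\nbob;25;rome\r\n"

# With the default dialect (comma), the semicolon export is not split:
# every row comes back as a single field.
naive_rows = list(csv.reader(io.StringIO(semicolon_export)))

# The Sniffer takes a text sample and returns a Dialect subclass whose
# delimiter attribute holds its best guess.
sniffer = csv.Sniffer()
guessed = [sniffer.sniff(text, delimiters=",;").delimiter
           for text in (comma_export, semicolon_export)]
```

Here naive_rows holds one unsplit field per row, while guessed holds
the delimiter detected for each export.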

There's an error in the documentation of Sniffer().sniff(x), though:
its x argument is documented as a file object, whereas the code actually
expects a sample buffer, i.e. a string read from the file. Once you feed
it that, it works fine and deals nicely with the above-mentioned problem
of choosing the right delimiter. I feel, though, that this forces one to
write more code than is really needed, typically in the following form:

    import csv

    sample = open( 'data.csv' ).read( 8192 )   # first open: sample to sniff
    dialect = csv.Sniffer().sniff( sample )
    infile = open( 'data.csv' )                # second open: the real read
    for fields in csv.reader( infile, dialect ):
        pass  # do something with fields

That's a tad ugly, having to open the same file twice in particular.
What I would like to see instead is either:
(1)
    for fields in csv.reader( infile, dialect='excel', delimiter=',|;' ):
        # do something with fields

*or* probably more realistically:
(2)
    for fields in csv.reader( infile, dialect='sniff' ):
        # do something with fields

I guess allowing multi-character delimiters or regular expressions
would be too much of a change, especially since the real data splitting
seems to occur in a C module. But solution (2) is very easy to implement
in plain Python, and just needs to use a Sniffer to guess the correct
Dialect instead of forcing the user to "hard choose" one.
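
A minimal sketch of what solution (2) could boil down to in plain
Python, again written against current Python 3. The helper name
sniffing_reader, its parameters, and the sample data are hypothetical
and not part of the csv module. It assumes the file object is seekable,
so the file is opened only once and rewound after sampling.

```python
import csv
import io


def sniffing_reader(infile, sample_size=8192, delimiters=",;"):
    """Hypothetical helper: guess the dialect from the start of infile,
    rewind, and return an ordinary csv.reader using that dialect."""
    sample = infile.read(sample_size)   # the Sniffer wants text, not a file
    infile.seek(0)                      # rewind; requires a seekable file
    dialect = csv.Sniffer().sniff(sample, delimiters=delimiters)
    return csv.reader(infile, dialect)


# Made-up data standing in for files exported under two different locales.
comma_file = io.StringIO("name,age,city\r\nalice,30,paris\r\nbob,25,rome\r\n")
semicolon_file = io.StringIO("name;age;city\r\nalice;30;paris\r\nbob;25;rome\r\n")

rows_comma = list(sniffing_reader(comma_file))
rows_semicolon = list(sniffing_reader(semicolon_file))
```

Both calls yield the same rows regardless of which separator the sender
used. Note that sniff() raises csv.Error when it cannot settle on a
delimiter, so a real implementation would need to decide on a fallback.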

Sorry for the longish explanation for a fairly simple change request,
really. If this is not the appropriate place for posting, please let
me know. Thanks for reading this far; if you've looked at Python 2.3
you'll agree that it looks like another very promising piece of
Dutch technology ;-)

Cheers,

Bernard.