[PEP305] Python 2.3: a small change request in CSV module
Skip Montanaro
skip at pobox.com
Thu May 15 16:15:19 EDT 2003
I'm replying on c.l.py, but note that for future reference this thread
belongs on csv at mail.mojam.com (on the cc: list).
Bernard> The CSV module only allows a single character as delimiter,
Bernard> which does not (easily) allow one to write generic code that
Bernard> would not be at the mercy of whatever the current locale is of
Bernard> the user who sends you a csv file. Fortunately the Sniffer
Bernard> class is provided for guessing the most likely delimiter, and
Bernard> seems to work fine, from my limited tests.
I'll leave Dave and Andrew to comment on the possibility of admitting a
multiple-character delimiter string, as that will affect their C code.
Bernard> There's an error in the documentation of Sniffer().sniff(x),
Bernard> though: its x argument is documented as a file object, whereas
Bernard> the code actually expects a sample buffer.
Thanks, I'll fix the docs. They didn't quite catch up to the last-minute
changes I made to the code.
Bernard> I feel though, that this unfortunately forces one to write more
Bernard> code than is really needed, typically in the following form:
Bernard>     sample = file( 'data.csv' ).read( 8192 )
Bernard>     dialect = csv.Sniffer().sniff( sample )
Bernard>     infile = file( 'data.csv' )
Bernard>     for fields in csv.reader( infile, dialect ):
Bernard>         # do something with fields
Bernard> That's a tad ugly, having to open the same file twice in
Bernard> particular.
I recognize the issue you raise. As originally written, the Sniffer class
also took a file-like object. However, it relied on being able to rewind the
stream, which would, for example, prevent you from feeding sys.stdin to the
sniffer. I also felt the decision to rewind the stream belonged with the
caller, so I changed it to accept a small data sample instead.
You can avoid multiple opens by rewinding the stream yourself (in the common
case where the stream can be rewound):
    infile = file('data.csv')
    sample = infile.read(8192)
    infile.seek(0)
    dialect = csv.Sniffer().sniff( sample )
    for fields in csv.reader( infile, dialect ):
        # do something with fields
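When the stream cannot be rewound (sys.stdin being the obvious case), one
option is to buffer the first few lines, sniff those, and then replay them
ahead of the rest of the stream. What follows is only a sketch in current
Python 3; reader_from_stream and sample_lines are names made up for
illustration and are not part of the csv module.

```python
import csv
import itertools

def reader_from_stream(stream, sample_lines=20):
    """Sniff the first few lines of a one-pass stream, then read all of it.

    `stream` is any iterator of text lines, e.g. sys.stdin.
    (Hypothetical helper, not part of the csv module.)
    """
    # Pull a bounded number of lines off the front of the stream.
    head = list(itertools.islice(stream, sample_lines))
    # sniff() raises csv.Error if it cannot determine a delimiter.
    dialect = csv.Sniffer().sniff("".join(head))
    # Replay the buffered lines, then continue with the rest of the stream.
    return csv.reader(itertools.chain(head, stream), dialect)
```

Something like reader_from_stream(sys.stdin) would then behave like an
ordinary csv.reader, at the cost of holding the sampled lines in memory.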
Note that after the sniffer does its thing you should check that it returned
reasonable values (a plausible delimiter and quote character, for instance).
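As a rough illustration of such a check, the sketch below restricts the
sniffer to a short list of candidate delimiters and then confirms that the
sample parses into rows of equal width with at least two fields each. It is
written for current Python 3; sniff_checked and candidates are hypothetical
names, and the width test is just one possible heuristic.

```python
import csv

def sniff_checked(sample, candidates=",;\t|"):
    """Return the sniffed dialect, or None if the guess looks unreliable.

    (Hypothetical helper, not part of the csv module.)
    """
    try:
        # Only consider delimiters we are prepared to believe in.
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    except csv.Error:
        # Raised when the sniffer cannot settle on a delimiter.
        return None
    lines = sample.splitlines()
    if not sample.endswith(("\n", "\r")):
        # The sample was probably cut off mid-row; ignore the partial line.
        lines = lines[:-1]
    # Parse the sample with the guessed dialect and compare row widths.
    # Note: quoted fields containing newlines would defeat this simple split.
    widths = {len(row) for row in csv.reader(lines, dialect) if row}
    if len(widths) != 1 or min(widths) < 2:
        return None
    return dialect
```

A caller would fall back to a known dialect, or ask the user, whenever this
returns None.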
Bernard> (2)
Bernard>     for fields in csv.reader( infile, dialect='sniff' ):
Bernard>         # do something with fields
Do you mean to imply that the csv.reader object should call the sniffer
implicitly and use the values it returns? That's an interesting idea, but
the sniffer isn't guaranteed to guess right every time.
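The effect of dialect='sniff' can be approximated on the caller's side with a
small wrapper, which also makes the fallback for a failed guess explicit.
This is a sketch only, in current Python 3 and assuming a seekable file
object; sniffing_reader, sample_size, candidates and fallback are all
invented names, not csv module API.

```python
import csv

def sniffing_reader(infile, sample_size=8192, candidates=",;\t|",
                    fallback=csv.excel):
    """Return a csv.reader over `infile` using a sniffed dialect.

    Falls back to `fallback` when the sniffer cannot decide.
    (Hypothetical helper; `infile` must support seek().)
    """
    sample = infile.read(sample_size)
    infile.seek(0)  # rewind so the reader sees the file from the start
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    except csv.Error:
        # The sniffer gave up; use the caller's default dialect instead.
        dialect = fallback
    return csv.reader(infile, dialect)
```

With this, the loop becomes "for fields in sniffing_reader(infile):", though a
wrong guess that does not raise an error will still go unnoticed.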
Skip