Module for reading CSV data

Cliff Wells logiplexsoftware at earthlink.net
Mon Nov 12 14:25:12 EST 2001


On Saturday 10 November 2001 01:03, Ian Parker wrote:

> The ASV module by Laurence Tratt handles csv files very well.  IIRC it
> doesn't fall apart on quoted strings containing commas.

As long as everyone else is plugging CSV modules, I may as well point you to 
a module I wrote a few months ago for importing CSV files:

https://sourceforge.net/projects/python-dsv/

It is poorly documented and full of re.ugliness, but it does some things that 
I haven't seen in any other csv importer:

- It is not limited to using commas as delimiters
- It can guess the delimiter 
- It can guess the text qualifier (single or double quotes)
- It can guess whether the first row is a header
- It handles quoted delimiters
- It handles quoted newlines
- It can handle inconsistent quoting (Excel, for instance, only quotes data 
that requires it, i.e. data containing delimiters or newlines, whereas some 
other programs quote everything).
- It has an optional dialog (using wxPython) for previewing the data prior to 
import (a la MS Excel) and allowing the user to change the guessed parameters.
- It's reasonably fast, considering the amount of data analysis it does.  The 
heuristics analyze the smallest portion of the file they can get away with, 
so increasing the file size won't usually increase the time spent in the 
guessing steps (although it will obviously affect the overall time to import).
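
To give a rough idea of what guessing a delimiter from a small sample can 
look like, here is a minimal sketch in present-day Python.  It is not the 
code from the module above: the names sample_head and guess_delimiter are 
made up for illustration, and the heuristic (pick the candidate character 
that occurs the same nonzero number of times on every sampled line) is a 
deliberately naive assumption that ignores quoting.

```python
from itertools import islice


def sample_head(fileobj, max_lines=20):
    """Read at most max_lines lines, so cost does not grow with file size."""
    return [line.rstrip("\r\n") for line in islice(fileobj, max_lines)]


def guess_delimiter(sample_lines, candidates=",;\t|"):
    """Return the candidate that occurs equally often (and at least once)
    on every non-blank sampled line, or None if no candidate qualifies.

    Hypothetical heuristic for illustration only: it does not account
    for delimiters that appear inside quoted fields.
    """
    best, best_count = None, 0
    for delim in candidates:
        # Set of per-line occurrence counts for this candidate.
        counts = {line.count(delim) for line in sample_lines if line.strip()}
        if len(counts) == 1:
            n = counts.pop()
            # Prefer the consistent candidate that splits lines the most.
            if n > best_count:
                best, best_count = delim, n
    return best
```

Because only the first few lines are read, the guess takes about the same 
time on a large file as on a small one.  A real importer would also need to 
discount delimiters inside quoted fields before counting.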

The guessing steps seem to be reliable, but any of them can be skipped and 
the corresponding parameter set programmatically instead.
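
As an illustration of setting the parameters by hand rather than guessing 
them, the sketch below uses the csv module from today's Python standard 
library, not the module described in this message.  The helper name 
read_delimited and the sample data are made up for the example.  It shows a 
semicolon-delimited input in which one field contains the delimiter and 
another contains a newline, both inside double quotes, while the remaining 
fields are unquoted.

```python
import csv
import io


def read_delimited(text, delimiter=",", quotechar='"'):
    """Parse text into a list of rows using explicitly chosen parameters."""
    # newline="" leaves line endings untouched so that the csv reader
    # itself decides which newlines end a row and which are quoted data.
    source = io.StringIO(text, newline="")
    return list(csv.reader(source, delimiter=delimiter, quotechar=quotechar))


# Hypothetical sample: only the fields that need quoting are quoted.
data = (
    'id;name;note\n'
    '1;"Smith; J.";"first line\nsecond line"\n'
    '2;plain;unquoted\n'
)

rows = read_delimited(data, delimiter=";")
for row in rows:
    print(row)
```

The quoted delimiter stays inside the name field and the quoted newline 
stays inside the note field, so the input yields three rows of three fields 
each.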

Caveats:
- When Python 2.0 first came out, the new sre module somehow broke my regular 
expressions.  I believe this was fixed, but I can't recall what condition 
caused the error, so I can't be sure.  My recent tests seem to indicate that 
everything is working properly.
- The code for the wxDialog is ugly (I was planning on creating a wizard-like 
series of dialogs, but never got around to it).  What is there is usable 
though.
- The code for the guessing heuristics is poorly documented and fairly dense.

On the other hand, this code was used in a production environment without a 
hitch.

Regards,

-- 
Cliff Wells
Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308
(800) 735-0555 x308



