Alternatives for the CSV module

- madsurfer2000 at hotmail.com
Sun Sep 12 09:21:18 EDT 2004


I am going to write a program that reads files in different CSV
dialects. Sometimes the field separator or line separator can be
more than one character. The standard CSV module in Python 2.3 is not
a good solution, because it accepts only single-character delimiters.

Example of a file

"ABC"<>"DEF"""<>"GHI"¤¤123<>456<>"XYZ"¤¤

Here the field delimiter is "<>" and the "line" terminator is "¤¤".
Fields can be enclosed in quotes, and a doubled quote inside a quoted
field is treated as a literal quote character.

This is not the only format the parser can expect. The format is given
to the program by the user, so the program should have no problems
parsing the text. An ideal solution would be a similar parser to the
standard CSV-parser, except that it accepts strings as delimiters.
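
A minimal sketch of such a parser might look like the following. It is
only an illustration, not an existing library: the name parse_delimited
and its parameters are hypothetical, it is written for current Python 3
rather than 2.3, and it assumes that a doubled quote inside a quoted
field stands for one literal quote.

```python
def parse_delimited(text, field_sep, record_sep, quote='"'):
    """Split text into records of fields; separators may be any non-empty strings.

    Hypothetical helper for illustration only, not part of the csv module.
    """
    if not field_sep or not record_sep or not quote:
        raise ValueError("separators and quote must be non-empty strings")

    records = []        # finished records
    row = []            # fields of the record being built
    buf = []            # characters of the field being built
    in_quotes = False
    pending = False     # True while the current record has unsaved content
    i, n = 0, len(text)

    while i < n:
        if in_quotes:
            if text.startswith(quote, i):
                if text.startswith(quote, i + len(quote)):
                    # Doubled quote inside a quoted field: literal quote.
                    buf.append(quote)
                    i += 2 * len(quote)
                else:
                    # Closing quote.
                    in_quotes = False
                    i += len(quote)
            else:
                # Separators inside quotes are ordinary text.
                buf.append(text[i])
                i += 1
        elif text.startswith(quote, i):
            in_quotes = True
            pending = True
            i += len(quote)
        elif text.startswith(record_sep, i):
            row.append("".join(buf))
            records.append(row)
            row, buf = [], []
            pending = False
            i += len(record_sep)
        elif text.startswith(field_sep, i):
            row.append("".join(buf))
            buf = []
            pending = True
            i += len(field_sep)
        else:
            buf.append(text[i])
            pending = True
            i += 1

    if pending:
        # Last record was not followed by a record separator.
        row.append("".join(buf))
        records.append(row)
    return records


sample = '"ABC"<>"DEF"""<>"GHI"¤¤123<>456<>"XYZ"¤¤'
print(parse_delimited(sample, "<>", "¤¤"))
# [['ABC', 'DEF"', 'GHI'], ['123', '456', 'XYZ']]
```

The record separator is tested before the field separator, so if one
of the two is a prefix of the other the result depends on that order.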

I could always manipulate the input file and replace the delimiters by
single characters, but I would like a more generic solution.
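
For comparison, the replacement approach could be sketched roughly as
below. This is an assumption-laden illustration in current Python 3:
parse_via_csv is a hypothetical name, the placeholder character
"\x1f" is an arbitrary choice, and the input is assumed to contain
neither that placeholder nor any newline characters of its own.

```python
import csv
import io


def parse_via_csv(text, field_sep, record_sep, placeholder="\x1f"):
    """Map multi-character separators to single characters, then use csv.

    Hypothetical helper for illustration only.
    """
    if placeholder in text or "\n" in text or "\r" in text:
        raise ValueError("input already contains a stand-in character")

    # Record separators become newlines, field separators the placeholder.
    converted = text.replace(record_sep, "\n").replace(field_sep, placeholder)

    reader = csv.reader(
        io.StringIO(converted, newline=""),
        delimiter=placeholder,
        quotechar='"',
        doublequote=True,
    )

    rows = []
    for row in reader:
        # Separators that sat inside quoted fields were replaced as well,
        # so put the original strings back into the parsed values.
        rows.append(
            [f.replace(placeholder, field_sep).replace("\n", record_sep)
             for f in row]
        )
    return rows


sample = '"ABC"<>"DEF"""<>"GHI"¤¤123<>456<>"XYZ"¤¤'
print(parse_via_csv(sample, "<>", "¤¤"))
# [['ABC', 'DEF"', 'GHI'], ['123', '456', 'XYZ']]
```

The weakness is visible in the first check: every new format needs
stand-in characters that are guaranteed not to occur in the data,
which is why a more generic solution would be preferable.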

SimpleParse (http://simpleparse.sourceforge.net/) looks like a good
alternative. It doesn't support Unicode, but most files can be
converted to ISO-8859-1 first.

Would SimpleParse be suitable for this purpose, or are there better
alternatives out there, like a more flexible CSV-parser?


