[Csv] PEP 305

Fri Aug 22 02:58:49 CEST 2003

>In studying the new CSV module, I find two problems, particularly in
>interpreting csv files used for database import/export. Currently we use
>our own csv parsing/writing utility, but would like to use the language
>supported facility if possible.
>
>1. When reading a field with adjacent delimiters (an empty field), your
>code always maps that to an empty string. When interpreting DB output (at
>least for DB2), an empty string is a pair of quotes. An empty field
>represents NULL in the database and we parse that as the Python object
>None (same result as from an SQL query). Using the csv module as is, an
>empty string and None export identically. If this behavior were encoded
>into the dialect, we could easily modify this behavior to suit our needs.
>
>2. The other problem for my application, is the differentiation between
>numeric data and strings of numbers in the csv file (this again is related
>to DB2 import/export files). Our needs are to map anything with quotes in
>the csv to a string (even if it is numeric). Anything without quotes
>should map to a Python numeric type (or, as mentioned above, None when
>adjacent delimiters appear). Of course, this would imply the possibility
>of a ValueError when reading a csv. Again, it seems this behavior could be
>parameterized out into the dialect.
>
>Possibly both items could be addressed by a map_to_python_object
>parameter.

You raise valid points, and this is something we argued over for some time
when preparing the module for Python 2.3. I tend to agree that a switch
of some sort should enable this behaviour, but I suspect it will need
to be at least partially implemented in the underlying C parser (which
makes it a little less trivial).

As you note, there are two separate problems here. The first is that it
is impossible to distinguish between an empty field and a quoted empty
string: fixing this will need changes to the C parser. The second is
typing the results: I'm not convinced this belongs in the csv module, since
the database user probably has a better idea of the required types than the
csv module could ever have. A layer on top of the csv parser that takes
hints from the database and casts columns to the appropriate type would be
the best option, possibly with a list of type converters passed in.
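
As a rough illustration of the kind of layer I mean, here is a minimal
sketch. The name typed_rows and its null_value parameter are made up for
this example and are not part of the csv module. It assumes one converter
per column, and it treats every empty field as NULL, because the reader
reports both an empty field and a quoted empty string as ''.

```python
import csv
import io


def typed_rows(rows, converters, null_value=None):
    """Yield each row with every column passed through its converter.

    Hypothetical helper, not part of the csv module. Empty fields become
    null_value, since csv.reader returns '' for both an empty field and
    a quoted empty string.
    """
    for row in rows:
        if len(row) != len(converters):
            raise ValueError(
                "expected %d columns, got %d" % (len(converters), len(row))
            )
        yield [
            null_value if field == "" else convert(field)
            for convert, field in zip(converters, row)
        ]


# Column types would normally come from the database schema.
sample = io.StringIO('1,"007",3.5\n2,,\n')
result = list(typed_rows(csv.reader(sample), [int, str, float]))
print(result)
```

A converter that fails (for example int on a non-numeric field) raises
ValueError, which is the behaviour you anticipated. Distinguishing "" from
a truly empty field would still need the parser change described above.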

-- 
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/