PEP 305 - CSV File API
Carlos Ribeiro
cribeiro at mail.inet.com.br
Mon Feb 3 20:46:27 EST 2003
Skip and John,
skip> I will note that the csv module under development makes *no*
skip> attempts at any kind of data conversion when reading CSV
skip> files. Even ints and floats are returned as strings.
john> So the functionality for handling different data formats for
john> dates (and any other data-types) -- in fact any functionality
john> that knows or even suspects what the data type might be
john> -- should be factored out into another layer.
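For reference, the behavior Skip describes, with type conversion left to a separate layer as John suggests, can be sketched with today's Python 3 csv module. The sample data and the helpers `read_raw`, `parse_date` and `convert_row` are hypothetical illustrations, not part of the csv module.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample: a CSV file only ever stores plain text.
SAMPLE = "day,product,kg\n2003-02-03,flour,3.1416\n"

def read_raw(text):
    """Read rows with csv.reader; every field comes back as a str."""
    return list(csv.reader(io.StringIO(text)))

def parse_date(s):
    # Assumes ISO 8601 dates (YYYY-MM-DD); other layouts need another format.
    return datetime.strptime(s, "%Y-%m-%d").date()

def convert_row(row, converters):
    """Hypothetical 'upper layer': apply one converter per column."""
    return [conv(field) for conv, field in zip(converters, row)]

header, *rows = read_raw(SAMPLE)
print(rows[0])   # ['2003-02-03', 'flour', '3.1416']
typed = [convert_row(r, [parse_date, str, float]) for r in rows]
print(typed[0])  # [datetime.date(2003, 2, 3), 'flour', 3.1416]
```

The column types live entirely in the caller's converter list; csv.reader itself never guesses them.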
Implemented this way, the CSV library is not nearly as useful as it could be.
In the end, the library that sits at the 'upper layer' will end up being
regarded as the 'real' CSV library. Let me point out a very simple situation
that literally happens to me every day in my 'real life'.
I run a small food processing company, and I do some data analysis daily using
Excel. Most of the data that I analyze is read from a database (using ADO)
and pre-processed using a bunch of Python scripts; these scripts just export
CSV files that are read into Excel later [1][2].
The problem is, almost all my intermediate files have both 'date' and 'float'
columns. This is very common in business, especially if you are looking at
sales figures and stuff like that.
To compound my problem, Python writes floats with a period (.) as a decimal
separator. However, my copy of Excel is configured for the Brazilian locale,
and it expects a comma (,) as the decimal separator.
Now for the real issue. If I convert my floats to strings *before* writing the
CSV file, they will end up quoted (for example, "3,1416"), because the comma
inside the field is also the field delimiter; that is, assuming the CSV library
works as Skip said. This is not what I would expect, and in fact, it's not what
anyone working with different locale settings would expect.
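A minimal sketch of the quoting problem, using today's Python 3 `csv.writer` with its default comma delimiter and minimal quoting. The helper `write_row` is hypothetical, and the semicolon delimiter in the second call is an assumption about how a comma-decimal spreadsheet is configured (its list separator), not something stated above.

```python
import csv
import io

value = 3.1416

# Python's own float formatting always uses a period as the decimal separator.
as_text = str(value)                  # '3.1416'
# Pre-converting by hand for a comma-decimal locale (no locale module needed).
as_comma = as_text.replace(".", ",")  # '3,1416'

def write_row(row, **fmtparams):
    """Hypothetical helper: write one row with csv.writer, return the text."""
    buf = io.StringIO()
    csv.writer(buf, **fmtparams).writerow(row)
    return buf.getvalue()

# Default delimiter is ',', so the field contains the delimiter and
# QUOTE_MINIMAL wraps it in double quotes.
default_out = write_row(["flour", as_comma])
# Assumption: the spreadsheet splits fields on ';', so no quoting is needed.
semicolon_out = write_row(["flour", as_comma], delimiter=";")

print(repr(default_out))    # 'flour,"3,1416"\r\n'
print(repr(semicolon_out))  # 'flour;3,1416\r\n'
```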
Finally, even if Python just wrote floats with the 'right' decimal separator
(a comma, in my case), there would still be other software packages that
expect periods. Or worse, I could send my data files to people in other
countries who would be unable to read them. In any event, there is no
automatic solution, but the ability to quickly adjust the CSV library to get
the correct behavior would be highly useful.
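One way to picture "quickly adjusting" the library is a thin wrapper that formats floats and dates from per-file settings before handing rows to `csv.writer`. The class `LocaleCSVWriter` and its `decimal_sep` and `date_format` parameters are hypothetical (only `delimiter` is passed through to the real csv module), and the Brazilian settings shown (comma decimals, DD/MM/YYYY dates, semicolon delimiter) are assumptions.

```python
import csv
import io
from datetime import date

class LocaleCSVWriter:
    """Hypothetical wrapper, not part of the csv module: formats floats and
    dates according to per-file settings before calling csv.writer."""

    def __init__(self, fileobj, decimal_sep=".", date_format="%Y-%m-%d",
                 delimiter=","):
        self.decimal_sep = decimal_sep
        self.date_format = date_format
        self._writer = csv.writer(fileobj, delimiter=delimiter)

    def _format(self, value):
        if isinstance(value, float):
            # repr gives the shortest round-tripping text, with a period.
            return repr(value).replace(".", self.decimal_sep)
        if isinstance(value, date):
            return value.strftime(self.date_format)
        return value  # everything else is left to csv.writer

    def writerow(self, row):
        self._writer.writerow([self._format(v) for v in row])

row = [date(2003, 2, 3), "flour", 3.1416]

# Settings assumed for a Brazilian-locale spreadsheet.
br = io.StringIO()
LocaleCSVWriter(br, decimal_sep=",", date_format="%d/%m/%Y",
                delimiter=";").writerow(row)

# Python's defaults, for tools that expect periods.
plain = io.StringIO()
LocaleCSVWriter(plain).writerow(row)

print(repr(br.getvalue()))     # '03/02/2003;flour;3,1416\r\n'
print(repr(plain.getvalue()))  # '2003-02-03,flour,3.1416\r\n'
```

Keeping the settings in one object means switching between Excel and a tool that expects periods is a one-line change rather than an edit to every export script.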
Carlos Ribeiro
---
[1] I know I could control Excel with COM or even ADO, but writing CSV files
is simple; also, the intermediate files are useful for both debugging and
backup purposes.
[2] Better still, some people may ask me why I'm using Excel and not doing
everything in pure Python. <sigh> No comment.