PEP 305 - CSV File API

Skip Montanaro skip at pobox.com
Mon Feb 3 23:00:20 EST 2003


    Carlos> The problem is, almost all my intermediate files have both
    Carlos> 'date' and 'float' columns. This is highly common in business,
    Carlos> especially if you are looking at sales figures and stuff like
    Carlos> that.

    Carlos> To compound my problem, Python writes floats with a period (.)
    Carlos> as a decimal separator. However, my copy of Excel is configured
    Carlos> for the Brazilian locale, and it expects a comma (,) as the
    Carlos> decimal separator.

Can't you simply set the locale in your scripts so Python and Excel agree?

    Carlos> Now for the real issue. If I convert my floats to strings
    Carlos> *before* writing the CSV file, it will end up quoted (for
    Carlos> example, '3,1416') - assuming that the CSV library will work as
    Carlos> Skip said. This is not what I would expect, and in fact, it's
    Carlos> not what anyone working with different locale settings would
    Carlos> expect either.

It would only be quoted if you used a comma as the delimiter or had set the
quoting parameter to QUOTE_ALWAYS.  What delimiter do you use in your CSV
files?
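A small sketch of the quoting rule described above, written against the csv module as it was eventually released in Python (not the draft API under discussion in this thread, where the released module names the "always quote" constant csv.QUOTE_ALL). With the default minimal quoting, a field is quoted only when it contains the delimiter (or the quote character or a line-ending character). The helper name write_row is purely illustrative.

```python
import csv
import io

def write_row(row, **fmtparams):
    # Hypothetical helper: write one row to an in-memory buffer and
    # return the resulting CSV text.
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmtparams).writerow(row)
    return buf.getvalue()

row = ["2003-02-03", "3,1416"]

# Comma delimiter: the field containing a comma must be quoted.
print(write_row(row), end="")
# Semicolon delimiter: the comma is ordinary data, so no quoting is needed.
print(write_row(row, delimiter=";"), end="")
# QUOTE_ALL quotes every field regardless of its content.
print(write_row(row, delimiter=";", quoting=csv.QUOTE_ALL), end="")
```

Semicolon-delimited files are what a spreadsheet configured for a comma-decimal locale typically expects, so in that setting a decimal comma does not force quoting.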

    Carlos> Last, even if Python just wrote floats with the 'right' decimal
    Carlos> separator - comma, in my case - there still would be other
    Carlos> software packages that would expect to get periods. 

How would you like us to handle this?  Sounds like a case of being "damned if
we do, damned if we don't".

    Carlos> Or worse, I could try to send my data files to people in other
    Carlos> countries that would be unable to read it. In any event, there
    Carlos> is no automatic solution, but the ability to quickly adjust the
    Carlos> CSV library to get the correct behavior would be highly useful.

We have to come back to the fundamental issue that CSV files as commonly
understood contain no data type information.  It's possible that type
information could be passed in during write operations which would govern
the way the data is formatted when written.  (We've discussed it, but it's
not likely to be in the first release.)
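Until something like that exists, the same effect can be approximated by converting each value to a string before it reaches the writer. The sketch below assumes a caller-supplied list of per-column formatter functions; write_typed_rows, decimal_comma and iso_date are hypothetical names, not part of the csv module. decimal_comma simply swaps the period for a comma and ignores thousands separators, so it is a simplification rather than full locale-aware formatting.

```python
import csv
import datetime
import io

def decimal_comma(x, places=4):
    # Hypothetical formatter: fixed number of places, comma as the
    # decimal separator (no thousands grouping).
    return f"{x:.{places}f}".replace(".", ",")

def iso_date(d):
    # Hypothetical formatter: unambiguous ISO 8601 date (YYYY-MM-DD).
    return d.isoformat()

def write_typed_rows(rows, formatters, delimiter=";"):
    # Apply formatters[i] to column i (None means plain str()) and
    # return the resulting CSV text.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow(
            [(fmt or str)(value) for fmt, value in zip(formatters, row)]
        )
    return buf.getvalue()

rows = [(datetime.date(2003, 2, 3), "widgets", 3.14159)]
print(write_typed_rows(rows, [iso_date, None, decimal_comma]), end="")
```

Keeping the type knowledge in the formatter list rather than in the file is exactly the limitation discussed next: the output text itself carries no record of which columns were numbers or dates.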

Even if we solve the formatting issue, once the data is written out to the
file, if you ship it out of your locale, no information remains in the file
to indicate that 3,1416 is a number instead of a string containing digits
and a comma.  Similarly, if you choose to write dates out in an ambiguous
format, at the receiving end, the reader won't be able to tell what date
"02/03/03" represents.
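To make the date ambiguity concrete, the sketch below parses "02/03/03" under three common conventions using datetime.strptime; every one of them accepts the string, and each yields a different date. The FORMATS table and the interpretations helper are illustrative names only.

```python
import datetime

# Three common two-digit-year conventions (illustrative, not exhaustive).
FORMATS = {
    "month/day/year": "%m/%d/%y",
    "day/month/year": "%d/%m/%y",
    "year/month/day": "%y/%m/%d",
}

def interpretations(text):
    # Return the date each convention assigns to the same text.
    return {
        name: datetime.datetime.strptime(text, fmt).date()
        for name, fmt in FORMATS.items()
    }

for name, d in interpretations("02/03/03").items():
    print(f"{name:16} {d.isoformat()}")
# month/day/year   2003-02-03
# day/month/year   2003-03-02
# year/month/day   2002-03-03

# An ISO 8601 date such as "2003-02-03" has only one reading.
print(datetime.date.fromisoformat("2003-02-03"))
```

Writing dates in ISO 8601 form sidesteps this particular ambiguity, though the file still does not record that the column is a date at all.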

Skip