PEP 305 - CSV File API

Ian Bicking ianb at colorstudy.com
Tue Feb 4 03:20:39 EST 2003


On Monday, February 3, 2003, at 07:46 PM, Carlos Ribeiro wrote:
> john> So the functionality for handling different data formats for
> john> dates (and any other data-types) -- in fact any functionality
> john> that knows or even suspects what the data type might be
> john> -- should be factored out into another layer.
>
> Implemented this way, the CSV library is not nearly as useful as it
> could be. In the end, the library that is located at 'upper layer' will
> end up being regarded as the 'real' CSV library. Let me point out a very
> simple situation that happens literally daily to me in my 'real life'.

I think the requirements you lay out go way too far for this module.  
You want it to magically parse your CSV files, and it cannot do that.

It *can* provide an abstraction layer so that you can implement a 
wrapper that handles *your* situation.  You might even be able to 
generalize that wrapper sufficiently that it could deal with a number 
of issues of internationalization and type inference -- but you would 
only achieve that after many iterations.  CSV parsing is a Solved 
Problem -- the only problem is that it isn't solved canonically.  
Interpreting CSV files is not a Solved Problem, and it never will be.
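
To make "a wrapper that handles *your* situation" concrete, here is a
minimal sketch. It assumes the csv module as it exists in today's
standard library (csv.reader). The names parse_decimal_comma and
read_typed are hypothetical, made up for illustration, and the
decimal-comma rule is only a guess at what one particular Excel locale
emits, not a general answer.

```python
import csv
import io


def parse_decimal_comma(text):
    """Hypothetical helper: read '3,1416' as 3.1416, leave other text alone.

    Assumes a comma is always a decimal separator, never a thousands
    separator. Note that float() also accepts text such as 'nan' or 'inf'.
    """
    try:
        return float(text.replace(",", "."))
    except ValueError:
        return text


def read_typed(fileobj, convert=parse_decimal_comma, **fmtparams):
    """Hypothetical wrapper: the csv module parses, `convert` interprets."""
    for row in csv.reader(fileobj, **fmtparams):
        yield [convert(cell) for cell in row]


# The parser only splits and unquotes; the wrapper decides what cells mean.
sample = io.StringIO('name,value\r\npi,"3,1416"\r\n')
rows = list(read_typed(sample))
```

The point of the split is that a different source of files would get a
different `convert` function, while the parsing underneath stays the same.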

> To compound my problem, Python writes floats with a period (.) as a
> decimal separator. However, my copy of Excel is configured for the
> brazilian locale, and it expects a comma (,) as the decimal separator.
>
> Now for the real issue. If I convert my floats to strings *before*
> writing the CSV file, It will end up quoted (for example, '3,1416') -
> assuming that the CSV library will work as Skip said. This is not what I
> would expect, and in fact, it's not what anyone working with different
> locale settings would say.

Yes it is.  CSV stands for "comma separated values" -- you must quote 
values with commas in them or it simply *will not work*.  There is no 
alternative.  There is no alternative.
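
A short sketch of why, assuming the csv module as it exists in today's
standard library with its default settings (comma delimiter, minimal
quoting). The sample values are invented for illustration.

```python
import csv
import io

# Write one row where a field contains the delimiter.
buf = io.StringIO()
csv.writer(buf).writerow(["pi", "3,1416", 2.5])
line = buf.getvalue()  # 'pi,"3,1416",2.5\r\n'

# Quoted, the row reads back as the three fields that were written.
quoted_fields = next(csv.reader(io.StringIO(line)))

# Unquoted, the same text is indistinguishable from a four-field row.
unquoted_fields = next(csv.reader(io.StringIO("pi,3,1416,2.5\r\n")))
```

Here `quoted_fields` has three entries and `unquoted_fields` has four, so
without the quotes the reader has no way to know that '3' and '1416'
were ever one value.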

You need to provide tested assertions about how packages emit and 
handle CSV files.  Will Excel handle a CSV file with floats using "." 
when that doesn't match your locale?  That would be the best solution 
-- localization shouldn't occur in an intermediate file like a CSV 
file.  If Excel does not handle this, then what *does* it handle?  
Certainly it must either quote its output or use "." instead of "," for 
floats.  Does Excel even care whether a value is quoted in its input?  
What software does?

As for dates -- I would recommend trying something like ISO dates
(YYYY-MM-DD, I believe), which are unambiguous and require no
internationalization.  If Excel doesn't accept them, then you're simply
going to have to parse its dates yourself.  Parsing dates from Excel
will probably be different than parsing dates from other CSV-emitting
software, which may emit them in numerous other ways.
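
A minimal sketch of the ISO round trip, assuming a current Python
(date.fromisoformat needs 3.7 or later). The dates are made up for
illustration, and whether any given spreadsheet accepts this form on
input is something you would still have to test.

```python
from datetime import date

d = date(2003, 2, 4)
text = d.isoformat()              # '2003-02-04', same in every locale
back = date.fromisoformat(text)   # parses without any locale settings

# ISO strings also sort chronologically as plain text.
stamps = ["2003-02-04", "2002-12-31", "2003-01-15"]
in_order = sorted(stamps)
```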

If you make a useful abstraction layer on top of the CSV parser, then 
publish it.  It will help others out.  But you shouldn't confuse 
parsing Brazilian Excel output with CSV parsing in general.

   Ian





