Problems with csv module

John Machin sjmachin at lexicon.net
Wed May 11 20:23:44 EDT 2005


On Wed, 11 May 2005 14:08:08 -0500, Skip Montanaro <skip at pobox.com>
wrote:

>
>    >> Based on the requests I've seen here and on the csv at mojam.com mailing
>    >> list, it appears people are certainly generating CSV files which
>    >> contain Unicode-encoded data.
>
>    Fredrik> in what encodings?
>
>I've seen hints about iso-8859-1/iso-8859-15 and mention that Excel 2000
>supports utf-8.

I have Excel 2002 and have done some experimentation. It "supports"
utf-8 only to the extent that most times it doesn't mangle the data
(i.e. you can save it again without loss); you just can't make any
sense out of what's on the screen. Specifically:

open a file with CSV extension: Excel assumes blindly that it's
encoded according to your locale (e.g. cp1252).

open a file with TXT extension: Excel gives you the option of
specifying one of a large number of *legacy* encodings -- yes,
that's correct, utf-* are not on the list!

NOTE: the above applies even if you have a utf-8-encoded BOM at the
start of the file.

This behaviour appears to be Excel-specific; MS Word, Wordpad and even
the humble Notepad recognise the utf-8-encoded BOM and display
sensibly (with a Unicode font, of course).
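To make the difference concrete, here is a minimal sketch in modern
Python 3 (an assumption; not what was current when this was posted).
The two function names are hypothetical, and the "Excel" one is only a
model of the behaviour observed above, not of how Excel is actually
implemented.

```python
import codecs


def decode_like_notepad(raw: bytes, fallback: str = "cp1252") -> str:
    """Honour a utf-8-encoded BOM if present, else use the fallback encoding."""
    if raw.startswith(codecs.BOM_UTF8):
        # Strip the 3-byte BOM and decode the remainder as utf-8.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    return raw.decode(fallback, errors="replace")


def decode_like_excel_csv(raw: bytes, locale_encoding: str = "cp1252") -> str:
    """Ignore any BOM and decode blindly with the locale's legacy encoding."""
    return raw.decode(locale_encoding, errors="replace")


# A small utf-8 CSV payload with a BOM in front, as a BOM-writing editor saves it.
data = codecs.BOM_UTF8 + "caf\u00e9,1\r\n".encode("utf-8")

print(repr(decode_like_notepad(data)))    # BOM recognised, text intact
print(repr(decode_like_excel_csv(data)))  # BOM and accented letter become mojibake
```

The second result is the kind of thing you see on the screen: the
bytes survive, but the BOM and every non-ASCII character are shown as
two or three cp1252 characters each.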


>  Whether Excel can dump csv files in utf-8 or not, I don't
>know, though I'd suppose so.

Unfortunately, your supposition is incorrect. There is no way of
specifying the encoding directly. The nearest available options are:

(1) csv: encoded in your locale-specific legacy encoding. "Illegal"
characters (those the encoding cannot represent) are silently
replaced by "?" on Windows and (I deduce) by an underscore on a
Macintosh.
(2) text: ditto
(3) Unicode text: utf-16 -- Excel *does* subsequently open these
correctly, i.e. it silently detects the encoding and displays the
data properly.
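
The options above can be sketched in Python 3 (again an assumption
about the Python version). The first part shows the sort of loss
described for (1) and (2), using Python's errors="replace" as a
stand-in for what Excel does on Windows. The second part writes and
reads a utf-16 file with a BOM; the helper names are hypothetical, and
the tab delimiter is my assumption about the "Unicode text" layout,
not something established above.

```python
import csv

# Options (1) and (2): a legacy encoding cannot represent every character.
# GREEK CAPITAL LETTER DELTA is not in cp1252, so it is replaced by "?".
lossy = "\u0394=1".encode("cp1252", errors="replace")


def write_unicode_text(rows, path):
    """Write rows as utf-16 text (BOM included), tab-delimited (assumed)."""
    with open(path, "w", encoding="utf-16", newline="") as f:
        csv.writer(f, dialect="excel-tab").writerows(rows)


def read_unicode_text(path):
    """Read the file back; the utf-16 codec consumes the BOM automatically."""
    with open(path, "r", encoding="utf-16", newline="") as f:
        return list(csv.reader(f, dialect="excel-tab"))
```

Used on something like [["name", "symbol"], ["delta", "\u0394"]], the
utf-16 round trip returns the rows unchanged, whereas the cp1252
encoding has already thrown the character away.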






