csv and mixed lists of unicode and numbers

Sibylle Koczian nulla.epistola at web.de
Tue Nov 24 16:01:55 EST 2009


Peter Otten wrote:
> I'd preprocess the rows as I tend to prefer the simplest approach I can come 
> up with. Example:
> 
> def recode_rows(rows, source_encoding, target_encoding):
>     def recode(field):
>         if isinstance(field, unicode):
>             return field.encode(target_encoding)
>         elif isinstance(field, str):
>             return unicode(field, source_encoding).encode(target_encoding)
>         return unicode(field).encode(target_encoding)
> 
>     return (map(recode, row) for row in rows)
> 

For this case isinstance really seems to be quite reasonable. And it was
silly of me not to think of sys.stdout as a file object for the example!

> rows = [[1.23], [u"äöü"], [u"ÄÖÜ".encode("latin1")], [1, 2, 3]]
> writer = csv.writer(sys.stdout)
> writer.writerows(recode_rows(rows, "latin1", "utf-8"))
> 
> The only limitation I can see: target_encoding probably has to be a superset 
> of ASCII.
> 
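The superset-of-ASCII point can be seen directly: the Python 2 csv module writes delimiters and quote characters as raw ASCII bytes, so an encoding like UTF-16, which wraps every character in null bytes and prepends a BOM, would corrupt the stream. A quick illustration (Python 3 syntax, but the byte values are the same):

```python
text = "a,b"

# An ASCII-superset encoding leaves the delimiter byte untouched:
print(text.encode("utf-8"))   # b'a,b'

# UTF-16 adds a BOM and interleaves null bytes, so the comma is no
# longer a single plain-ASCII byte in the output stream:
print(text.encode("utf-16"))
```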

Coping with umlauts and accents is quite enough for me.

This problem really goes away with Python 3 (tried it on another
machine), but something else changes too: in Python 2.6 the
documentation for the csv module explicitly says "If csvfile is a file
object, it must be opened with the ‘b’ flag on platforms where that
makes a difference." The documentation for Python 3.1 no longer has
this sentence, and if I do open the file with the ‘b’ flag in Python
3.1, I get, for all sorts of data, even for a list containing a single
integer literal:

TypeError: must be bytes or buffer, not str

I don't really understand that.
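If I read the 3.x documentation right, the csv module now works in terms of str rather than bytes and expects a text-mode file, so opening the file in binary mode is exactly what triggers that TypeError: the writer hands str to a stream that only accepts bytes. Something like this (the filename is just an example) seems to be the intended pattern:

```python
import csv

rows = [[1.23], ["äöü"], [1, 2, 3]]

# Python 3: open in *text* mode, give the encoding to open(), and use
# newline='' so the csv module controls the line endings itself.
with open("example.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

with open("example.csv", encoding="utf-8") as f:
    print(f.read())
```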

Regards,
Sibylle
