csv and mixed lists of unicode and numbers

Peter Otten __peter__ at web.de
Tue Nov 24 14:04:42 EST 2009


Sibylle Koczian wrote:

> I want to put data from a database into a tab separated text file. This
> looks like a typical application for the csv module, but there is a
> snag: the rows I get from the database module (kinterbasdb in this case)
> contain unicode objects and numbers. And of course the unicode objects
> contain lots of non-ascii characters.
> 
> If I try to use csv.writer as is, I get UnicodeEncodeErrors. If I use
> the UnicodeWriter from the module documentation, I get TypeErrors with
> the numbers. (I'm using Python 2.6 - upgrading to 3.1 on this machine
> would cause other complications.)
> 
> So do I have to process the rows myself and treat numbers and text
> fields differently? Or what's the best way?

I'd preprocess the rows as I tend to prefer the simplest approach I can come 
up with. Example:

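# -*- coding: utf-8 -*-
# Python 2 code: relies on the unicode type and on map() returning a list.
import csv
import sys

def recode_rows(rows, source_encoding, target_encoding):
    def recode(field):
        if isinstance(field, unicode):
            # Text: encode directly.
            return field.encode(target_encoding)
        elif isinstance(field, str):
            # Byte string: decode from the source encoding first.
            return unicode(field, source_encoding).encode(target_encoding)
        # Numbers and other objects: convert to text, then encode.
        return unicode(field).encode(target_encoding)

    return (map(recode, row) for row in rows)

rows = [[1.23], [u"äöü"], [u"ÄÖÜ".encode("latin1")], [1, 2, 3]]
# Pass delimiter="\t" to csv.writer() for tab-separated output.
writer = csv.writer(sys.stdout)
writer.writerows(recode_rows(rows, "latin1", "utf-8"))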

The only limitation I can see: target_encoding probably has to be a superset 
of ASCII.
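
To illustrate that limitation: in Python 2 the csv module writes its delimiters, quote characters and line ends as plain ASCII bytes, so the encoded fields only combine cleanly with them if the target encoding represents ASCII characters as those same bytes. A rough check might look like the sketch below. The name is_ascii_superset is a hypothetical helper for illustration, not part of csv or codecs, and the check only covers how the 128 ASCII characters encode.

```python
def is_ascii_superset(encoding):
    """Return True if every ASCII character encodes to its own single byte.

    Hypothetical helper for illustration; not part of the csv module.
    """
    # All 128 ASCII code points as a byte string (works on Python 2 and 3).
    ascii_bytes = bytes(bytearray(range(128)))
    ascii_text = ascii_bytes.decode("ascii")
    try:
        return ascii_text.encode(encoding) == ascii_bytes
    except UnicodeEncodeError:
        # The encoding cannot even represent some ASCII character.
        return False


if __name__ == "__main__":
    for name in ("utf-8", "latin1", "utf-16"):
        print("%s: %s" % (name, is_ascii_superset(name)))
```

With an encoding such as UTF-16, the fields would be two-byte sequences while the separators stay single bytes, so the resulting file would not be valid in either encoding.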

Peter



