for loop question

Wed Oct 10 17:17:04 EDT 2007

On Wed, 2007-10-10 at 16:03 -0500, Robert Dailey wrote:
> I've tried everything to make the original CSV module work. It just
> doesn't. I've tried UTF-16 encoding

What do you mean, "tried"? Don't you know what encoding the file is in?

>  (which works fine with codecs.open()) but when I pass in the file
> object returned from codecs.open() into csv.reader(), the call to
> reader.next() fails because it says something isn't in the range of
> range(128) or something (Not really an expert on Unicode so I'm not
> sure of the meaning). I would use CSV if I could!

That's because the codec-file object feeds it decoded Unicode strings,
but the CSV module wants to work with encoded octet strings, so it tries
to encode the unicode string with the default codec. The default codec
is ASCII, which can't represent characters with code points greater than
127.
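
To see that failure in isolation, here is a minimal sketch. The sample
string is made up for illustration, and the explicit encode("ascii")
stands in for the implicit conversion that happens inside the csv module:

```python
# A made-up sample string containing one non-ASCII character (e-acute).
text = u"caf\u00e9"

message = None
try:
    # Roughly what happens implicitly when unicode reaches the csv module.
    text.encode("ascii")
except UnicodeEncodeError as exc:
    message = str(exc)
    print(message)

# Encoding explicitly as UTF-8 succeeds, because UTF-8 can represent
# every Unicode code point.
encoded = text.encode("utf-8")
```

The message printed should be the same "ordinal not in range(128)"
complaint you saw from reader.next().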

Instead of passing the file object directly to the csv parser, pass in a
generator that reads from the file and explicitly encodes the strings
into UTF-8, along these lines:

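def encode_to_utf8(f):
    # f yields decoded unicode lines, e.g. a file from codecs.open()
    for line in f:
        # csv.reader wants byte strings, so re-encode each line as UTF-8
        yield line.encode("utf-8")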

There may be a fundamental problem with this approach that I can't
foresee at the moment, but it's worth a try when your alternative is to
build a Unicode-aware CSV parser from scratch.
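
One thing to keep in mind is that the cells coming out of the csv reader
will then be UTF-8 byte strings, so you'll probably want to decode them
again. Here is a rough sketch of the whole round trip. The name
read_unicode_rows and the sample data are hypothetical, and an in-memory
io.StringIO stands in for your codecs.open() file. The version check is
there because the csv module in Python 3 accepts text directly, so the
encoding step only applies to Python 2:

```python
import csv
import io
import sys


def encode_to_utf8(f):
    # Re-encode each decoded line as UTF-8 bytes for the Python 2 csv module.
    for line in f:
        yield line.encode("utf-8")


def read_unicode_rows(f):
    # Hypothetical helper: f is assumed to yield decoded text lines,
    # for example a file object returned by codecs.open().
    if sys.version_info[0] == 2:
        for row in csv.reader(encode_to_utf8(f)):
            # Cells come back as UTF-8 byte strings; decode them again.
            yield [cell.decode("utf-8") for cell in row]
    else:
        # Python 3's csv module works on text, so no round trip is needed.
        for row in csv.reader(f):
            yield row


# Made-up sample data standing in for the real file.
sample = io.StringIO(u"name,city\nZo\u00eb,Montr\u00e9al\n")
rows = list(read_unicode_rows(sample))
```

With a real file you would pass the object from codecs.open() in place
of sample.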

Hope this helps,

-- 
Carsten Haese
http://informixdb.sourceforge.net