Newby: how to transform text into lines of text

Sun Jan 25 23:57:55 EST 2009

On Mon, 26 Jan 2009 00:23:30 -0200, John Machin <sjmachin at lexicon.net>
wrote:
> On Jan 26, 1:03 pm, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
> wrote:

>> It's so easy that not doing it is just inexcusable laziness :)
>> Your own example, written using the csv module:
>>
>> import csv
>>
>> f = csv.reader(open('customer_x.txt','rb'), delimiter='\t')
>> headers = f.next()
>> for line in f:
>>      field1, field2, field3 = line
>>      do_stuff()
>
> And where in all of that do you recommend that .decode(some_encoding)
> be inserted?

For encodings that don't use embedded NUL bytes (latin1, utf8) I'd decode  
the fields right when extracting them:

     field1, field2, field3 = (field.decode('utf8') for field in line)

For encodings that allow NUL bytes, I'd use any of the recipes in the csv  
module documentation.

(That is, if I care about the encoding at all. Perhaps the file contains  
only numbers. Perhaps it contains only ASCII characters. Perhaps I'm only  
interested in some fields for which the encoding is irrelevant. Perhaps it  
is an internally generated file and it doesn't matter as long as I use the  
same encoding on output.)
But I admit that, in general, "decode input early when reading, work in
unicode, encode output late when writing" is the best practice.
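
A minimal sketch of that practice, assuming Python 3, where the csv module
works on text rather than byte strings (unlike the Python 2 code quoted
above, so the decoding moves into the stream wrapper). The names read_rows
and write_rows, the sample data and the choice of output encoding are made
up for illustration only:

```python
import csv
import io


def read_rows(binary_stream, encoding):
    """Decode early: wrap the byte stream so csv only ever sees text."""
    # newline='' lets the csv module handle line endings itself.
    text = io.TextIOWrapper(binary_stream, encoding=encoding, newline='')
    reader = csv.reader(text, delimiter='\t')
    headers = next(reader)
    rows = list(reader)
    text.detach()  # leave the underlying stream open for the caller
    return headers, rows


def write_rows(binary_stream, headers, rows, encoding):
    """Encode late: text becomes bytes only when it is written out."""
    text = io.TextIOWrapper(binary_stream, encoding=encoding, newline='')
    writer = csv.writer(text, delimiter='\t', lineterminator='\n')
    writer.writerow(headers)
    writer.writerows(rows)
    text.flush()
    text.detach()  # leave the underlying stream open for the caller


# Hypothetical input: a tab-separated file stored as UTF-8 bytes.
raw = 'name\tcity\tamount\nJosé\tMálaga\t10\n'.encode('utf8')

headers, rows = read_rows(io.BytesIO(raw), 'utf8')

# Work in unicode: ordinary string operations, no encoding concerns here.
rows = [[name.upper(), city, amount] for name, city, amount in rows]

# Encode only on output; latin1 is an arbitrary choice for the example.
out = io.BytesIO()
write_rows(out, headers, rows, 'latin1')
```

With real files, the BytesIO objects would be replaced by files opened in
binary mode, or more simply by open(path, encoding=..., newline='').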

-- 
Gabriel Genellina