Newby: how to transform text into lines of text
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Sun Jan 25 23:57:55 EST 2009
On Mon, 26 Jan 2009 00:23:30 -0200, John Machin <sjmachin at lexicon.net>
wrote:
> On Jan 26, 1:03 pm, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
> wrote:
>> It's so easy that not doing that is just inexcusable laziness :)
>> Your own example, written using the csv module:
>>
>> import csv
>>
>> f = csv.reader(open('customer_x.txt','rb'), delimiter='\t')
>> headers = f.next()
>> for line in f:
>>     field1, field2, field3 = line
>>     do_stuff()
>
> And where in all of that do you recommend that .decode(some_encoding)
> be inserted?
For encodings that don't use embedded NUL bytes (latin1, utf8) I'd decode
the fields right when extracting them:
field1, field2, field3 = (field.decode('utf8') for field in line)
For encodings that allow NUL bytes, I'd use any of the recipes in the csv
module documentation.
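To make the two cases concrete, here is a minimal sketch. The row of byte strings is hand-made sample data standing in for what the csv reader yields under Python 2 (an assumption, not taken from the original file), and the UTF-16 line only illustrates why a byte-oriented reader has trouble with such encodings:

```python
# -*- coding: utf-8 -*-
# Hypothetical row, as a byte-oriented csv reader would return it.
line = [b'Jos\xc3\xa9', b'C\xc3\xb3rdoba', b'42']

# Decode each field right when extracting it.
field1, field2, field3 = (field.decode('utf8') for field in line)

# UTF-16 puts NUL bytes inside plain ASCII text, including next to
# the tab delimiter, so splitting on raw bytes is not safe there.
utf16_bytes = u'a\tb'.encode('utf-16-le')
has_nul = b'\x00' in utf16_bytes
```

After this runs, field1 through field3 are unicode strings, and has_nul is true for the UTF-16 sample.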
(That is, if I care about the encoding at all. Perhaps the file contains
only numbers. Perhaps it contains only ASCII characters. Perhaps I'm only
interested in some fields for which the encoding is irrelevant. Perhaps it
is an internally generated file and it doesn't matter as long as I use the
same encoding on output)
But I admit that, in general, "decode input early when reading, work in
unicode, encode output late when writing" is the best practice.
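A rough sketch of that practice follows. Note the assumptions: it is written for Python 3, where the csv module works on text rather than bytes, so it differs from the Python 2 code quoted above. The helper names read_rows and write_rows and the sample data are made up for illustration.

```python
import csv
import io

def read_rows(binary_stream, encoding="utf-8"):
    """Decode early: wrap the byte stream so csv only ever sees text."""
    text = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    reader = csv.reader(text, delimiter="\t")
    headers = next(reader)
    rows = list(reader)
    text.detach()  # leave the underlying stream open for the caller
    return headers, rows

def write_rows(binary_stream, headers, rows, encoding="utf-8"):
    """Encode late: fields stay unicode until they are written out."""
    text = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    writer = csv.writer(text, delimiter="\t")
    writer.writerow(headers)
    writer.writerows(rows)
    text.flush()
    text.detach()  # leave the underlying stream open for the caller

# Hypothetical sample data standing in for a tab-separated input file.
raw = "name\tcity\tnote\nJosé\tCórdoba\tñandú\n".encode("utf-8")

headers, rows = read_rows(io.BytesIO(raw))
rows = [[field.upper() for field in row] for row in rows]  # work in unicode

out = io.BytesIO()
write_rows(out, headers, rows, encoding="latin-1")
```

The input and output encodings are independent here: the data is read as UTF-8, processed as text, and written as Latin-1, which only works because every character in the sample exists in Latin-1.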
--
Gabriel Genellina