Newby: how to transform text into lines of text

Mon Jan 26 10:35:39 EST 2009

On Sun, 2009-01-25 at 18:23 -0800, John Machin wrote:
> On Jan 26, 1:03 pm, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
> wrote:
> > On Sun, 25 Jan 2009 23:30:33 -0200, Tim Chase
> > <python.l... at tim.thechases.com> wrote:
> >
> > > Unfortunately, a raw rstrip() eats other whitespace that may be  
> > > important.  I frequently get tab-delimited files, using the following  
> > > pseudo-code:
> >
> > >    def clean_line(line):
> > >      return line.rstrip('\r\n').split('\t')
> >
> > >    f = file('customer_x.txt')
> > >    headers = clean_line(f.next())
> > >    for line in f:
> > >      field1, field2, field3 = clean_line(line)
> > >      do_stuff()
> >
> > > if field3 is empty in the source-file, using rstrip(None) as you suggest  
> > > triggers errors on the tuple assignment because it eats the tab that  
> > > defined it.
> >
> > > I suppose if I were really smart, I'd dig a little deeper in the CSV  
> > > module to sniff out the "right" way to parse tab-delimited files.
> >
> > It's so easy that not doing so is just inexcusable laziness :)
> > Your own example, written using the csv module:
> >
> > import csv
> >
> > f = csv.reader(open('customer_x.txt','rb'), delimiter='\t')
> > headers = f.next()
> > for line in f:
> >      field1, field2, field3 = line
> >      do_stuff()
> >
> 
> And where in all of that do you recommend that .decode(some_encoding)
> be inserted?
> 

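If encoding is an issue for your application, then I'd recommend you use
codecs.open('customer_x.txt', 'rb', encoding='cp500') instead of open(),
substituting whichever codec matches your data (cp500 is one of Python's
EBCDIC codecs). Note that the Python 2 csv module does not support Unicode
input directly, so non-ASCII text may need to be re-encoded, e.g. as UTF-8,
before it reaches csv.reader.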
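
For comparison, here is a minimal sketch of the same idea, assuming Python 3,
where the built-in open() accepts an encoding argument and the csv module
works on decoded text. The cp500 codec, the read_rows() helper and the sample
rows are assumptions made up for illustration; they are not from the original
example.

```python
import csv
import os
import tempfile

# Assumed for illustration: the codec and the sample rows are made up.
ENCODING = 'cp500'  # one of Python's EBCDIC codecs


def read_rows(path, encoding):
    """Return all rows of a tab-delimited file as lists of decoded strings."""
    # newline='' leaves line endings untouched so the csv module handles them.
    with open(path, encoding=encoding, newline='') as f:
        return list(csv.reader(f, delimiter='\t'))


# Write a small sample file so the sketch is self-contained.
sample = 'name\tcity\tnote\r\nAnn\tLima\t\r\n'
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as out:
    out.write(sample.encode(ENCODING))

rows = read_rows(path, ENCODING)
os.remove(path)

headers, records = rows[0], rows[1:]
for field1, field2, field3 in records:
    # field3 is an empty string here, but the unpacking still succeeds.
    print(field1, field2, repr(field3))
```

The decoding happens inside open(), so the csv reader only ever sees text and
no separate .decode() call is needed.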

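On the rstrip() point raised earlier in the thread, a short sketch of why a
bare rstrip() breaks the three-way unpacking when the last field is empty.
The sample line is made up for illustration.

```python
line = 'Ann\tLima\t\r\n'  # the third field is empty

# Stripping only the line terminator keeps the trailing tab,
# so the empty third field survives the split.
kept = line.rstrip('\r\n').split('\t')

# A bare rstrip() removes all trailing whitespace, including the tab,
# so only two fields are left.
lost = line.rstrip().split('\t')

print(kept)
print(lost)
```

Unpacking lost into field1, field2, field3 would raise a ValueError, which is
the error described above.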