Newby: how to transform text into lines of text

Tim Chase python.list at tim.thechases.com
Sun Jan 25 18:34:18 EST 2009


>> One other caveat here, "line" contains the newline at the end, so
>> you might have
>>
>>   print line.rstrip('\r\n')
>>
>> to remove them.
> 
> I don't understand the presence of the '\r' there. Any '\x0d' that
> remains after reading the file in text mode and is removed by that
> rstrip would be a strange occurrence in the data which the OP may
> prefer to find out about and deal with; it is not part of "the
> newline". Why suppress one particular data character in preference to
> others?

In an ideal world where everybody knew how to make a proper 
text-file, it wouldn't be an issue.  Recreating the form of some 
of the data I get from customers/providers:

  >>> f = file('tmp/x.txt', 'wb')
  >>> f.write('headers\n')  # headers in Unix format
  >>> f.write('data1\r\n')  # data in DOS format
  >>> f.write('data2\r\n')
  >>> f.write('data3')   # no trailing newline of any sort
  >>> f.close()

Then reading it back in:

  >>> for line in file('tmp/x.txt'): print repr(line)
  ...
  'headers\n'
  'data1\r\n'
  'data2\r\n'
  'data3'

As for wanting to know about stray '\r' characters, I only want 
the data -- I don't particularly like to be reminded of the 
incompetence of those who send me malformed text-files ;-)
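
For anyone repeating the experiment on Python 3, here is a minimal sketch. It assumes a temporary directory rather than 'tmp/x.txt', and it passes newline='' to open(), since Python 3 text mode would otherwise translate '\r\n' to '\n' on its own and hide the problem. The names "raw" and "cleaned" are just illustrative.

```python
import os
import tempfile

# Recreate the mixed-format file: Unix header, DOS data, no final newline.
path = os.path.join(tempfile.mkdtemp(), 'x.txt')
with open(path, 'wb') as f:
    f.write(b'headers\n')
    f.write(b'data1\r\n')
    f.write(b'data2\r\n')
    f.write(b'data3')

# newline='' turns off newline translation, so '\r\n' arrives intact.
with open(path, newline='') as f:
    raw = list(f)

# rstrip('\r\n') removes any trailing run of CR and LF characters.
cleaned = [line.rstrip('\r\n') for line in raw]

print(raw)
print(cleaned)
```

The rstrip argument is a set of characters, not a suffix, which is why one call handles '\n', '\r\n', and the bare last line alike.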

> The same applies in any case to the use of rstrip('\n'); if that finds
> more than one occurrence of '\x0a' to remove, it has exceeded the
> mandate of removing the newline (if any).

I believe that using the formulaic "for line in file(FILENAME)" 
iteration guarantees that each "line" will have at most one 
'\n' and it will be at the end (again, a malformed text-file with 
no terminal '\n' may cause it to be absent from the last line).
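
A small check of that claim, sketched with io.StringIO standing in for a real file (an assumption made to keep the example self-contained; the sample text is made up):

```python
import io

# Made-up sample: a blank line in the middle, no final newline.
text = 'first\n\nsecond\nlast'

# Iterating a file-like object yields one line per step.
lines = list(io.StringIO(text))
print(lines)

# Each line holds at most one '\n', and only as its final character.
for line in lines:
    assert line.count('\n') <= 1
    assert '\n' not in line[:-1]
```

The blank line comes back as its own '\n' entry, so rstrip('\n') on any single line never has more than one newline to remove.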

> So, we are left with the unfortunately awkward
>     if line.endswith('\n'):
>         line = line[:-1]

You're welcome to it, but I'll stick with my more DWIM solution 
of "get rid of anything that resembles an attempt at a CR/LF".
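
To make the difference between the two approaches concrete, here is a rough side-by-side sketch. The function names strip_one_newline and strip_crlf are hypothetical, invented only for this comparison.

```python
def strip_one_newline(line):
    # The conservative form quoted above: drop one trailing '\n' only.
    if line.endswith('\n'):
        line = line[:-1]
    return line


def strip_crlf(line):
    # The DWIM form: drop every trailing '\r' and '\n'.
    return line.rstrip('\r\n')


for sample in ['data\n', 'data\r\n', 'data']:
    print(repr(sample),
          repr(strip_one_newline(sample)),
          repr(strip_crlf(sample)))
```

The two agree on Unix-style and unterminated lines; they differ only on the DOS-style line, where the conservative form leaves a trailing '\r' in the data.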

Thank goodness I haven't found any of my data-sources using 
"\n\r" instead, which would require me to left-strip '\r' 
characters as well.  Sigh.  My kingdom for competency. :-/
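
For the record, a sketch of what that hypothetical "\n\r" case would look like. The sample data is invented, and io.StringIO with its default settings is assumed here because it splits on '\n' only and leaves '\r' untouched:

```python
import io

# Invented data using the reversed '\n\r' terminator.
text = 'data1\n\rdata2\n\rdata3'

# Splitting on '\n' leaves each stray '\r' at the START of the next line.
lines = list(io.StringIO(text))
print(lines)

# strip() works on both ends, so it catches leading and trailing CR/LF.
cleaned = [line.strip('\r\n') for line in lines]
print(cleaned)
```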

-tkc