Newby: how to transform text into lines of text
Tim Chase
python.list at tim.thechases.com
Sun Jan 25 18:34:18 EST 2009
>> One other caveat here, "line" contains the newline at the end, so
>> you might have
>>
>> print line.rstrip('\r\n')
>>
>> to remove them.
>
> I don't understand the presence of the '\r' there. Any '\x0d' that
> remains after reading the file in text mode and is removed by that
> rstrip would be a strange occurrence in the data which the OP may
> prefer to find out about and deal with; it is not part of "the
> newline". Why suppress one particular data character in preference to
> others?
In an ideal world where everybody knew how to make a proper
text-file, it wouldn't be an issue. Recreating the form of some
of the data I get from customers/providers:
>>> f = file('tmp/x.txt', 'wb')
>>> f.write('headers\n') # headers in Unix format
>>> f.write('data1\r\n') # data in Dos format
>>> f.write('data2\r\n')
>>> f.write('data3') # no trailing newline of any sort
>>> f.close()
Then reading it back in:
>>> for line in file('tmp/x.txt'): print repr(line)
...
'headers\n'
'data1\r\n'
'data2\r\n'
'data3'
As for wanting to know about stray '\r' characters, I only want
the data -- I don't particularly like to be reminded of the
incompetence of those who send me malformed text-files ;-)
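For anyone following along on Python 3, here is a minimal sketch of the same round trip with rstrip('\r\n') applied. The helper name strip_eol and the temporary path are made up for illustration. Passing newline='\n' to open() is an assumption made here so that the '\r' characters come through untranslated, as they did in the session above; Python 3's default text mode would otherwise translate '\r\n' to '\n' on its own.

```python
import os
import tempfile


def strip_eol(line):
    """Drop any trailing run of CR and LF characters (hypothetical helper)."""
    return line.rstrip('\r\n')


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'x.txt')

    # Recreate the mixed-format file: Unix header, DOS data, no final newline.
    with open(path, 'wb') as f:
        f.write(b'headers\ndata1\r\ndata2\r\ndata3')

    # newline='\n' splits on LF only and hands back line endings untranslated.
    with open(path, newline='\n') as f:
        raw = list(f)

cleaned = [strip_eol(line) for line in raw]

for before, after in zip(raw, cleaned):
    print(repr(before), '->', repr(after))
```

Note that rstrip('\r\n') treats its argument as a set of characters, so it removes every trailing '\r' and '\n', not just one "\r\n" pair.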
> The same applies in any case to the use of rstrip('\n'); if that finds
> more than one occurrence of '\x0a' to remove, it has exceeded the
> mandate of removing the newline (if any).
I believe that using the formulaic "for line in file(FILENAME)"
iteration guarantees that each "line" will have at most one
'\n' and it will be at the end (again, a malformed text-file with
no terminal '\n' may cause it to be absent from the last line).
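A quick way to check that claim, sketched in Python 3 with io.StringIO standing in for the file (an assumption for brevity; StringIO's default newline handling splits on '\n' only and does no translation). The sample text is invented for the test.

```python
import io

# Mixed endings, a blank line in the middle, and no terminal newline.
sample = 'headers\ndata1\r\n\ndata2\r\ndata3'

lines = list(io.StringIO(sample))

for line in lines:
    # Each chunk handed out by iteration holds at most one LF...
    assert line.count('\n') <= 1
    # ...and if it has one, it is the last character.
    if '\n' in line:
        assert line.endswith('\n')

# Only the final line can lack the LF.
missing_lf = [line for line in lines if not line.endswith('\n')]
print(lines)
print(missing_lf)
```

So rstrip('\n') can only over-strip on a string that did not come from line iteration, such as 'a\n\n' built by hand.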
> So, we are left with the unfortunately awkward
>     if line.endswith('\n'):
>         line = line[:-1]
You're welcome to it, but I'll stick with my more DWIM solution
of "get rid of anything that resembles an attempt at a CR/LF".
Thank goodness I haven't found any of my data-sources using
"\n\r" instead, which would require me to left-strip '\r'
characters as well. Sigh. My kingdom for competency. :-/
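Purely as a hypothetical, this is roughly what the "\n\r" case would look like in Python 3. Splitting on '\n' leaves the stray '\r' at the front of the following line, hence the extra lstrip. The helper name and sample data are invented, and io.StringIO stands in for a real file.

```python
import io


def strip_eol_both_ends(line):
    """Strip trailing CR/LF, then any leading CR left by an LF+CR ending.

    Hypothetical helper, not something from the thread.
    """
    return line.rstrip('\r\n').lstrip('\r')


# Lines terminated with LF followed by CR; no terminator on the last line.
sample = 'a\n\rb\n\rc'

raw = list(io.StringIO(sample))
cleaned = [strip_eol_both_ends(line) for line in raw]

print(raw)
print(cleaned)
```

If the file also ended with "\n\r", the leftover '\r' would show up as one extra empty line after cleaning, which would need to be dropped separately.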
-tkc