Newby: how to transform text into lines of text

John Machin sjmachin at lexicon.net
Sun Jan 25 19:44:33 EST 2009


On 26/01/2009 10:34 AM, Tim Chase wrote:

> I believe that using the formulaic "for line in file(FILENAME)" 
> iteration guarantees that each "line" will have at most only one '\n' 
> and it will be at the end (again, a malformed text-file with no terminal 
> '\n' may cause it to be absent from the last line)

It seems that you are right -- not that I can find such a guarantee 
written anywhere. I had armchair-philosophised that writing 
"foo\n\r\nbar\r\n" to a file in binary mode and reading it on Windows in 
text mode would be strict and report the first line as "foo\n\n"; I was 
wrong.
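
For anyone who wants to repeat the experiment, here is a minimal sketch. It assumes Python 3, where open() in text mode with the default newline=None translates "\r\n" and a lone "\r" to "\n" on every platform, so Windows is not needed to see the effect. The helper name read_lines_text_mode is made up for illustration.

```python
import os
import tempfile


def read_lines_text_mode(raw):
    """Write raw bytes in binary mode, then iterate over them in text mode."""
    fd, path = tempfile.mkstemp()
    try:
        # Binary mode: the bytes land on disk exactly as given.
        with os.fdopen(fd, "wb") as f:
            f.write(raw)
        # Text mode with newline=None (the default) enables universal
        # newlines: "\r\n" and lone "\r" are translated to "\n" on read.
        with open(path, "r", newline=None) as f:
            return [line for line in f]
    finally:
        os.remove(path)


lines = read_lines_text_mode(b"foo\n\r\nbar\r\n")
print(lines)  # ['foo\n', '\n', 'bar\n']
```

The first line comes back as "foo\n" and the "\r\n" that follows becomes a separate empty line, so no line carries more than one '\n'.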

> 
>> So, we are left with the unfortunately awkward
>>     if line.endswith('\n'):
>>         line = line[:-1]
> 
> You're welcome to it, but I'll stick with my more DWIM solution of "get 
> rid of anything that resembles an attempt at a CR/LF".

Thanks, but I don't want it. My point was that you didn't TTOPEWYM (tell 
the OP exactly what you meant).
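
To make the difference between the two approaches concrete, here is a small sketch. The function names are invented for illustration, and chomp_crlf is only my guess (rstrip with '\r\n') at what "get rid of anything that resembles an attempt at a CR/LF" looks like in code.

```python
def chomp_newline(line):
    """Remove exactly one trailing '\n', if present, and nothing else."""
    if line.endswith("\n"):
        return line[:-1]
    return line


def chomp_crlf(line):
    """Remove every trailing '\r' and '\n' character (assumed DWIM variant)."""
    return line.rstrip("\r\n")


for sample in ("foo\n", "foo\r\n", "foo\r\r\n", "foo"):
    print(repr(sample), repr(chomp_newline(sample)), repr(chomp_crlf(sample)))
```

The strict version leaves a stray '\r' behind when the data has DOS line endings that were not translated on read; the DWIM version removes it but will also eat a '\r' that was genuinely part of the data.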

My approach to DWIM with data is, given
    norm_space = lambda s: u' '.join(s.split())
to break up the line into fields first (in case the field delimiter is 
itself whitespace, e.g. '\t', which norm_space would otherwise collapse) 
and then apply norm_space to each field. This gets rid of your '\r' at 
the end (or start!) of the line, and each run of whitespace characters 
is replaced by a single space. Whitespace includes NBSP (U+00A0) as an 
added bonus for being righteous and using Unicode :-)
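
A minimal sketch of that, assuming Python 3 (where every str is Unicode, so the u prefix is unnecessary) and a tab delimiter; the name clean_fields is hypothetical.

```python
def norm_space(s):
    """Collapse each run of whitespace to one space and trim both ends."""
    return " ".join(s.split())


def clean_fields(line, delimiter="\t"):
    """Split on the delimiter first, then normalise each field separately."""
    return [norm_space(field) for field in line.split(delimiter)]


# Leading '\r', doubled spaces, an NBSP and a trailing "\r\n" all disappear.
row = clean_fields("\r foo   bar\tbaz\u00a0 qux \r\n")
print(row)  # ['foo bar', 'baz qux']
```

Splitting before normalising matters: calling norm_space on the whole line would turn the tab into a space and merge the two fields.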

> Thank goodness I haven't found any of my data-sources using "\n\r" 
> instead, which would require me to left-strip '\r' characters as well.  
> Sigh.  My kingdom for competency. :-/

Indeed. I actually got data in that format once from a *x programmer who 
was so kind as to do it that way just for me because he knew that I use 
Windows and he thought that's what Windows text files looked like. No 
kidding.

Cheers,
John


