Splitting text into lines

Tue Dec 13 12:54:51 EST 2016

On Tue, Dec 13, 2016, at 12:25, George Trojan - NOAA Federal wrote:
> >
> > Are repeated newlines/carriage returns significant at all? What about
> > just using re and just replacing any repeated instances of '\r' or '\n'
> > with '\n'? I.e. something like
> >  >>> # the_string is your file all read in
> >  >>> import re
> >  >>> re.sub("[\r\n]+", "\n", the_string)
> > and then continuing as before (i.e. splitting by newlines, etc.)
> > Does that work?
> > Cheers,
> > Thomas
> 
> 
> The '\r\r\n' string is a line separator, though not used consistently in US
> meteorological bulletins. I do not want to eliminate "real" empty lines.

I'd do re.sub("\r*\n", "\n", the_string). Any "real" empty line is
almost certainly going to show up as two consecutive \n characters once
the \r characters are stripped, so it survives the substitution. It looks
like what *happened* is that the file, or some part of the file,
originally had \r\n line endings and was then "converted" again, turning
each \n into \r\n and leaving \r\r\n.
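
A minimal sketch of the difference between the two substitutions, in case
it helps. The sample text and the helper name normalize_line_endings are
made up for illustration; they are not taken from a real bulletin.

```python
import re

def normalize_line_endings(text):
    # Drop any run of '\r' that directly precedes a '\n'.
    # A '\r' that is not followed by '\n' is left untouched.
    return re.sub(r"\r*\n", "\n", text)

# Hypothetical sample: mixed '\r\r\n', '\r\n' and '\n' separators,
# with one "real" empty line between LINE TWO and LINE FOUR.
sample = "LINE ONE\r\r\nLINE TWO\r\n\r\r\nLINE FOUR\n"

kept = normalize_line_endings(sample).splitlines()

# The broader pattern from earlier in the thread merges consecutive
# separators, so the empty line disappears.
collapsed = re.sub(r"[\r\n]+", "\n", sample).splitlines()

print(kept)       # ['LINE ONE', 'LINE TWO', '', 'LINE FOUR']
print(collapsed)  # ['LINE ONE', 'LINE TWO', 'LINE FOUR']
```

Note that this only behaves as shown if the string still contains the
original \r characters, i.e. it was not already translated while reading.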

> I was hoping there is a way to prevent read() from making hidden changes
> to the file content.

Pass newline='' into open(). With that, read() returns the line endings
untranslated.
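
A small sketch of what that changes, assuming Python 3 and text-mode
reads. The file contents are made up, and the temporary file is only
there to keep the example self-contained.

```python
import os
import tempfile

# Hypothetical raw bytes with mixed line separators.
raw = b"LINE ONE\r\r\nLINE TWO\r\n\r\r\nLINE FOUR\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # newline='' turns off newline translation on input.
    with open(path, "r", encoding="ascii", newline="") as f:
        untranslated = f.read()

    # Default (newline=None): '\r', '\n' and '\r\n' each become '\n',
    # so '\r\r\n' is read as two newlines.
    with open(path, "r", encoding="ascii") as f:
        translated = f.read()
finally:
    os.remove(path)

print(repr(untranslated))  # 'LINE ONE\r\r\nLINE TWO\r\n\r\r\nLINE FOUR\n'
print(repr(translated))    # 'LINE ONE\n\nLINE TWO\n\n\nLINE FOUR\n'
```

With the default setting every '\r\r\n' turns into an extra empty line,
which can no longer be told apart from a "real" one.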