getting rid of EOL character ?

John Machin sjmachin at lexicon.net
Sat Apr 28 06:43:53 EDT 2007


On Apr 28, 7:25 pm, Michael Hoffman <cam.ac... at mh391.invalid> wrote:
> John Machin wrote:
> > On 27/04/2007 11:19 PM, Michael Hoffman wrote:
> >> stef wrote:
> >>> hello,
>
> >>> In the previous language I used,
> >>> when reading a line by readline, the EOL character was removed.
>
> > Very interesting; how did you distinguish between EOF and an empty line?
> > Did you need to call an isEOF() method before each read?
>
> >>> Now I'm reading a text-file with CR+LF at the end of each line,
> >>>    Datafile = open(filename,'r')
> >>>    line = Datafile.readline()
>
> >>> now this gives an extra empty line
> >>>    print line
>
> >>> and what I expect that should be correct, remove CR+LF,
> >>> gives me one character too much removed
> >>>    print line[,-2]
>
> > Stef, that would give you a syntax error. I presume that you meant to
> > type line[:-2]
>
> >>> while this gives what I need ???
> >>>    print line[,-1]
>
> >>> Is it correct that the 2 characters CR+LF are converted to 1 character ?
>
> > In text mode (the default), whatever is the line ending on your platform
> > is converted to a single "newline" '\n' which is the same as LF.
>
> > Using line[:-1] is NOT recommended, as the last line in your file may
> > not be terminated, and in that case you would lose the last data character.
>
> >>> Is there a more automatic way to remove the EOL from the string ?
>
> >> line = line.rstrip("\r\n") should take care of it. If you leave out
> >> the parameter, it will strip out all whitespace at the end of the
> >> line, which is what I do in most cases.
>
> > If you want *exactly* what is in the line, use line.rstrip('\n') -- this
> > will remove only the trailing newline (if it exists).
>
> > If you want to strip all trailing whitespace, use line.rstrip() as
> > Michael suggested.
>
> > Michael, note carefully that line.rstrip('\r\n') removes instances of
> > '\r' OR '\n' -- the arg is a set of characters to be removed, not a
> > suffix to be removed. In Stef's situation, it "works" only by accident.
> > Using that would not always give you the correct answer -- e.g. if your
> > (Windows) file had a line ending in CR CR LF [I've seen stranger].
>
> I knew that about line.rstrip, but didn't consider the possibility of
> \r\r\n, while still wanting the first \r. Yuck.

It would be unusual to want that first \r -- a possibly more likely
scenario might be where your text file contains an extract from a
database, and you need to check that there are no unwanted (e.g.
unprintable) characters in the data (whether at the end of the line,
the middle, or the start).

In any case I think that you are missing the point that when reading a
normal text file on Windows with readline, while the line in the file
may be 'foo bar\r\n', what you get from readline is 'foo bar\n' -- so
in normal usage, the \r in your line.rstrip('\r\n') is pointless.
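
To make the difference concrete, here is a minimal sketch, assuming a current Python 3 interpreter, where text mode translates '\r\n' to '\n' by default on every platform. The file contents and variable names are made up for illustration.

```python
import os
import tempfile

# Write a small file with a Windows-style line ending and an
# unterminated last line (illustrative data only).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"foo bar\r\nlast line")

with open(path, "r") as f:       # text mode, default newline handling
    text_lines = f.readlines()
with open(path, "rb") as f:      # binary mode, bytes exactly as stored
    raw_lines = f.readlines()
os.remove(path)

print(text_lines)                # ['foo bar\n', 'last line']
print(raw_lines)                 # [b'foo bar\r\n', b'last line']

# rstrip('\n') removes only the trailing newline, if there is one.
stripped = [line.rstrip("\n") for line in text_lines]
print(stripped)                  # ['foo bar', 'last line']

# Slicing removes the last character whatever it is, so the
# unterminated last line loses a data character.
sliced = [line[:-1] for line in text_lines]
print(sliced)                    # ['foo bar', 'last lin']

# The argument to rstrip is a set of characters, not a suffix:
# every trailing '\r' and '\n' goes, including the first '\r'.
odd = "foo bar\r\r\n"
print(repr(odd.rstrip("\r\n")))  # 'foo bar'
print(repr(odd.rstrip("\n")))    # 'foo bar\r\r'
```

The '\r' only ever shows up in the binary read (or in a malformed line such as the CR CR LF case), which is why it does no work in line.rstrip('\r\n') for an ordinary text-mode read.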

>
> Honestly, I almost always use line.rstrip()--it is seldom that I care
> about closing whitespace.

Honestly, I almost always split a line into fields and then for each
field, strip leading and trailing whitespace, and change runs of 1 or
more whitespace characters to a single space -- where "whitespace"
includes the pesky U+00A0 aka &nbsp; which doesn't qualify as
whitespace in a str instance.
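
A rough sketch of that per-field clean-up, under two assumptions: the function name clean_fields and the comma separator are hypothetical, and the code targets Python 3, where str is Unicode and split() with no argument already treats U+00A0 as whitespace. With a byte-oriented str, as described above, U+00A0 would need explicit handling.

```python
def clean_fields(line, sep=","):
    """Split a line into fields and normalise whitespace in each one.

    Hypothetical helper: the name and the default separator are
    assumptions made for this example.
    """
    # split() with no argument splits on runs of whitespace and ignores
    # leading and trailing whitespace, so joining the pieces with one
    # space both strips the field and collapses each run.
    return [" ".join(field.split()) for field in line.split(sep)]


line = "  foo \t bar ,\u00a0baz\u00a0\u00a0qux \n"
print(clean_fields(line))        # ['foo bar', 'baz qux']
```

Because the trailing newline is itself whitespace, it disappears along with everything else, so no separate rstrip call is needed in this approach.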

Cheers,
John



