Unicode and Python - how often do you index strings?

Tim Chase python.list at tim.thechases.com
Fri Jun 6 06:37:05 EDT 2014


On 2014-06-06 10:47, Johannes Bauer wrote:
> > Personally I tend toward rstrip('\r\n') so that I don't have to
> > worry about files with alternative line terminators.
> 
> Hm, I was under the impression that Python already took care of
> removing the \r at a line ending. Checking that right now:
> 
> (DOS encoded file "y")
> >>> for line in open("y", "r"): print(line.encode("utf-8"))
> ...
> b'foo\n'
> b'bar\n'
> b'moo\n'
> b'koo\n'
> 
> Yup, the \r was removed automatically. Are there cases when it
> isn't?

Yes, if the file is opened in binary mode, where no newline
translation is done (Python 2 session):

>>> f = file('delme.txt', 'wb')
>>> f.write('hello\r\nworld\r\n')
>>> f.close()
>>> f = file('delme.txt', 'rb')
>>> for row in f: print repr(row)
... 
'hello\r\n'
'world\r\n'
>>> f.close()
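
For anyone on Python 3, here is a rough equivalent of the session
above. It is only a sketch: the temporary file and the read_lines()
helper are made up for illustration. It compares the default text
mode, binary mode, and text mode with newline='' (which, as far as I
know, also leaves the line endings untouched), and then applies the
rstrip('\r\n') idiom mentioned earlier in the thread.

```python
import os
import tempfile


def read_lines(path, mode="r", **kwargs):
    """Return the file's lines as a list, exactly as iteration yields them.

    Hypothetical helper for this example only.
    """
    with open(path, mode, **kwargs) as f:
        return list(f)


# Write a small DOS-style (CRLF) file in binary mode so the bytes are exact.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(b"hello\r\nworld\r\n")

# Default text mode: universal newlines translate '\r\n' to '\n'.
text_lines = read_lines(path, "r", encoding="ascii")

# Binary mode: no translation, the '\r' is still there.
binary_lines = read_lines(path, "rb")

# Text mode with newline='': lines are split, but endings are not translated.
raw_text_lines = read_lines(path, "r", encoding="ascii", newline="")

# rstrip('\r\n') removes any trailing mix of '\r' and '\n' characters.
stripped = [line.rstrip("\r\n") for line in raw_text_lines]

os.remove(path)

print(text_lines)
print(binary_lines)
print(raw_text_lines)
print(stripped)
```

So rstrip('\r\n') gives the same result whichever way the file was
opened, which is why it is handy as a defensive habit.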


-tkc



