[New-bugs-announce] [issue6664] readlines should understand Line Separator and Paragraph Separator characters

Neil Hodgson report at bugs.python.org
Fri Aug 7 11:14:15 CEST 2009


New submission from Neil Hodgson <nyamatongwe at users.sourceforge.net>:

Unicode defines two line-ending characters beyond the ASCII ones: Line
Separator (U+2028) and Paragraph Separator (U+2029). The readlines method
of the file object returned by the built-in open does not treat these
characters as line ends, although the object returned by
codecs.open(..., encoding='utf-8') does.

The attached program creates a UTF-8 file containing three lines, with
the second line ended by a Paragraph Separator instead of a newline. The
program then reads this file back in as a text file, and readlines
returns only two lines.

The desired behaviour is for the file to be read in as three lines.
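
For readers without the attachment, the following is a minimal sketch of
the kind of reproduction described above. It is not the attached
lineends.py: the sample text, the temporary file, and the read_back
helper are assumptions made for illustration. It uses str.splitlines,
which does recognise U+2028 and U+2029, as the point of comparison
rather than codecs.open.

```python
import os
import tempfile

# Three logical lines; the second ends with U+2029 PARAGRAPH SEPARATOR
# instead of "\n". The sample text is hypothetical, not from lineends.py.
TEXT = "line one\nline two\u2029line three\n"


def read_back(text):
    """Write text as UTF-8, then read it back with the built-in open()."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # newline="" writes the text unchanged on every platform.
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.write(text)
        with open(path, encoding="utf-8") as f:
            return f.readlines()
    finally:
        os.remove(path)


# Lines as seen by the text file object.
file_lines = read_back(TEXT)
# Lines as seen by str.splitlines, which treats U+2029 as a boundary.
str_lines = TEXT.splitlines(keepends=True)

print(len(file_lines), file_lines)
print(len(str_lines), str_lines)
```

With the text above, the file object yields two lines while
str.splitlines yields three, which is the discrepancy this report is
about.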

----------
components: IO
files: lineends.py
messages: 91397
nosy: nyamatongwe
severity: normal
status: open
title: readlines should understand Line Separator and Paragraph Separator characters
versions: Python 3.1
Added file: http://bugs.python.org/file14671/lineends.py

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue6664>
_______________________________________

