[Patches] [Patch #101477] Fixes of ReadStream.readline() in UTF-16 and -LE codecs

Mon, 27 Nov 2000 14:57:01 -0800

Patch #101477 has been updated.

Project: python
Category: library
Status: Rejected
Summary: Fixes of ReadStream.readline() in UTF-16 and -LE codecs

Follow-Ups:

Date: 2000-Sep-14 07:09
By: fdrake

Comment:
Marc-Andre, please review this & decide what should happen next.
-------------------------------------------------------

Date: 2000-Sep-18 09:28
By: lemburg

Comment:
I'm not sure whether this is the right fix: Unicode defines many
more line break characters than just LF and the patch will only
work correctly on Unix (also note that UTF-16 can be BE and LE
-- your fix assumes LE).
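
Both points can be illustrated with a short sketch. It is written in
present-day Python 3 syntax (an assumption; the patch under review targets
the 2.0 codecs), and the variable names are only for illustration. It shows
that LF encodes to different byte sequences depending on byte order, and
that str.splitlines() recognizes several line boundaries besides LF.

```python
# LF (U+000A) as a UTF-16 code unit: the byte layout depends on byte order.
lf_le = "\n".encode("utf-16-le")   # low byte first:  0x0A 0x00
lf_be = "\n".encode("utf-16-be")   # high byte first: 0x00 0x0A

# str.splitlines() also splits on LINE SEPARATOR (U+2028),
# PARAGRAPH SEPARATOR (U+2029) and NEXT LINE (U+0085), among others.
text = "one\ntwo\u2028three\u2029four\x85five"
lines = text.splitlines()

print(lf_le, lf_be)
print(lines)
```

A fix that searches for the two-byte LE form of LF will therefore miss line
ends in big-endian data, and any LF-only search ignores the other
separators.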

A true fix would have to also touch the .read() method and
implement a true read-ahead buffer strategy to get this done
right.
-------------------------------------------------------

Date: 2000-Sep-19 03:41
By: lemburg

Comment:
Postponed until after the Python 2.0b2 release.
-------------------------------------------------------

Date: 2000-Nov-27 14:57
By: gvanrossum

Comment:
This version of the patch is clearly bogus. In UTF-16 encodings, \n can occur whenever the low or high byte of a Unicode character is 0x0A. I don't know if Unicode is designed to avoid all such code positions but I can hardly believe it.
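
Such code positions do exist. As a minimal illustration (present-day
Python 3 syntax, which is an assumption relative to this thread; variable
names are illustrative only), U+010A has 0x0A as its low byte and U+0A05
has 0x0A as its high byte, so a byte-level search for "\n" splits the data
in the wrong places:

```python
# U+010A LATIN CAPITAL LETTER C WITH DOT ABOVE: low byte is 0x0A.
# U+0A05 GURMUKHI LETTER A: high byte is 0x0A.
low_hit = "\u010a".encode("utf-16-le")    # bytes 0x0A 0x01
high_hit = "\u0a05".encode("utf-16-le")   # bytes 0x05 0x0A

# One real line break, but two 0x0A bytes in the encoded data.
data = "a\u010ab\n".encode("utf-16-le")
naive_pieces = data.split(b"\n")

print(b"\n" in low_hit, b"\n" in high_hit)
print(len(naive_pieces))
```

The naive split yields three pieces for a single line, and the middle piece
starts on an odd byte offset, so it no longer decodes as UTF-16.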

A correct readline() method would have to read 2 bytes at a time and check for u"\u000A". (I don't care for all the other Unicode line breaking characters, those are for a different application level presumably.)
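
A rough sketch of that approach, in present-day Python 3 syntax: the
function name readline_utf16 and its byteorder parameter are hypothetical,
not part of the codecs module, and the sketch assumes BOM-less input and a
buffered stream whose read(2) returns two bytes unless EOF is reached.

```python
import io

def readline_utf16(stream, byteorder="little"):
    """Read one LF-terminated line of BOM-less UTF-16 from a binary stream.

    Hypothetical helper: compares whole 16-bit code units against U+000A
    instead of searching for a single 0x0A byte.
    """
    newline = (0x000A).to_bytes(2, byteorder)
    units = []
    while True:
        unit = stream.read(2)
        if len(unit) < 2:
            # EOF; a dangling single byte is dropped in this sketch.
            break
        units.append(unit)
        if unit == newline:
            break
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    return b"".join(units).decode(codec)

stream = io.BytesIO("a\u010ab\nsecond".encode("utf-16-le"))
first = readline_utf16(stream)
second = readline_utf16(stream)
print(repr(first), repr(second))
```

Because the comparison is made on aligned code units, a character such as
U+010A no longer ends the line early. Reading two bytes per call is slow,
which is where a read-ahead buffer would come in.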
-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=101477&group_id=5470