[I18n-sig] codecs module, readlines and xreadlines

M.-A. Lemburg mal@lemburg.com
Thu, 16 Jan 2003 17:14:38 +0100


Poor Yorick wrote:
>
>
> Martin v. Löwis wrote:
>
>> "M.-A. Lemburg" <mal@lemburg.com> writes:
>>
>>> On Windows, the 'r' opens the file in text which mangles the line-end
>>> information. You should try to open the file in 'rb' (binary) mode
>>> for comparison.
>>>
>>
>> The issue is, of course, that codecs.open is usually meant for text
>> data, so comparing 'r' to 'r' is fair, IMO.
>>
>>> codecs.open() automatically appends the 'b' to the 'r' for you,
>>> so this is probably the cause of the problem.
>>>
>>
> Whether the file is opened in binary mode or in text mode, the '\r'
> character is still there.  It isn't mangled, it's just that in the
> utf-16 encoding all characters are encoded as double-byte characters,
> and \r\n becomes \x00\r\x00\n.
>
> The thing is that I AM processing text data.  It just happens to be
> unicode text data.  The example I used turns into perfectly legible
> chinese characters once it's decoded in Python.  I think that people
> using the codecs module on Windows to read Unicode text files would
> expect codecs.open.readlines to behave exactly like the builtin
> open.readlines.
> open.readlines automatically removes the "\r" character on Windows
> systems when the file is opened and read in text mode, and inserts a \r
> character when a \n is written to a file,

That's what I meant by mangling. I don't see any code
in fileobject.c which would do the above, so unless I've
overlooked something, the MS C library must be doing this
translation.
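
For comparison, here is a minimal sketch of the text-mode versus
binary-mode difference. It assumes a present-day Python 3, where the
newline translation is done by Python's own io layer on every platform
rather than by the C library, so it only illustrates the visible
effect, not the Python 2 code path discussed here. The file name and
contents are made up for the example.

```python
import os
import tempfile

# Write raw bytes with Windows-style line ends; binary mode does no translation.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"one\r\ntwo\r\n")

# Binary mode hands the bytes back untouched.
with open(path, "rb") as f:
    binary_lines = f.readlines()

# Text mode with default newline handling translates '\r\n' to '\n' on read.
with open(path, "r", encoding="ascii") as f:
    text_lines = f.readlines()

# newline="" keeps text mode (decoding) but switches the translation off.
with open(path, "r", encoding="ascii", newline="") as f:
    untranslated_lines = f.readlines()

os.remove(path)
```

Only the default text-mode read drops the '\r'; the other two keep it.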

> so to be consistent,
> codecs.open.readlines should do the same thing and remove \x00\r when
> the file is opened in text mode.

But only on Windows, right? (On Unix, text mode and binary mode
behave identically.)
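
To make the behaviour under discussion concrete, here is a rough
sketch of what a codec stream reader returns for UTF-16 data with
Windows line ends. It uses codecs.getreader over an in-memory binary
stream as a stand-in for a file opened in binary mode, which is an
assumption made to keep the example self-contained. The helper
normalize_newlines is hypothetical and not part of the codecs module;
it only shows the kind of post-decoding cleanup being asked for.

```python
import codecs
import io

# In big-endian UTF-16, '\r\n' is the four bytes quoted in the thread.
crlf_bytes = "\r\n".encode("utf-16-be")

# Bytes as they would sit on disk for a UTF-16 text file (BOM included).
raw = "first\r\nsecond\r\n".encode("utf-16")

# Decode through a stream reader on a binary stream: no newline translation.
reader = codecs.getreader("utf-16")(io.BytesIO(raw))
lines = reader.readlines()

def normalize_newlines(decoded_lines):
    # Hypothetical helper: replace a trailing CR LF with a bare LF,
    # working on the decoded text rather than on the raw bytes.
    return [
        line[:-2] + "\n" if line.endswith("\r\n") else line
        for line in decoded_lines
    ]

normalized = normalize_newlines(lines)
```

Doing the replacement after decoding avoids having to know the byte
order or width of the encoding.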

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/