detecting newline character

Thomas 'PointedEars' Lahn PointedEars at web.de
Sun Apr 24 05:19:27 EDT 2011


Daniel Geržo wrote:

> On 24.4.2011 9:05, jmfauth wrote:
>> Use the io module.
> 
> For the record, when I use io.open(file=self.path, mode="rt",
> encoding=enc) as fobj:
> 
> my tests are passing and everything seems to work fine.
> 
> That indicates there is a bug in the codecs module's universal newline
> support.

No, it proves that you either have not bothered to read the underlying 
source code and documentation (even though both have been quoted to you), 
or have not understood them.

It is clear now that codecs.open() does not support universal newlines, 
from at least Python 2.6 onward, as it is *documented* to open files in 
*binary mode* only.  The source code that I have posted shows that it 
therefore actively removes 'U' from the mode string when the `encoding' 
argument is passed, and always appends 'b' to the mode if it is not 
already present.  As a result, __builtin__.open() is called without 'U' in 
the `mode' argument, which is *documented* to set file.newlines to None 
(regardless of whether Python was compiled with universal newline support).

<http://docs.python.org/library/stdtypes.html?highlight=newlines#file.newlines>
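
For illustration, here is a simplified sketch of the mode rewriting 
described above, written as a standalone function.  The name 
normalize_codecs_mode() is hypothetical, this is not the verbatim stdlib 
source, and the rule that a bare 'U' becomes 'rb' is an assumption modelled 
on Python 2.7's codecs.py:

```python
def normalize_codecs_mode(mode, encoding=None):
    """Hypothetical, simplified model of how codecs.open() rewrites `mode`."""
    if encoding is not None:
        if 'U' in mode:
            # Universal newline mode is dropped: no '\n' translation happens.
            mode = mode.strip().replace('U', '')
            # Assumption: a mode left without r/w/a defaults to reading.
            if mode[:1] not in set('rwa'):
                mode = 'r' + mode
        if 'b' not in mode:
            # The underlying file is always opened in binary mode.
            mode = mode + 'b'
    return mode


print(normalize_codecs_mode('rU', 'utf-8'))  # rb
print(normalize_codecs_mode('U', 'utf-8'))   # rb
print(normalize_codecs_mode('rt', 'utf-8'))  # rtb
print(normalize_codecs_mode('rU'))           # rU
```

Only without an `encoding' argument does 'U' survive, and with it the 
universal newline support of the built-in open().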

`io' is a more general module than `codecs'; therefore io.open() does not 
have those restrictions (but it has others: RTSL!¹).  Did you notice that 
your `mode' argument does not contain `b'?  Append it and you will see why 
this cannot work: io.open() then raises ValueError instead of silently 
opening the file in binary mode.
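
A minimal sketch of both points, assuming Python 3 (where io.open() is the 
built-in open()) and a throwaway temporary file with CRLF line endings: in 
text mode io.open() translates the newlines and records what it saw in 
`newlines', while a mode containing `b' combined with `encoding' is rejected 
with ValueError.  The exact error messages vary between versions, so they 
are only printed, not relied on:

```python
import io
import os
import tempfile

# Create a throwaway file with Windows-style (CRLF) line endings.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'first\r\nsecond\r\n')

try:
    # Text mode with universal newlines (newline=None is the default).
    with io.open(path, mode='rt', encoding='utf-8') as fobj:
        lines = fobj.readlines()
        seen = fobj.newlines  # newline style(s) encountered while reading
    print(lines)       # ['first\n', 'second\n']
    print(repr(seen))  # '\r\n'

    # Adding 'b', as codecs.open() does internally, is refused by io.open().
    errors = {}
    for bad_mode in ('rb', 'rtb'):
        try:
            io.open(path, mode=bad_mode, encoding='utf-8')
        except ValueError as exc:
            errors[bad_mode] = exc
            print(bad_mode, '->', exc)
finally:
    os.remove(path)
```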

The bug, if any, is that codecs.open() silently accepts your wrong `mode' 
argument, while io.open() rejects it.

_____
¹  RTSL: Read the Source, Luke!

-- 
PointedEars
