detecting newline character

Sun Apr 24 08:50:14 EDT 2011

Daniel Geržo wrote:

> Thomas 'PointedEars' Lahn wrote:
>> It is clear now that codecs.open() would not support universal newlines
>> from at least Python 2.6 forward as it is *documented* that it opens
>> files in *binary mode* only.  The source code that I have posted shows
>> that it therefore actively removes 'U' from the mode string when the
>> `encoding' argument was passed, and always appends 'b' to the mode if not
>> present.  As a result, __builtin__.open() is called without 'U' in the
>> `mode' argument, which is *documented* to set file.newlines to None
>> (regardless of whether Python was compiled with universal newline support).
> 
> OK, it makes much more sense now, thanks for the explanation. I didn't
> understand it when reading the docs. The io module seems to be a good
> choice for my use case, so I switched to using that for now.

ACK
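
For the archive, a minimal sketch of the difference between the two calls.
It assumes a UTF-8 file with CRLF line endings; the sample data and the name
`read_both_ways' are made up for illustration and are not from your code:

```python
import codecs
import io
import os
import tempfile
import warnings

# Hypothetical sample data: two lines terminated by CRLF.
RAW = b"first\r\nsecond\r\n"


def read_both_ways(raw=RAW):
    """Return (io_text, io_newlines, codecs_text) for the same bytes."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(raw)

        # io.open() in text mode translates newlines by default
        # (newline=None) and records what it has seen in .newlines.
        with io.open(path, "r", encoding="utf-8") as f:
            io_text = f.read()
            io_newlines = f.newlines

        # codecs.open() opens the file in binary mode underneath, so the
        # line endings come back untranslated.  Newer Python 3 releases
        # may emit a DeprecationWarning for it; that is silenced here.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", DeprecationWarning)
            with codecs.open(path, "r", encoding="utf-8") as f:
                codecs_text = f.read()
    finally:
        os.remove(path)
    return io_text, io_newlines, codecs_text


if __name__ == "__main__":
    print(read_both_ways())
```

With io.open() the detected line ending is available from the file object's
`newlines' attribute after reading, which is probably what you want for
detecting the newline character.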

> What is still a little confusing for me is that you stated "WFM", which
> I interpreted as "Works For Me", in one of your previous replies, for
> both the case with and the case without an encoding specified.

This is now[1] easily explained by a typo in my quick-hacked test module, 
where it said

  if __name__ == "main":
      CodecsTest()

instead of the proper

  if __name__ == "__main__":
      CodecsTest()

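Since __name__ is "__main__" when a module is run as a script, the 
misspelled condition was always false and CodecsTest() never ran.  Testing 
too superficially, I concluded from the absence of an exception that there 
was no problem.  In fact, no exception was thrown because the method in 
question was never called.  Sorry.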
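
To illustrate why such a typo stays silent, here is a minimal sketch.  The 
names `codecs_test' and `run_if_main' are hypothetical stand-ins for my test 
and for the module-level guard; this is not the original test module:

```python
def codecs_test(log):
    """Stand-in for the real test; only records that it was called."""
    log.append("called")


def run_if_main(module_name, guard):
    """Mimic `if __name__ == guard: ...` for a given module name."""
    log = []
    if module_name == guard:
        codecs_test(log)
    return log


if __name__ == "__main__":
    # A script's __name__ is "__main__", never "main".
    print(run_if_main(__name__, "main"))      # guard with the typo
    print(run_if_main(__name__, "__main__"))  # corrected guard
```

With the misspelled guard nothing is called, so nothing can fail either.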

> I also have to state that it must have been changed sometime during the
> 2.6 line, because I started developing pysublib ca. 20 months ago on
> Python 2.6 (don't know the minor version) and I am quite sure my tests
> were passing back then...

Yes, I have subsequently found these changelog entries:

| What's New in Python 2.7 alpha 4?
| =================================
| 
| *Release date: 2010-03-06*
| 
| […]
| Library
| -------
|
| […]
| - Issue #691291: ``codecs.open()`` should not convert end of lines on
|   reading and writing.

| What's New in Python 2.6.5 rc 1?
| ================================
| 
| *Release date: 2010-03-01*
| 
| […]
| Library
| -------
| 
| […]
| - Issue #691291: codecs.open() should not convert end of lines on reading
|   and writing.

See also <http://bugs.python.org/issue691291> for the rationale.

I have python2.6_2.6.6-8+b1_i386 and python2.7_2.7.1-8_i386 installed. 
With the typo above fixed, both throw the exception under said 
circumstances, as expected.

That is why I suggested RTSL, i.e. reading the source code (which should 
not be that hard to do, see below).

[1] Aside: I had already noticed that PyDev would show me the 2.6 source 
code of codecs.py when Ctrl-clicking e.g. `codecs' or `open' in a PyDev 
project where the grammar was set to Python 2.7.  Now I know why: The 
project's interpreter setting was set to "Default".  However, apparently 
"Default" refers to the first interpreter in the list in the Preferences, 
and that was Python 2.6 (as I added 2.7 later).  I have found that out by 
placing the Python 2.7 interpreter entry at the top of the list, clicking 
Apply twice, thereby "restoring the interpreter"; I got the mentioned 
exception then.  Although the logic of validating against e.g. the Python 
3.0 grammar while running a Python 2.7 interpreter escapes me, note that 
you should set *both* settings if unsure (observed with PyDev 
2.0.0.2011040403).
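
If in doubt about which interpreter and which codecs.py a project really 
uses, something like the following can be run from within it.  This is only 
a sketch; `interpreter_info' is a name I made up for the purpose:

```python
import codecs
import sys


def interpreter_info():
    """Report the running interpreter and the codecs module's location."""
    return {
        "version": "%d.%d.%d" % sys.version_info[:3],
        "executable": sys.executable,
        # Path of the codecs module that this interpreter imported;
        # None if the module does not expose a file path.
        "codecs_file": getattr(codecs, "__file__", None),
    }


if __name__ == "__main__":
    for key, value in sorted(interpreter_info().items()):
        print("%s: %s" % (key, value))
```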

> Thank you for your help, it's much appreciated.

You're welcome.

-- 
PointedEars