[Python-Dev] PEP 385: the eol-type issue

Fri Aug 7 10:31:01 CEST 2009

Neil Hodgson wrote:
> M.-A. Lemburg:
> 
>> ... and because of this, the feature is already available if
>> you use codecs.open() instead of the built-in open():
> 
>    So should I not add an issue for the basic open because codecs.open
> should be used for this case?

As Antoine mentioned: using codecs.open() and .readline()
is about 20-30 times slower than the built-in open().

This is mainly because the codec's .readline() method is
implemented in pure Python and does its own buffering.
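
For anyone who wants to check the gap on their own machine, a rough
timing sketch could look like the following. The helper names
(make_sample, count_lines_codecs, count_lines_io) and the sample data
are made up for illustration. It uses codecs.getreader(), which yields
the same pure-Python StreamReader that codecs.open() builds on; the
ratio it prints will depend on the Python version and the data.

```python
import codecs
import os
import tempfile
import timeit


def make_sample(n_lines=20000):
    """Write a throwaway UTF-8 file (hypothetical test data) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as f:
        for i in range(n_lines):
            f.write("line %d with some text\n" % i)
    return path


def count_lines_codecs(path):
    """Count lines via the codec's pure-Python StreamReader.readline()."""
    count = 0
    with open(path, "rb") as raw:
        reader = codecs.getreader("utf-8")(raw)
        while reader.readline():
            count += 1
    return count


def count_lines_io(path):
    """Count lines via the io layer's readline()."""
    count = 0
    with open(path, encoding="utf-8") as f:
        while f.readline():
            count += 1
    return count


if __name__ == "__main__":
    path = make_sample()
    try:
        t_codecs = timeit.timeit(lambda: count_lines_codecs(path), number=3)
        t_io = timeit.timeit(lambda: count_lines_io(path), number=3)
        print("codecs: %.3fs  io: %.3fs  ratio: %.1fx"
              % (t_codecs, t_io, t_codecs / t_io))
    finally:
        os.remove(path)
```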

IMHO, it would be a lot better to add full Unicode support
for line breaks to the io layer. Given that the code for the
complicated handling of the CRLF combination is already there,
it's not difficult to add support for the remaining line break
characters.
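
To illustrate the difference in question, here is a small sketch that
splits the same text once through the io layer and once with
str.splitlines(), which knows the full set of Unicode line breaks.
The function names io_lines and unicode_lines are only for this
example, and io.StringIO stands in for a real text file.

```python
import io


def io_lines(text):
    """Split text the way the io layer does: only "\n" ends a line here."""
    return io.StringIO(text).readlines()


def unicode_lines(text):
    """Split text on every Unicode line break, keeping the line endings."""
    return text.splitlines(True)


if __name__ == "__main__":
    # U+2028 (LINE SEPARATOR) and U+0085 (NEXT LINE) are Unicode line breaks.
    sample = "a\u2028b\x85c\nd\n"
    print(io_lines(sample))
    print(unicode_lines(sample))
```

The io layer returns two lines for the sample, while str.splitlines()
returns four, because it also breaks at U+2028 and U+0085.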

The implementation could reuse the Bloom filter approach
used in unicodeobject.c to make this very fast.
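
As a rough idea of how such a filter works, here is a simplified
Python sketch. It is not the C code from unicodeobject.c: the names
(LINEBREAKS, BLOOM_WIDTH, make_bloom_mask, is_linebreak,
find_linebreak) and the 64-bit width are assumptions made for the
example. Each line break character sets one bit in a small mask; a
character whose bit is not set can be rejected at once, and only the
rare hits need the exact check.

```python
# Characters treated as line breaks by str.splitlines().
LINEBREAKS = frozenset("\n\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029")

BLOOM_WIDTH = 64  # assumed mask width for this sketch; must be a power of two


def make_bloom_mask(chars):
    """Set one bit per character, indexed by the low bits of its code point."""
    mask = 0
    for ch in chars:
        mask |= 1 << (ord(ch) & (BLOOM_WIDTH - 1))
    return mask


LINEBREAK_MASK = make_bloom_mask(LINEBREAKS)


def is_linebreak(ch):
    """Cheap bit test first; exact membership test only on a mask hit."""
    if not (LINEBREAK_MASK >> (ord(ch) & (BLOOM_WIDTH - 1))) & 1:
        return False  # bit not set: definitely not a line break
    return ch in LINEBREAKS  # bit set: may be a false positive, so confirm


def find_linebreak(text, start=0):
    """Return the index of the first line break at or after start, or -1."""
    for i in range(start, len(text)):
        if is_linebreak(text[i]):
            return i
    return -1
```

In Python the bit test buys little, but in C it replaces a chain of
comparisons with a shift and a mask for the vast majority of
characters.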

BTW: I'm not sure why the io layer records the line endings
it has seen. This makes processing more complicated for no
apparent reason. In the few cases where you might need this
(I don't see any), you could just as well scan the lines
in a quick loop using Python.
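
Such a loop could look roughly like this. The function name
seen_line_endings is made up for the example, and it assumes the
stream was opened with newline="" so that the io layer hands the
line endings through untranslated.

```python
import io


def seen_line_endings(stream):
    """Collect the distinct line endings found in a text stream.

    Assumes the stream yields lines with their original endings,
    e.g. a file opened with newline="".
    """
    endings = set()
    for line in stream:
        # Check "\r\n" first, since such a line also ends with "\n".
        if line.endswith("\r\n"):
            endings.add("\r\n")
        elif line.endswith("\r"):
            endings.add("\r")
        elif line.endswith("\n"):
            endings.add("\n")
    return endings


if __name__ == "__main__":
    mixed = io.StringIO("a\r\nb\rc\n", newline="")
    print(sorted(seen_line_endings(mixed)))
```

For a real file the call would be along the lines of
seen_line_endings(open(path, newline="")).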

-- 
Marc-Andre Lemburg
eGenix.com
