[Python-Dev] 51 Million calls to _PyUnicodeUCS2_IsLinebreak() (???)

Wed Aug 24 16:08:12 CEST 2005

Martin v. Löwis wrote:

> Walter Dörwald wrote:
> 
>>I think a maxsplit argument (just as for unicode.split()) would help too.
> 
> Correct - that would allow to get rid of the quadratic part.

OK, such a patch should be rather simple. I'll give it a try.

> We should also strive for avoiding the second copy of the line,
> if the user requested keepends.

Your suggested unicode method islinebreak() would help with that. Then 
we could add the following to the string module:

unicodelinebreaks = u"".join(unichr(c) for c in xrange(0, 
sys.maxunicode) if unichr(c).islinebreak())

Then

     if line and not keepends:
         line = line.splitlines(False)[0]

could be

     if line and not keepends:
         line = line.rstrip(string.unicodelinebreaks)

> I wonder whether it would be worthwhile to cache the .splitlines result.
> An application that has just invoked .readline() will likely invoke
> .readline() again. If there is more than one line left, we could return
> the first line right away (potentially trimming the line ending if
> necessary). Only when a single line is left, we would attempt to
> read more data. In a plain .read(), we would first join the lines
> back.

OK, this would mean we'd have to distinguish between a direct call to 
read() and one done by readline() (which we do anyway through the 
firstline argument).

Bye,
    Walter Dörwald