[Python-Dev] 51 Million calls to _PyUnicodeUCS2_IsLinebreak() (???)
Walter Dörwald
walter at livinglogic.de
Wed Aug 24 16:08:12 CEST 2005
Martin v. Löwis wrote:
> Walter Dörwald wrote:
>
>>I think a maxsplit argument (just as for unicode.split()) would help too.
>
> Correct - that would allow us to get rid of the quadratic part.
OK, such a patch should be rather simple. I'll give it a try.
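To illustrate what a maxsplit of 1 would buy us, here is a minimal sketch in present-day Python 3 (not the Python 2 API under discussion). The helper name split_first_line and the LINEBREAKS constant are made up for this example. The point is that only the text up to the first line break is scanned, instead of splitting the whole buffer on every readline() call.

```python
# Characters that str.splitlines() treats as line boundaries in Python 3.
LINEBREAKS = "\n\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029"


def split_first_line(data, keepends=False):
    """Return (first_line, rest), scanning only up to the first line break.

    Hypothetical stand-in for data.splitlines(keepends, maxsplit=1).
    """
    for i, ch in enumerate(data):
        if ch in LINEBREAKS:
            end = i + 1
            # Treat "\r\n" as a single line ending.
            if ch == "\r" and data[end:end + 1] == "\n":
                end += 1
            first = data[:end] if keepends else data[:i]
            return first, data[end:]
    # No line break found: the whole buffer is one (possibly partial) line.
    return data, ""
```

The rest of the buffer is returned untouched, so the caller can keep it around for the next call without it being split again.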
> We should also strive for avoiding the second copy of the line,
> if the user requested keepends.
Your suggested unicode method islinebreak() would help with that. Then
we could add the following to the string module:
    unicodelinebreaks = u"".join(unichr(c) for c in
        xrange(sys.maxunicode + 1) if unichr(c).islinebreak())
Then
    if line and not keepends:
        line = line.splitlines(False)[0]
could be
    if line and not keepends:
        line = line.rstrip(string.unicodelinebreaks)
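As a rough check of the idea, here is a sketch in present-day Python 3. Since islinebreak() is only a proposal, the sketch assumes a character counts as a line break exactly when splitlines() consumes it completely. The helper name strip_line_ending is invented for this example.

```python
import sys

# No islinebreak() method exists, so treat a character as a line break
# if splitlines() turns it into a single empty line (an assumption of
# this sketch, not an existing API).
unicodelinebreaks = "".join(
    chr(c) for c in range(sys.maxunicode + 1) if chr(c).splitlines() == [""]
)


def strip_line_ending(line):
    """Remove the trailing line ending without building a list of lines."""
    return line.rstrip(unicodelinebreaks)
```

Note that rstrip() removes every trailing line break character, so this matches splitlines(False)[0] only when the input really is a single line with at most one line ending, which is what readline() produces.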
> I wonder whether it would be worthwhile to cache the .splitlines result.
> An application that has just invoked .readline() will likely invoke
> .readline() again. If there is more than one line left, we could return
> the first line right away (potentially trimming the line ending if
> necessary). Only when a single line is left, we would attempt to
> read more data. In a plain .read(), we would first join the lines
> back.
OK, this would mean we'd have to distinguish between a direct call to
read() and one done by readline() (which we do anyway through the
firstline argument).
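A minimal sketch of the caching idea, in present-day Python 3 and not taken from the codecs module: the class name CachingLineReader, the chunk_size parameter and the internal names are all invented here, and the firstline handling is left out. It keeps the splitlines() result between calls, reads more data only when at most one line is left, and joins the cached lines back in a plain read().

```python
import io
from collections import deque


class CachingLineReader:
    """Hypothetical reader that caches the result of splitlines()."""

    def __init__(self, stream, chunk_size=72):
        self.stream = stream
        self.chunk_size = chunk_size
        self.lines = deque()  # cached lines, line endings kept

    def _fill(self):
        data = self.stream.read(self.chunk_size)
        if not data:
            return False
        # The last cached line may be incomplete (or end in a "\r" whose
        # "\n" is still unread), so re-split it together with the new data.
        pending = self.lines.pop() if self.lines else ""
        self.lines = deque((pending + data).splitlines(True))
        return True

    def readline(self, keepends=True):
        # Only read more when at most one line is left in the cache.
        while len(self.lines) <= 1:
            if not self._fill():
                break
        if not self.lines:
            return ""
        line = self.lines.popleft()
        return line if keepends else line.splitlines(False)[0]

    def read(self):
        # A plain read() joins the cached lines back first.
        data = "".join(self.lines) + self.stream.read()
        self.lines.clear()
        return data
```

For example, with CachingLineReader(io.StringIO("one\ntwo\r\nthree"), chunk_size=4), a "\r\n" that straddles two chunks is still returned as one line ending, because the last cached line is never handed out until more data has been read or the stream is exhausted.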
Bye,
Walter Dörwald