[issue22687] horrible performance of textwrap.wrap() with a long word

Wed Nov 12 11:06:55 CET 2014

Serhiy Storchaka added the comment:

> Why not? I guess it depends on English's rules for word splitting, which I
> don't know.

I suppose this is a common rule in many languages, and the current code supports it (there is special code in the regex to enforce this rule).

> In any case, this issue is not about improving correctness,
> only performance.

But the patch shouldn't add a regression.

$ ./python -c "import textwrap; print(textwrap.wrap('this-is-a-useful', width=1, break_long_words=False))"

Current code: ['this-', 'is-a-useful']
Patched: ['this-', 'is-', 'a-', 'useful']

Just use a lookahead assertion to ensure that the hyphen is followed by at least two letters.
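
A minimal sketch of the idea, for illustration only. The patterns and the helper split_at() below are hypothetical; they are neither the regex used in textwrap nor the one in the patch. The guarded pattern assumes a break is allowed after a hyphen only when two letters precede it and two letters follow it:

```python
import re

# Hypothetical patterns for illustration; not the regex from textwrap.
# Naive: allow a break after every hyphen.
NAIVE_BREAK = re.compile(r'(?<=-)')
# Guarded: allow a break after a hyphen only if two letters precede the
# hyphen (lookbehind) and at least two letters follow it (lookahead).
GUARDED_BREAK = re.compile(r'(?<=[^\W\d_]{2}-)(?=[^\W\d_]{2})')

def split_at(pattern, word):
    """Split word at every zero-width position matched by pattern."""
    chunks, start = [], 0
    for match in pattern.finditer(word):
        pos = match.start()
        if pos > start:
            chunks.append(word[start:pos])
            start = pos
    if start < len(word):
        chunks.append(word[start:])
    return chunks

print(split_at(NAIVE_BREAK, 'this-is-a-useful'))
print(split_at(GUARDED_BREAK, 'this-is-a-useful'))
```

With the guarded pattern the single letter 'a' is never split off, so the chunks match the current behaviour shown above.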

My previous message was about the fact that the current code is not always correct, so it is acceptable to replace it with code that is not exactly equivalent.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue22687>
_______________________________________