[issue32397] textwrap output may change if you wrap a paragraph twice

Andrei Kulakov report at bugs.python.org
Tue Aug 3 12:04:21 EDT 2021


Andrei Kulakov <andrei.avk at gmail.com> added the comment:

I think the fix to make `drop_whitespace=False` stable can be as simple as adding two lines to `_munge_whitespace()`:

+            text = re.sub(r' \n', ' ', text)
+            text = re.sub(r'\n ', ' ', text)
             text = text.translate(self.unicode_whitespace_trans)
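
For illustration, here is a minimal sketch of the instability and of what the two `sub()` calls change. It applies them outside `textwrap` instead of patching `_munge_whitespace()`, and `fill_stable` is a hypothetical helper name, not something in the module:

```python
import re
import textwrap


def fill_stable(text, width):
    """Hypothetical helper: apply the two proposed substitutions, then fill."""
    # Collapse "space + newline" and "newline + space" into one space, so the
    # trailing space kept by drop_whitespace=False is not doubled when the
    # newline itself is later replaced by a space.
    text = re.sub(r' \n', ' ', text)
    text = re.sub(r'\n ', ' ', text)
    return textwrap.fill(text, width, drop_whitespace=False)


text = "abc foo\nbar baz"

# Current behaviour: wrapping the already-wrapped text changes it again.
once = textwrap.fill(text, 5, drop_whitespace=False)
twice = textwrap.fill(once, 5, drop_whitespace=False)
print(repr(once))
print(repr(twice))

# With the substitutions applied first, a second pass leaves the text alone.
stable_once = fill_stable(text, 5)
stable_twice = fill_stable(stable_once, 5)
print(stable_once == stable_twice)
```

Each lines keeps its trailing space, so without the substitutions every extra pass sees "space + newline" and turns it into two spaces.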

The perf impact is not small, though, at about 12%:

2892 (~/opensource/cpython) % ./python.exe -mtimeit 'import textwrap' 'textwrap.wrap("abc foo\nbar baz", 5)'
5000 loops, best of 5: 60.2 usec per loop

2893 (~/opensource/cpython) % r
./python.exe -mtimeit 'import textwrap' 'textwrap.wrap("abc foo\nbar baz", 5)'
5000 loops, best of 5: 52.9 usec per loop


I don't know if it's worth doing, but if so, the options are:

 - just add this change for `drop_whitespace=False`, which is not the default, so the perf regression will not affect default usage of `wrap()`.

 - add a new arg that only has an effect when `drop_whitespace=False` and runs these two lines. The name could be something like `collapse_space_newline`, though it's hard to think of a good one.
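
A rough sketch of the second option, written as a subclass so it can be tried without rebuilding CPython. `CollapsingWrapper` is a hypothetical name, `collapse_space_newline` is only the tentative arg name from above, and the substitutions here run before tab expansion, which may differ slightly from putting them next to the `translate()` call:

```python
import re
import textwrap


class CollapsingWrapper(textwrap.TextWrapper):
    """Hypothetical subclass illustrating an opt-in collapse_space_newline arg."""

    def __init__(self, *args, collapse_space_newline=False, **kwargs):
        super().__init__(*args, **kwargs)
        self.collapse_space_newline = collapse_space_newline

    def _munge_whitespace(self, text):
        # Only pay for the two extra sub() calls when the caller asked for
        # them and drop_whitespace is off.
        if self.collapse_space_newline and not self.drop_whitespace:
            text = re.sub(r' \n', ' ', text)
            text = re.sub(r'\n ', ' ', text)
        return super()._munge_whitespace(text)


text = "abc foo\nbar baz"

opt_in = CollapsingWrapper(width=5, drop_whitespace=False,
                           collapse_space_newline=True)
first = opt_in.fill(text)
print(opt_in.fill(first) == first)

default = CollapsingWrapper(width=5, drop_whitespace=False)
plain = default.fill(text)
print(default.fill(plain) == plain)
```

With the arg left at its default, the subclass behaves like the current `TextWrapper`, so existing callers would not see the slowdown.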

If '\r\n' is handled as well, that needs one additional `sub()` line, and the perf difference becomes 22%.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue32397>
_______________________________________

