[issue43518] textwrap.shorten does not always respect word boundaries

Fri Mar 19 20:16:07 EDT 2021

Terry J. Reedy <tjreedy at udel.edu> added the comment:

Verified in 3.10.0a6 that the change occurs at 3 !s.  I agree that this is a bug relative to the doc.

The issue is that 'world!!!' is 8 chars, which is longer than width 7, so by default wrap treats it as a long word, splits it into 'w' and 'orld!!!', and adds ' w' to 'hello'.
>>> wrap('hello world!!!', width=7)
['hello w', 'orld!!!']
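
For anyone who wants to reproduce the threshold, here is a minimal sketch. It assumes sh is textwrap.shorten and that placeholder='' is passed, as in the later examples in this message; the results dict is only there for illustration.

```python
import textwrap

# Assumption: width=7 and placeholder='' match the calls discussed here.
results = {}
for bangs in range(1, 5):
    text = 'hello world' + '!' * bangs
    # The last word has 5 + bangs characters; it only counts as a
    # "long word" (and gets broken) once it is longer than width.
    results[bangs] = textwrap.shorten(text, width=7, placeholder='')
    print(bangs, repr(results[bangs]))
```

On the versions discussed here, the result should change from 'hello' to 'hello w' once the last word reaches 8 chars, that is, at 3 !s.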

A solution is to not break long words.
>>> sh('hello world!!!', width=7, placeholder='', break_long_words=False)
'hello'

Then

>>> sh('hello!!!! world!!!', width=7, placeholder='', break_long_words=False)
''

versus

>>> sh('hello!!!! world!!!', width=7, placeholder='')
'hello!!'

The docstring and doc say "enough words are dropped from the end so that the remaining words plus the placeholder fit within width:".  Taking this literally, '' is correct.  So a fix would be to add "break_long_words=False" to options if break_long_words not in options.
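
A rough sketch of what that default would do, written as a wrapper rather than a patch to textwrap itself. The name shorten_whole_words is hypothetical and only for illustration; the real change would presumably go inside shorten.

```python
import textwrap

def shorten_whole_words(text, width, **kwargs):
    """Hypothetical wrapper: like textwrap.shorten, but never breaks words
    unless the caller explicitly passes break_long_words."""
    # Only supply the default when the caller did not choose a value.
    kwargs.setdefault('break_long_words', False)
    return textwrap.shorten(text, width, **kwargs)

# Same inputs as the sessions above.
print(repr(shorten_whole_words('hello world!!!', width=7, placeholder='')))
print(repr(shorten_whole_words('hello!!!! world!!!', width=7, placeholder='')))
# An explicit break_long_words=True keeps the current behavior.
print(repr(shorten_whole_words('hello!!!! world!!!', width=7, placeholder='',
                               break_long_words=True)))
```

Using setdefault keeps the option available to callers who do want long words broken, so only the default changes.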

Antoine, you last touched the shorten docstring.  Serhiy, you last touched its code.  What do you two think?

----------
nosy: +pitrou, serhiy.storchaka, terry.reedy
stage:  -> needs patch
versions: +Python 3.10, Python 3.9

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue43518>
_______________________________________