Elegant hack or gross hack? TextWrapper and escape codes

Thu May 28 03:52:40 EDT 2020

Chris Angelico wrote:

> Situation: A terminal application. Requirement: Display nicely-wrapped
> text. With colour codes in it. And that text might be indented to any
> depth.
> 
> label = f"{indent}\U0010cc32{code}\U0010cc00 @{tweet['user']['screen_name']}: "
> wrapper = textwrap.TextWrapper(
>     initial_indent=label,
>     subsequent_indent=indent + " " * 12,
>     width=shutil.get_terminal_size().columns,
>     break_long_words=False, break_on_hyphens=False, # Stop URLs from breaking
> )
> for line in tweet["full_text"].splitlines():
>     print(wrapper.fill(line)
>         .replace("\U0010cc32", "\x1b[32m\u2026")
>         .replace("\U0010cc00", "\u2026\x1b[0m")
>     )
>     wrapper.initial_indent = wrapper.subsequent_indent # For subsequent lines, just indent them
> 
> 
> The parameter "indent" is always some number of spaces (possibly
> zero). If I simply include the escape codes in the label, their
> characters will be counted, and the first line will be shorter. Rather
> than mess with how textwrap defines text, I just replace the escape
> codes *and one other character* with a placeholder. In the final
> display, \U0010cc32 means "colour code 32 and an ellipsis", and
> \U0010cc00 means "colour code 0 and an ellipsis", so textwrap
> correctly counts them as one character each.
> 
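
For anyone who wants to try the placeholder trick without the tweet data, 
here is a minimal self-contained sketch. The helper name wrap_with_label() 
and its parameters are my own invention for illustration, not part of the 
quoted code; the placeholders and replacements are the ones quoted above.

```python
import textwrap

# Private-use placeholders, one code point each, as in the quoted code.
START = "\U0010cc32"  # stands for "\x1b[32m" plus one visible character
END = "\U0010cc00"    # stands for one visible character plus "\x1b[0m"


def wrap_with_label(label, body, indent="", width=60):
    """Hypothetical helper: wrap body behind a green label."""
    wrapper = textwrap.TextWrapper(
        # textwrap counts each placeholder as a single character ...
        initial_indent=f"{indent}{START}{label}{END} ",
        subsequent_indent=indent + " " * 12,
        width=width,
        break_long_words=False,
        break_on_hyphens=False,
    )
    # ... and each one is swapped for an escape code plus exactly one
    # visible character (an ellipsis), so the visible width is unchanged.
    return (
        wrapper.fill(body)
        .replace(START, "\x1b[32m\u2026")
        .replace(END, "\u2026\x1b[0m")
    )


print(wrap_with_label("12:34", "some words to wrap " * 8, indent="  ", width=50))
```

The point is only that each placeholder occupies exactly one column both 
before and after the replacement, so the widths textwrap computed still hold.
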
> So what do you folks think? Is this a gloriously elegant way to
> collapse nonprinting text, or is it a gross hacky mess 

Yes ;)

> that's going to cause problems?

Probably not. 

I had a quick look at the TextWrapper class, and it doesn't really lend 
itself to clean and elegant customisation. However, my first idea to 
approach this problem was to override len() as seen by the textwrap module:

import re
import textwrap

text = """The parameter "indent" is always some number of spaces (possibly
zero). If I simply include the escape codes in the label, their
characters will be counted, and the first line will be shorter. Rather
than mess with how textwrap defines text, I just replace the escape
codes *and one other character* with a placeholder. In the final
display, \U0010cc32 means "colour code 32 and an ellipsis", and
\U0010cc00 means "colour code 0 and an ellipsis", so textwrap
correctly counts them as one character each.
"""

print(textwrap.fill(text, width=40))

# add some color to the text sample
GREEN =  "\x1b[32m"
NORMAL = "\x1b[0m"
parts = text.split(" ")
parts[::2] = [GREEN + p + NORMAL for p in parts[::2]]
ctext = " ".join(parts)

# wrong wrapping
print(textwrap.fill(ctext, width=40))

# fixed wrapping
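def color_len(s):
    # Raw string avoids invalid-escape warnings for \[ and \d.
    # Matches simple colour codes like "\x1b[32m" only.
    return len(re.sub(r"\x1b\[\d+m", "", s))

# Shadow the builtin len() inside the textwrap module's namespace.
textwrap.len = color_len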
print(textwrap.fill(ctext, width=40))

The output of my ad-hoc test script looks OK. However, I did not try to 
understand the word-breaking regexes, so I don't know if an escape code 
can be split across chunks, which would confuse color_len(). Likewise, I 
have no idea if textwrap can cope with zero-length chunks.
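
Since assigning textwrap.len changes the behaviour of every later textwrap 
call in the process, one way to contain the hack is to apply it only for 
the duration of a single call. The following is a minimal sketch under that 
assumption; patched_len() and fill_colored() are hypothetical names, and the 
cleanup assumes nobody else has overridden textwrap.len in the meantime.

```python
import re
import textwrap
from contextlib import contextmanager

# Matches simple colour codes such as "\x1b[32m" or "\x1b[0m" only.
ANSI_SGR = re.compile(r"\x1b\[\d+m")


def color_len(s):
    """Length of s with colour escape codes ignored."""
    return len(ANSI_SGR.sub("", s))


@contextmanager
def patched_len():
    """Temporarily make textwrap use color_len() instead of the builtin len()."""
    textwrap.len = color_len
    try:
        yield
    finally:
        # Remove the module-level name so lookups reach the builtin again.
        del textwrap.len


def fill_colored(text, **kwargs):
    """Hypothetical helper: textwrap.fill() that ignores colour codes."""
    with patched_len():
        return textwrap.fill(text, **kwargs)


GREEN = "\x1b[32m"
NORMAL = "\x1b[0m"
words = "the quick brown fox jumps over the lazy dog".split() * 4
ctext = " ".join(
    GREEN + w + NORMAL if i % 2 == 0 else w for i, w in enumerate(words)
)
print(fill_colored(ctext, width=40))
```

This still leaves the open questions above untouched: with the default 
break_long_words=True, a coloured word longer than the width may be sliced 
in the middle of an escape code.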

But at least now you have two -- elegant or gross -- hacks to choose from ;)