Elegant hack or gross hack? TextWrapper and escape codes

Thu May 28 04:01:15 EDT 2020

On Thu, May 28, 2020 at 5:54 PM Peter Otten <__peter__ at web.de> wrote:
>
> Chris Angelico wrote:
>
> > Situation: A terminal application. Requirement: Display nicely-wrapped
> > text. With colour codes in it. And that text might be indented to any
> > depth.
> >
> > label = f"{indent}\U0010cc32{code}\U0010cc00 @{tweet['user']['screen_name']}: "
> > wrapper = textwrap.TextWrapper(
> >     initial_indent=label,
> >     subsequent_indent=indent + " " * 12,
> >     width=shutil.get_terminal_size().columns,
> >     break_long_words=False, break_on_hyphens=False, # Stop URLs from breaking
> > )
> > for line in tweet["full_text"].splitlines():
> >     print(wrapper.fill(line)
> >         .replace("\U0010cc32", "\x1b[32m\u2026")
> >         .replace("\U0010cc00", "\u2026\x1b[0m")
> >     )
> >     wrapper.initial_indent = wrapper.subsequent_indent # For subsequent lines, just indent them
> >
> >
> > The parameter "indent" is always some number of spaces (possibly
> > zero). If I simply include the escape codes in the label, their
> > characters will be counted, and the first line will be shorter. Rather
> > than mess with how textwrap defines text, I just replace the escape
> > codes *and one other character* with a placeholder. In the final
> > display, \U0010cc32 means "colour code 32 and an ellipsis", and
> > \U0010cc00 means "colour code 0 and an ellipsis", so textwrap
> > correctly counts them as one character each.
> >
> > So what do you folks think? Is this a gloriously elegant way to
> > collapse nonprinting text, or is it a gross hacky mess
>
> Yes ;)

... I should have expected a "yes" response to an either-or question.
Silly of me. :)

> > that's going to cause problems?
>
> Probably not.
>
> I had a quick look at the TextWrapper class, and it doesn't really lend
> itself to clean and elegant customisation. However, my first idea to
> approach this problem was to patch the len() builtin:
>
> import re
> import textwrap
>
> text = """The parameter "indent" is always some number of spaces (possibly
> zero). If I simply include the escape codes in the label, their
> characters will be counted, and the first line will be shorter. Rather
> than mess with how textwrap defines text, I just replace the escape
> codes *and one other character* with a placeholder. In the final
> display, \U0010cc32 means "colour code 32 and an ellipsis", and
> \U0010cc00 means "colour code 0 and an ellipsis", so textwrap
> correctly counts them as one character each.
> """
>
> print(textwrap.fill(text, width=40))
>
> # add some color to the text sample
> GREEN =  "\x1b[32m"
> NORMAL = "\x1b[0m"
> parts = text.split(" ")
> parts[::2] = [GREEN + p + NORMAL for p in parts[::2]]
> ctext = " ".join(parts)
>
> # wrong wrapping
> print(textwrap.fill(ctext, width=40))
>
> # fixed wrapping
> def color_len(s):
>     return len(re.compile(r"\x1b\[\d+m").sub("", s))
>
> textwrap.len = color_len
> print(textwrap.fill(ctext, width=40))
>
> The output of my ad-hoc test script looks OK. However, I did not try to
> understand the word-breaking regexes, so I don't know if the escape codes
> can be spread across words which would confuse color_len(). Likewise, I have
> no idea if textwrap can cope with zero-length chunks.
>
> But at least now you have two -- elegant or gross -- hacks to choose from ;)
>

Yeah, I thought of this originally as a challenge in redefining the
concept of "length". But the trouble is that it might not always be
the len() function that figures out the length - there might be a
regex with a size threshold or any number of other things that
effectively think about the length of the string.
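
To make that concrete, here's a rough sketch of one such place. It
assumes CPython's textwrap breaks an over-long word by slicing it at
the remaining width (which is what the current source does, as far as
I can tell), so a patched len() never gets a say in where the cut
lands. The names ANSI and visible, and the test string, are made up
for illustration; color_len is the same idea as in the quoted script.

```python
import re
import textwrap

# Matches simple SGR colour codes such as "\x1b[32m" and "\x1b[0m".
ANSI = re.compile(r"\x1b\[\d+m")

def color_len(s):
    """Length of s, not counting colour escape codes."""
    return len(ANSI.sub("", s))

def visible(s):
    """s with colour escape codes stripped, i.e. what the terminal shows."""
    return ANSI.sub("", s)

# One long green "word": 100 visible characters plus the escape codes.
word = "\x1b[32m" + "x" * 100 + "\x1b[0m"

textwrap.len = color_len  # the len() patch from the quoted script
try:
    lines = textwrap.wrap(word, width=20, break_long_words=True,
                          break_on_hyphens=False)
finally:
    del textwrap.len  # fall back to the builtin len() again

# Visible width of each output line. The first line comes up short,
# because the slice counted the five characters of "\x1b[32m" too.
widths = [len(visible(line)) for line in lines]
print(widths)  # [15, 20, 20, 20, 20, 5]
```

Nothing breaks horribly here, but the first line is five columns
narrower than it should be, and with less convenient numbers the cut
could land in the middle of an escape sequence. With the placeholder
trick, the slice sees one character per colour code, same as every
other piece of length logic.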

ChrisA