String escaping utility for Python (was: Rawest raw string literals)

Chris Angelico rosuav at gmail.com
Sat Apr 22 18:48:58 EDT 2017


On Sun, Apr 23, 2017 at 8:30 AM, Mikhail V <mikhailwas at gmail.com> wrote:
> The purpose is simple: reduce manual work to escape special
> characters in string literals (and escape non-ASCII characters).
>
> Simple usage scenario:
> - I have a long command-line string in some text editor.
> - Copy this string and paste into the utility edit box
> - The same string appears in the second edit box with special
>   characters escaped (i.e. tab becomes \t, etc.)
> - Further, if I edit the text in the second edit box,
>   an unescaped string appears in the first box.

Easy.

>>> input()
This string has "quotes" of 'various' «styles», and \backslashes\ too.
'This string has "quotes" of \'various\' «styles», and \\backslashes\\ too.'

The repr of a string does pretty much everything you want. If you want
a nice GUI, you can easily put one together that uses repr() to escape
and ast.literal_eval() to unescape.
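
A minimal sketch of that round trip, using only the standard library. The function names escape, escape_ascii and unescape are made up for illustration. Note that repr() leaves printable non-ASCII characters alone, so the builtin ascii() is probably the one to reach for if those should come out as escapes too.

```python
import ast

def escape(text):
    """Return a Python string literal for text; printable non-ASCII is kept as-is."""
    return repr(text)

def escape_ascii(text):
    """Like escape(), but non-ASCII characters become \\x, \\u or \\U escapes."""
    return ascii(text)

def unescape(literal):
    """Turn a string literal (quotes included) back into the string it denotes."""
    value = ast.literal_eval(literal)
    if not isinstance(value, str):
        raise ValueError("not a string literal")
    return value

sample = 'tab\there, "quotes", \\backslashes\\ and «абв»'
for literal in (escape(sample), escape_ascii(sample)):
    print(literal)
    # Both escaped forms must evaluate back to the original text.
    assert unescape(literal) == sample
```

A GUI would only need to call escape() when the first box changes and unescape() when the second one does, catching SyntaxError and ValueError while the literal is half-typed.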

> PS:
> Also I remember now about the python-ideas thread
> on entering unicode characters with decimals instead of
> hex values. It was met somewhat negatively, but then it turned out
> that in recent Python versions it can be done with f-strings.
> E.g. the string:
>
> s="абв"
> one can write as:
> s = f"{1072:c}{1073:c}{1074:c}"
> instead of traditional hex:
> "\u0430\u0431\u0432"
>
> I was told, however, that this is not normal usage.
> Still I find it very helpful, so if this is correct syntax, I'd
> personally find such a conversion option also very useful.

Most of the world finds the hex form MUCH more logical, since Unicode
is built around 16s and 256s and such. Please don't proliferate more
messes - currently, the only place I can think of where decimal is
supported is HTML character entities, and hex is equally supported
there.
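
For what it's worth, both spellings produce the same string, and converting between decimal and hex is a one-liner either way. A quick check, assuming Python 3.6+ for the f-string. The variable names are arbitrary, and the html.unescape() call is only there to illustrate the HTML character references mentioned above:

```python
import html

s = "абв"
decimal_form = f"{1072:c}{1073:c}{1074:c}"   # decimal code points via the 'c' format
hex_form = "\u0430\u0431\u0432"              # conventional hex escapes
assert decimal_form == hex_form == s

# Code points of each character, in U+XXXX hex notation and in decimal.
hex_points = [f"U+{ord(ch):04X}" for ch in s]
dec_points = [ord(ch) for ch in s]
print(hex_points, dec_points)

# HTML accepts both decimal and hex numeric character references.
assert html.unescape("&#1072;") == html.unescape("&#x430;") == s[0]
```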

Of course, the best way to represent most non-ASCII characters is as
themselves - s="абв" from your example. The main exception is
combining characters and related incomplete forms, such as this table
of diacritical marks more-or-less lifted from an app of mine:

{
    "\\`":"\u0300","\\'":"\u0301","\\^":"\u0302","\\~":"\u0303",
    "\\-":"\u0304","\\@":"\u0306","\\.":"\u0307","\\\"":"\u0308",
    "\\o":"\u030A","\\=":"\u030B","\\v":"\u030C","\\<":"\u0326",
    "\\,":"\u0327","\\k":"\u0328",
}

All of them are in the 03xx range. Much easier than pointing out that
they're in the range 768 to 879. Please stick to hex.
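
A rough check of that claim with unicodedata, assuming nothing beyond the standard library. MARKS is just a name for this sketch, holding the values from the table above:

```python
import unicodedata

# Values copied from the table above, in the same order.
MARKS = ("\u0300\u0301\u0302\u0303\u0304\u0306\u0307\u0308"
         "\u030A\u030B\u030C\u0326\u0327\u0328")

# The Combining Diacritical Marks block, in hex and in decimal.
assert (0x300, 0x36F) == (768, 879)

for mark in MARKS:
    assert 0x300 <= ord(mark) <= 0x36F        # inside the 03xx block
    assert unicodedata.combining(mark) != 0   # nonzero class: a combining mark
    print(f"U+{ord(mark):04X} {ord(mark):4d} {unicodedata.name(mark)}")

# A combining mark attaches to the preceding character.
assert unicodedata.normalize("NFC", "e\u0301") == "\u00e9"
```

In the printed listing the hex column shares a visible 03 prefix, while the decimal column gives no such hint.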

ChrisA


