String escaping utility for Python (was: Rawest raw string literals)

Chris Angelico rosuav at gmail.com
Sat Apr 22 20:33:11 EDT 2017


On Sun, Apr 23, 2017 at 10:19 AM, Mikhail V <mikhailwas at gmail.com> wrote:
> On 23 April 2017 at 00:48, Chris Angelico <rosuav at gmail.com> wrote:
>> On Sun, Apr 23, 2017 at 8:30 AM, Mikhail V <mikhailwas at gmail.com> wrote:
>>> The purpose is simple: reduce manual work to escape special
>>> characters in string literals (and escape non-ASCII characters).
>>>
>>> Simple usage scenario:
>>> - I have a long command-line string in some text editor.
>>> - Copy this string and paste it into the utility's edit box.
>>> - In the second edit box, the same string appears with its
>>>   characters escaped (i.e. tab becomes \t, etc.)
>>> - Further, if I edit the text in the second edit box,
>>>   an unescaped string appears in the first box.
>>
>> Easy.
>>
>> >>> input()
>> This string has "quotes" of 'various' «styles», and \backslashes\ too.
>> 'This string has "quotes" of \'various\' «styles», and \\backslashes\\ too.'
>>
>> The repr of a string does pretty much everything you want. If you want
>> a nice GUI, you can easily put one together that uses repr() to escape
>> and ast.literal_eval() to unescape.
>
> I am sorry, could you elaborate on what you have shown here?
> So in the Python console I can get an escaped string, but what
> commands do you use? I never use the Python console actually :/

You type "input()" at the Python console, then type the string you
want. It will be echoed back in representation form, with everything
correctly escaped.
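
For a script or GUI callback rather than the interactive prompt, the same round trip can be sketched roughly as below. The function names `escape` and `unescape` are my own placeholders, not part of any library; only `repr()`, `ascii()` and `ast.literal_eval()` are standard.

```python
import ast


def escape(text: str) -> str:
    """Return a Python string literal (quotes included) for text."""
    return repr(text)


def unescape(literal: str) -> str:
    """Parse a Python string literal back into the string it denotes."""
    value = ast.literal_eval(literal)
    if not isinstance(value, str):
        raise ValueError("not a string literal")
    return value


original = "tab\there, \"quotes\" of 'various' «styles», and \\backslashes\\ too."

literal = escape(original)
print(literal)

# The round trip gives back exactly the text we started with.
assert unescape(literal) == original

# ascii() works like repr() but also escapes non-ASCII characters.
print(ascii(original))
```

A GUI would call `escape` when the first box changes and `unescape` when the second one does, catching `ValueError` and `SyntaxError` while the literal is only half typed.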

> And yes, the idea is to have a nice GUI. And the idea is exactly opposite
> to "everyone, let's roll our own tool". Obviously I can spend a day
> or two and create such a tool, e.g. with PyQt.
> But since the task is very common and quite unambiguous, I think there is
> a good reason for a standard official tool.

Or you could spend two seconds firing up the Python REPL, which has
all the tools you need right there :)

>>> PS:
>>> Also I remember now about the python-ideas thread
>>> on entering unicode characters with decimals instead of
>>> hex values. It was met somewhat negatively but then it turned out
>>> that in recent Python version it can be done with f-strings.
>>> E.g. a string :
>>>
>>> s="абв"
>>> one can write as:
>>> s = f"{1072:c}{1073:c}{1074:c}"
>>> instead of traditional hex:
>>> "\u0430\u0431\u0432"
>>>
>>> It was told however this is not normal usage.
>>> Still I find it very helpful, so if this is correct syntax, I'd
>>> personally find such a conversion option also very useful.
>>
>> Most of the world finds the hex form MUCH more logical, since Unicode
>> is built around 16s and 256s and such. Please don't proliferate more
>> messes - currently, the only place I can think of where decimal is
>> supported is HTML character entities, and hex is equally supported
>> there.
>>
>> Of course, the best way to represent most non-ASCII characters is as
>> themselves - s="абв" from your example. The main exception is
>> combining characters and related incomplete forms, such as this table
>> of diacritical marks more-or-less lifted from an app of mine:
>>
>> {
>>     "\\`":"\u0300","\\'":"\u0301","\\^":"\u0302","\\~":"\u0303",
>>     "\\-":"\u0304","\\@":"\u0306","\\.":"\u0307","\\\"":"\u0308",
>>     "\\o":"\u030A","\\=":"\u030B","\\v":"\u030C","\\<":"\u0326",
>>     "\\,":"\u0327","\\k":"\u0328",
>> }
>>
>> All of them are in the 03xx range. Much easier than pointing out that
>> they're in the range 768 to 879. Please stick to hex.
>
> I don't insist on decimals, I want to use decimals for my own pleasure
> in my own projects, may I?
> And don't worry, in my whole life I will not produce so much software
> that it will significantly increase the 'messes'.
> (Anyway, I've already got used to decimals somehow, ord(char), etc.,
> so for me it's too late for the ugly hex)

Will your projects ever be shared with anyone else? If so, please use
the standard. In your own projects, you're welcome to shoot yourself
in the foot, but I'm not going to help you. I'm going to encourage hex
for Unicode.

It's not too late for you to adjust your mind to the standard. And I
strongly recommend it. There are good reasons for hex, and the sooner
you change, the easier it'll be.
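
To make the comparison concrete, here is a small sketch using the strings from earlier in this thread. The helper name `hex_escape` is made up for illustration; everything else is the standard library.

```python
import ast
import unicodedata

s = "абв"

# All three spellings denote the same string.
assert "\u0430\u0431\u0432" == s
assert f"{1072:c}{1073:c}{1074:c}" == s

# Hex and decimal are just two notations for the same code point.
assert 0x0430 == 1072 == ord("а")

# The combining diacritical marks block is 03xx in hex, 768 to 879 in decimal.
assert (0x0300, 0x036F) == (768, 879)


def hex_escape(text: str) -> str:
    """Spell every character as a \\uXXXX (or \\UXXXXXXXX) escape."""
    parts = []
    for ch in text:
        cp = ord(ch)
        parts.append("\\u{:04x}".format(cp) if cp <= 0xFFFF else "\\U{:08x}".format(cp))
    return "".join(parts)


escaped = hex_escape(s)
print(escaped)

# Wrapping the escapes in quotes and evaluating them restores the text.
assert ast.literal_eval('"' + escaped + '"') == s

# A combining mark is easier to read as an escape than as itself:
# "e" followed by U+0301 composes to a single precomposed character.
assert unicodedata.normalize("NFC", "e\u0301") == "\u00e9"
```

The escapes produced this way line up with the code charts, which are organised by hex value.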

ChrisA


