backslash plague

Bengt Richter bokr at oz.net
Sun Oct 24 06:59:40 EDT 2004


On Sat, 23 Oct 2004 22:33:05 GMT, bokr at oz.net (Bengt Richter) wrote:

>On Fri, 22 Oct 2004 21:20:30 +0200, aleaxit at yahoo.com (Alex Martelli) wrote:
>
>>Luis P. Mendes <luisXX_lupe2XX at netvisaoXX.pt> wrote:
>>    ...
>>> I've already read many pages on this but I'm not able to separate the
>>> string 'R0\1.2646\1.2649\D' in four elements, using the \ as the separator.
>>
>>x = r'R0\1.2646\1.2649\D'
>>elements = x.split('\\')
>>
>>> and why must I write two '' after the \?  If I hadn't used r I would
>>> understand...
>>
>>A raw literal can't end with an odd number of backslashes (_some_ way
>>has to be there to escape the quote char, after all).
>>
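
For reference, a minimal sketch of the two points above, using the string from the original question. The compiles() helper is only an illustrative name; it uses the built-in compile() so that the rejected literal can be shown without breaking the script itself.

```python
# In a raw literal the backslashes are kept as-is, so we split on a single
# backslash, which has to be written '\\' in an ordinary (non-raw) literal.
x = r'R0\1.2646\1.2649\D'
elements = x.split('\\')
print(elements)  # ['R0', '1.2646', '1.2649', 'D']


def compiles(source):
    """Return True if source is a valid Python expression (hypothetical helper)."""
    try:
        compile(source, '<example>', 'eval')
        return True
    except SyntaxError:
        return False


# Source text r'c:\whatever\\' : even number of trailing backslashes, accepted.
print(compiles("r'c:\\whatever\\\\'"))  # True
# Source text r'c:\whatever\' : the last backslash protects the closing quote,
# so the literal never terminates.
print(compiles("r'c:\\whatever\\'"))    # False
```
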
>Hm, just had the thought that something analogous to HDLC bit-stuffing
>could be used. IIRC bitstreams had escape flags composed of 5 successive bits,
>and if you wanted to transmit 5 successive data bits, you just added an extra bit
>at the end to make 6 to show that the five did not comprise a flag. The extra bits
>would get dropped on decoding when a 6th 1 followed 11111 and would be recognized
>as a flag otherwise.
BZZT! wrong ;-(
The flag is 01111110, and I believe a 0 gets stuffed after every run of five
consecutive 1s in the data, to make sure the flag pattern never appears between real flags.
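
A rough sketch of that corrected rule, assuming bits are represented as a Python string of '0' and '1' characters purely for readability (real HDLC hardware works on the bit stream directly). The names stuff() and unstuff() are made up for this illustration.

```python
FLAG = '01111110'  # HDLC frame delimiter: six 1s between two 0s


def stuff(bits):
    """Insert a 0 after every run of five consecutive 1 bits."""
    out = []
    ones = 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b == '1' else 0
        if ones == 5:
            out.append('0')  # stuffed bit, so six 1s in a row never occur
            ones = 0
    return ''.join(out)


def unstuff(bits):
    """Drop the bit that follows every run of five consecutive 1 bits."""
    out = []
    ones = 0
    skip = False
    for b in bits:
        if skip:
            skip = False  # this is the stuffed 0; discard it
            ones = 0
            continue
        out.append(b)
        ones = ones + 1 if b == '1' else 0
        if ones == 5:
            skip = True
    return ''.join(out)


data = '0111111011111111'  # contains the flag pattern and a long run of 1s
sent = stuff(data)
print(sent)                # 011111010111110111
print(FLAG in sent)        # False
print(unstuff(sent) == data)  # True
```

Since the stuffed stream never contains six 1s in a row, the receiver can treat any 01111110 it sees as a real flag.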

>
>Translating this to quoted character sequences, we could have an alternate triple
>quoted raw string format, with quote-stuffing instead of escapes. I.e., to quote
>three successive quote characters, we stuff a 4th quote, which the tokenizer drops
>as it creates the internal byte sequence string representation, so we don't need
>escapes in the usual sense.
This does not work for, e.g., quoting a single quote, so it's not general at all :-(

>
>Thus (using f prefix to indicate flagged quote-stuffing syntax) you could write:
>
>    x = f'''c:\whatever\'''
>
>and to quote the line above (without taking advantage of alternate quotes):
>
>    q = f'''    x = f''''c:\whatever\'''''''
>         ^^^         ^^^|            ^^^|^^^
>
>where ^^^ is flag and | indicates a stuffed quote that
>makes the previous otherwise-flag into three quotes in the data.
>You could quote again (using same type quote for illustrative purposes
>again, since obviously you could do better using both ' and "):
>
>    r = f'''f''''    x = f'''''c:\whatever\'''''''''''
>             ^^^|         ^^^|             ^^^|^^^|^^^
>
>(I think ;-)
>
>I guess the worst-case data to quote would be a repeating pattern of
>'''""" or """''' since neither type of quote character would give an
>advantage, but 1-in-6 overhead is still not too bad, and it would be rare.
>
>Is there a hole in this raw string quoting syntax?
>
Unfortunately, yes.
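
To make the hole concrete, here is a sketch of the quote-stuffing scheme as I read it from the description above. This is my interpretation, not an existing Python feature: encode(), decode() and round_trips() are hypothetical names, and the f''' prefix here is just the flag syntax proposed in this thread.

```python
FLAG = "'''"


def encode(data):
    """Quote-stuff: add a 4th quote after every three quotes in the data."""
    return "f" + FLAG + data.replace(FLAG, FLAG + "'") + FLAG


def decode(literal):
    """Undo encode(); raise ValueError if no closing flag is found."""
    if not literal.startswith("f" + FLAG):
        raise ValueError("not a quote-stuffed literal")
    out = []
    i = 4  # skip the f''' opener
    while i < len(literal):
        if literal.startswith(FLAG, i):
            if literal.startswith("'", i + 3):
                out.append(FLAG)  # flag plus stuffed quote: three data quotes
                i += 4
            else:
                return "".join(out)  # bare flag: end of the literal
        else:
            out.append(literal[i])
            i += 1
    raise ValueError("unterminated literal")


def round_trips(data):
    """True if decoding the encoded data gives the data back."""
    try:
        return decode(encode(data)) == data
    except ValueError:
        return False


line = "    x = f'''c:\\whatever\\'''"
print(round_trips(line))   # True: the example from the thread works
print(round_trips("it's")) # True: a lone quote in the middle is fine
print(round_trips("x'"))   # False: data ending in a single quote
```

In the failing case the data's final quote runs into the closing flag, so the decoder sees four quotes, takes them for a stuffed flag, and never finds the end of the literal.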

I thought of another format, but it doesn't quote previously quoted arbitrary text
without modifying at least the last character, so phooey. Might as well go to the
previously suggested MIME-style delimiting, which an editor macro could do for
arbitrary selected text. It could use str(time.time()) as delimiter text without
much risk, IWT.
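
A minimal sketch of what such a macro might do, assuming the delimiter sits on its own line before and after the text. The names make_delimiter(), wrap() and unwrap() are invented for this illustration; the check that the delimiter is absent from the text is an extra precaution on top of the str(time.time()) idea.

```python
import time


def make_delimiter(text):
    """Pick a delimiter string that does not occur anywhere in text."""
    base = str(time.time())
    delimiter = base
    n = 0
    while delimiter in text:  # very unlikely, but cheap to rule out
        n += 1
        delimiter = '%s-%d' % (base, n)
    return delimiter


def wrap(text):
    """Put text between two identical delimiter lines."""
    d = make_delimiter(text)
    return '%s\n%s\n%s\n' % (d, text, d)


def unwrap(block):
    """Recover the text: the first line names the delimiter."""
    d, rest = block.split('\n', 1)
    end = rest.index('\n' + d + '\n')
    return rest[:end]


text = "x = 'c:\\whatever\\'\nq = '''\"\"\"'''"
once = wrap(text)
twice = wrap(once)  # quoting already-quoted text needs no changes to it
print(unwrap(once) == text)            # True
print(unwrap(unwrap(twice)) == text)   # True
```

Because the delimiter is chosen not to appear in the text, nothing inside the text needs escaping, however many quotes or backslashes it contains.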

Regards,
Bengt Richter


