RFC PEP candidate: q'<delim>'quoted<delim> ?

Fri Mar 8 20:36:37 EST 2002

On Fri, 8 Mar 2002 21:13:48 +0300 (MSK), Roman Suzi <rnd at onego.ru> wrote:

>On 4 Mar 2002, Bengt Richter wrote:
>
>>On Sun, 3 Mar 2002 14:56:45 +0300 (MSK), Roman Suzi <rnd at onego.ru> wrote:
>>
>>>On 3 Mar 2002, Bengt Richter wrote:
>>>
>>>>Problem: How to put quotes around an arbitrary program text?
>>>
>>>Have it in a separate file.
>>>
>>That's fine in a lot of cases, but does require reliable presence
>>of both files. It's not so convenient if you are trying to write
>>e.g. a program generator and want to write out snippets containing
>>mixed triple quoted doc strings and data etc. Or if you are
>>writing copied example snips as part of dynamic HTML from CGI.
>
>>===the stuff I want===
>>
>>
>>Sure, I can copy it and make a separate file, and write some code to
>>read the file into my snippet string, but would you actually prefer to
>>do it that way? Ok, I could write a class to systematize it. Maybe I
>>will, when I want to use file-stored snippets, but I'd like both options
>>;-) And I guess I'd like to have Q' as well as q' ;-)
>
>Well, maybe my first impressions were wrong. After all, the feature is
>convenient. But probably you will need to coordinate it well
>with encoding things. What if the "dochere"'s encoding needs to be
>different from the main program's? Are binary data allowed?
>Otherwise the feature will become a grammatical disaster...
>
Thanks for the first glimmer of positive feedback ;-)
But you put your finger on an interesting aspect, which is important
whether this particular quoting mechanism exists or not. Cf. my other post
in this thread re eval("r'\x07'") etc.

It's funny, but "raw" strings are less likely to represent binary data
than ordinary strings. You can't re-render a raw string containing binary data
as a raw string not containing binary data, whereas you can with an ordinary
string, since escapes are available. A "raw" string usually represents
source text with its escapes left uninterpreted, so using only the normal
source alphabet it can't represent control characters themselves; it can
only represent representations of control/unprintable characters.
So the "raw" name is misleading in a way.
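
A minimal sketch of that difference, assuming a current Python 3
interpreter (the variable names are only for illustration):

```python
# Sketch, assuming a current Python 3 interpreter.
ordinary = '\x07'   # escape interpreted: a single BEL control character
raw = r'\x07'       # escape left alone: backslash, 'x', '0', '7'

print(len(ordinary), len(raw))

# Re-rendering the ordinary string yields source text with no control
# character in it, and that text evaluates back to the same value.
rerendered = repr(ordinary)
print(rerendered)
print(eval(rerendered) == ordinary)
```

The raw literal holds four printable characters, i.e. a representation of
the control character rather than the control character itself.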

If you pasted binary (i.e., encoded as uninterpreted octets) anywhere into a
Python source encoded as Latin-1, presumably the octets would go 1:1 into
the source. On the screen they would appear according to the screen font,
but when re-rendered they would appear escaped. (That is inside strings;
elsewhere they would be syntax errors, except maybe in comments?) As in:

 >>> '^G','\x07',r'^G'
 ('\x07', '\x07', '\x07')

... where I typed Ctrl-G binary data into the source where the screen rendered ^G.
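
The same session can probably be reproduced without typing a control
character, by building the literals as source text and evaluating them.
This is only a sketch, assuming a current Python 3 interpreter; BEL,
sources and values are names made up for the illustration:

```python
# Sketch, assuming a current Python 3 interpreter.
BEL = chr(7)  # the control character that Ctrl-G types

# Build the three source-text literals from the session: a literal BEL in an
# ordinary string, the \x07 escape, and a literal BEL in a raw string.
sources = ["'" + BEL + "'", "'\\x07'", "r'" + BEL + "'"]
values = [eval(src) for src in sources]

print(values)
print(all(v == BEL for v in values))
```

All three literals should evaluate to the same one-character string, since
the raw prefix only affects backslash escapes, not a control character that
is physically present in the source.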

Presumably, if the source encoding were UTF-8, pasting octets would change
them to UTF-8. However, interpreting the UTF-8 source representation of
an octet-string (o'...' ?) would generate the original binary octet
sequence as the value of the internal data representation at run time. I think ;-)
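
A rough sketch of that round trip, assuming a current Python 3 interpreter
and assuming each pasted octet is treated as the code point of the same
value (i.e. as Latin-1). There is no o'...' literal, so the conversion is
spelled out by hand and the names are hypothetical:

```python
# Sketch under stated assumptions; o'...' itself does not exist.
octets = bytes([0x07, 0xE9, 0xFF])        # arbitrary binary data

# Assumption: each octet becomes the character with the same code point.
as_text = octets.decode('latin-1')

# How those characters would be stored in a UTF-8 encoded source file.
utf8_source = as_text.encode('utf-8')

# "Interpreting" the source: decode the UTF-8, then map each character
# back to a single octet.
recovered = utf8_source.decode('utf-8').encode('latin-1')

print(utf8_source != octets)   # the stored form differs from the raw octets
print(recovered == octets)     # but the run-time value is the original
```

Octets below 0x80 are stored unchanged, while the others take two bytes each
in the UTF-8 form.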

>>>Making Python as gibberish as Perl is. And all that only to
>>>have Windows path be written without double-\
>>Not 'only'. I said 'also' ;-)  Perhaps my choice of '|' delimiter triggered
>>your 'gibberish as Perl' detector?
>
>;-) Maybe. I wonder why Perl novices do not know about "dochere"
>capabilities of Perl.
>
I don't know. It's not that prominent in the camel book, but I found it ;-)

Regards,
Bengt Richter