raw strings under windows

Mon Jun 16 15:30:23 EDT 2003

On Mon, 16 Jun 2003 07:34:21 GMT, Alex Martelli <aleax at aleax.it> wrote:

>Bengt Richter wrote:
>   ...
>>>> path = r"c:\python23\"
>>>> 
>>>> I get a syntax error, unexpected EOL with singlequoted string.  It was
>>>> my (mis?) understanding that raw strings did not process escaped
>>>> characters?
>>>
>>>They don't, in that the backslash remains in the string resulting from
>>>the raw literal, BUT so does the character right after the backslash,
>> That seems like a contradiction to me. I.e., the logic that says to
>> include "...the character right after the backslash, unconditionally."
>> must be noticing (processing) backslashes.
>
>Noticing, yes; processing, no.  That's using the fundamental meaning
>of the verb "process" as given e.g. by the American Heritage dictionary:
>
>To prepare, treat, or convert by subjecting to a special process: [eg]
>process ore to obtain minerals.
>
>In raw string literals, backslashes are not prepared, are not treated,
>are not converted, and are not subjected to a special process.  Thus,
>it makes sense to say they are not processed.  You may be favoring a
>different nuance of the meaning of the verb "to process" (for example,
>"to gain an understanding or acceptance of; come to terms with; [eg]
>processed the traumatic event in therapy") but I think the "prepare,
>treat or convert" one is primary and a sounder basis for the CS usage.
Me too. Does converting raw string source with special backslash semantics
into an internal string representation where the string bytes have no
special backslash semantics (unless re-interpreted as source of some kind)
qualify as processing? I guess it's a nit either way, depending on focus.
Sorry I forgot the smiley ;-)
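
To make the nit concrete, here is a minimal sketch, assuming a current Python 3
interpreter (I believe the raw-literal rules are the same, but that is an
assumption); the variable names are only illustrative. It shows that the backslash
and the character after it both end up in the string, and that the literal from
the original question fails to compile.

```python
# The backslash stays in the string, and so does the character after it.
newline_pattern = r"\n"
assert len(newline_pattern) == 2
assert newline_pattern == "\\n"

# A backslash keeps a following quote from ending the literal,
# yet the backslash itself remains in the result.
escaped_quote = r"\""
assert len(escaped_quote) == 2
assert escaped_quote == '\\"'

# The literal from the original question never terminates, so compiling it fails.
source = 'path = r"c:\\python23\\"'  # i.e. the text: path = r"c:\python23\"
try:
    compile(source, "<example>", "exec")
    compiled = True
except SyntaxError:
    compiled = False
assert compiled is False
```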
>
>
>>>unconditionally.  As a result, a raw string literal cannot end with an
>>>odd number of backslashes.  If they did otherwise, it would instead be
>>>impossible to include a single quote character in a single-quoted raw
>> So? Those cases would be 99.99% easy to get around with alternative
>> quotes, especially considering that """ and ''' are alternative quotes.
>
>To echo you, "so?".  The use case for the design of raw string literals
>is regular-expression patterns.  Why cause ANY problems for whatever
>fraction of RE patterns, when the current design choice causes no issue
>with any valid RE pattern?  Note that a valid RE pattern can never end
>with an odd number of backslashes.
Admittedly, there aren't any problems that can't be worked around, but
on second thought I'm not sure I understand your example of including
a single quote character in a single-quoted raw string, except by assuming
you are focusing on the regex use case exclusively (where the literal preceding
backslash will be a no-op since ['"] are not magic regex characters). IOW you are
depending on the destined use of the raw string to ignore the extra backslash
which you can't avoid including in your example.

I.e., it works, but you are actually generating a character not needed in the
regex itself. But then I'm not sure what the example means, since
r" ... ' ..." works even better than r' ... \' ...' and I would tend
to use triple quotes to avoid escaping if the regex had both. If I had
to write a regex to match all legal Python string literals, I guess I
would be scratching my head a while (and perhaps wishing for a selectable delimiter ;-)
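
A small sketch of the point about the extra backslash, again assuming a current
Python 3 and its standard re module; the sample strings are made up for
illustration. The escaped form carries one more character than the
alternative-quote form, but as regex patterns both match the same text.

```python
import re

# Inside a single-quoted raw literal, \' keeps the quote but also the backslash.
escaped = r'don\'t'
assert escaped == "don\\'t"
assert len(escaped) == 6

# Switching the outer quotes avoids the extra backslash altogether.
plain = r"don't"
assert len(plain) == 5

# As regex patterns the two behave alike, since \' simply matches a quote.
escaped_ok = re.fullmatch(escaped, "don't") is not None
plain_ok = re.fullmatch(plain, "don't") is not None

# With both quote kinds in the pattern, triple quotes avoid escaping entirely.
either_quote = r'''["']'''
quotes_found = re.findall(either_quote, """say "hi" and 'bye'""")
```

So the backslash is harmless for the regex use case, but it is still a character
in the string if the string is destined for anything else.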

>
>>>string literal, etc.  Raw string literals are designed mainly to ease
>>>the task of entering regular expressions, and for that purpose an odd
>>>number of ending backslashes is never needed, while making inclusion of
>>>quote characters harder _would_ be an issue, so the design choice was
>>>easy to make.
>> ISTM only inclusion of same-as-initial quote characters at the end would
>> be a problem. Otherwise UIAM triple quotes take care of all but sequences
>> with embedded triple quotes, which are pretty unusual, and pretty easy to
>> spell alternatively (e.g. in tokenizer-concatenated pieces, with adjacent
>> string literals separated by optional whitespace).
>
>Just as obviously, it's even easier to "spell alternatively" a string
>constant that ends with an odd number of backslashes.  The only issue is,
>which use case should be subjected to this minor annoyance (of not being
>directly expressible with a raw string literal): the intended one, RE
>patterns (for which backslashes can well be actually needed and using
>raw string literals thus makes sense), or "DOS filenames" (for which
>backslashes can generally be advantageously replaced by plain slashes,
>in addition to other "alternative spellings")?
>
>As I said, this is an EASY design choice to make.  And if you can't see
>it (I suspect you see it perfectly well and are just taking an opportunity
>to start some useless argument) then there isn't much I can do about it:
>the art of making the right tradeoffs is exactly that, an art, and the
>main quality of Python is that the many design choices that add up to it
>have been made consistently, intelligently, and elegantly.
>
>
>> Was the design choice made before triple quotes? Otherwise what is the use
>> case that would cause real difficulty? Of course, now there is a
>> backwards-compatibility constraint, so that r"""xxxx\"""" must mean
>> r'xxxx\"' and not induce a syntax error.
>
>You're welcome to dig into the archives to find out exactly when triple
>quoting was introduced wrt when raw string literals were introduced.  But
>even if they were introduced simultaneously, what does that matter?
>
>
>>>Of course people who use raw string literals to represent DOS paths might
>>>wish otherwise, but as has been pointed out it's not a big problem in
>>>any case -- not only, as you note:
>>>
>>>> Of course
>>>> path = "c:\\python23\\"
>>>> 
>>>> works just fine.
>> 
>> I wouldn't mind a raw-string format that really did treat backslashes
>> as ordinary characters. Perhaps upper case R could introduce that. E.g.,
>> 
>>    path = R"c:\python23\"
>
>Then write a PEP proposing it.  You know perfectly well that such drastic
>changes as additions to Python's syntax don't come about except via the
>PEP process.  Thus, if you DON'T write a PEP, I will be confirmed in my
>working hypothesis that you're not really looking for such a change, but
>just looking for arguments for arguments' sake.
>
Well, I don't enjoy argument per se. I do enjoy batting ideas around,
even those where the best strategy in a serious game would be to refrain
from swinging and walk.

But regarding your hypothesis about my motivations -- isn't it possible
that I would propose a non-PEP-able idea for other reasons than argument
for argument's sake? E.g., to get a reaction to the idea that might be
a better version, and closer to being PEP-worthy? Or to get a better sense
of any problematic use cases that other people have bumped into? Or to
test an idea in the flames and heat of the c.l.p crucible before investing
in a useless PEP effort? Is that not part of the purpose of c.l.p?

OTOH, your reaction makes me think that maybe there is a negative aspect
to casual discussion of possible alternatives to the current Python design.
E.g., if focusing on a small thing that some might think improvable
(or even an actual "wart") causes FUD for the language as a whole, that would
be unfortunate.

Anyway, I'm sorry if I came off as argumentative. I can see that a
non-smiley-qualified "So?" could push that button. Sorry if it did.

I have posted a number of not-fully-worked-out ideas (to euphemize self-servingly ;-)
in the past, and a number of times people have responded with better versions, even
though it was recognized that no version could make it through the PEP filter, and
in some cases wouldn't even have to, since they were technically legal.

I.e., even an exercise in language abuse can have educational value. E.g., I learned
something about how exceptions work when I posted an abuse of exceptions
for the purpose of switch/case logic, which critiques pushed me into making work, and
which others then improved upon. Of course, it may be advisable to label such
exercises clearly so newbies won't take such stuff as a model for coding practice ;-)

I could probably spend my time better though. Thanks for reminding me ;-)
>
>>>but so, almost invariably, does 'c:/python23/' (Microsoft's C runtime
>>>libraries accept / interchangeably with \ as part of file path syntax,
>>>and Python relies on the C runtime libraries and so does likewise).
>>>
>> Another alternative would be a chosen-delimiter raw format, e.g.,
>> 
>>    path = d'|c:\python23\|
>> 
>> or
>> 
>>    path = d'$c:\python23\$
>> 
>> I.e., the first character after d' is the chosen delimiter.
>> Even matching-brackets delimiting could be possible
>> 
>>    d'[c:\python23\] == d'<c:\python23\> == d'{c:\python23\}
>> 
>> by recognizing [, <, or { delimiters specially. Space as a delimiter would
>> be iffy practice.
>
>I think the whole perlish idea stinks to high heaven, but I look
>forward to reading your PEP carefully detailing this proposal.
>
I would welcome a hint as to how to achieve an increment in capability
without offending your sensibilities ;-)

BTW, should little things like this be compounded into a single PEP in order to
counter the sense of creeping insignificant featuritis with more collective weight?

BTW, note that AFAICS no backward compatibility problem should arise from a new
string prefix letter or two, since they would currently be illegal. I don't think
it would be a big thing.
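
Whether a given prefix letter is free can be checked per letter by asking the
compiler. This is only a sketch, assuming a current Python 3; is_legal_source is
a hypothetical helper name, not anything in the standard library. Under that
assumption a d prefix is rejected, while an upper-case R is already accepted as
an ordinary raw prefix, so that letter would not be free.

```python
def is_legal_source(source):
    """Return True if the interpreter can compile the given source text."""
    try:
        compile(source, "<prefix-check>", "exec")
    except SyntaxError:
        return False
    return True

# The existing lower-case raw prefix compiles.
raw_ok = is_legal_source('x = r"abc"')

# A d prefix is not a string prefix in the assumed interpreter, so it is rejected.
d_ok = is_legal_source("x = d'abc'")

# Upper-case R already compiles as a raw prefix in the assumed interpreter.
upper_r_ok = is_legal_source('x = R"abc"')
```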

Regards,
Bengt Richter