raw strings under windows

Alex Martelli aleax at aleax.it
Mon Jun 16 03:34:21 EDT 2003


Bengt Richter wrote:
   ...
>>> path = r"c:\python23\"
>>> 
>>> I get a syntax error, unexpected EOL with singlequoted string.  It was
>>> my (mis?) understanding that raw strings did not process escaped
>>> characters?
>>
>>They don't, in that the backslash remains in the string resulting from
>>the raw literal, BUT so does the character right after the backslash,
> That seems like a contradiction to me. I.e., the logic that says to
> include "...the character right after the backslash, unconditionally."
> must be noticing (processing) backslashes.

Noticing, yes; processing, no.  That's using the fundamental meaning
of the verb "process" as given e.g. by the American Heritage dictionary:

To prepare, treat, or convert by subjecting to a special process: [eg]
process ore to obtain minerals.

In raw string literals, backslashes are not prepared, are not treated,
are not converted, and are not subjected to a special process.  Thus,
it makes sense to say they are not processed.  You may be favoring a
different nuance of the meaning of the verb "to process" (for example,
"to gain an understanding or acceptance of; come to terms with; [eg]
processed the traumatic event in therapy") but I think the "prepare,
treat or convert" one is primary and a sounder basis for the CS usage.
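
The distinction can be checked directly. What follows is only a minimal
sketch, assuming a current Python 3 interpreter; the helper name
`compiles` and the variable names are made up for illustration and come
from nowhere in this thread:

```python
# Illustrative sketch; the helper name `compiles` is hypothetical.

def compiles(source):
    """Return True if the Python compiler accepts `source`."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

# The backslash is noticed (it keeps the quote from ending the literal)
# but not processed: both characters survive into the string.
raw_pair = r"\""
# Non-raw literal: the backslash is consumed, only the quote remains.
plain = "\""

# Source text of the literal from the original question, and of a
# variant ending in an even number of backslashes.
bad_source = 'path = r"c:\\python23\\"'
good_source = 'path = r"c:\\python23\\\\"'

print(len(raw_pair), len(plain))                     # 2 1
print(compiles(bad_source), compiles(good_source))   # False True
```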


>>unconditionally.  As a result, a raw string literal cannot end with an
>>odd number of backslashes.  If they did otherwise, it would instead be
>>impossible to include a single quote character in a single-quoted raw
> So? Those cases would be 99.99% easy to get around with alternative
> quotes, especially considering that """ and ''' are alternative quotes.

To echo you, "so?".  The use case for the design of raw string literals
is regular-expression patterns.  Why cause ANY problems for whatever
fraction of RE patterns, when the current design choice causes no issue
with any valid RE pattern?  Note that a valid RE pattern can never end
with an odd number of backslashes.
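
A quick way to convince oneself of that last claim, as a sketch only:
the helper `is_valid_pattern` is a hypothetical name, and the check
relies on nothing beyond `re.compile` raising `re.error` for a
malformed pattern.

```python
import re

def is_valid_pattern(pattern):
    """Return True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

# Patterns are written as non-raw literals here so that an odd number
# of trailing backslashes can be spelled at all.
one_backslash = "abc\\"         # the pattern  abc\    (odd: dangling escape)
two_backslashes = "abc\\\\"     # the pattern  abc\\   (even: literal backslash)
three_backslashes = "abc\\\\\\" # the pattern  abc\\\  (odd: dangling escape)

print(is_valid_pattern(one_backslash))      # False
print(is_valid_pattern(two_backslashes))    # True
print(is_valid_pattern(three_backslashes))  # False
```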


>>string literal, etc.  Raw string literals are designed mainly to ease
>>the task of entering regular expressions, and for that purpose an odd
>>number of ending backslashes is never needed, while making inclusion of
>>quote characters harder _would_ be an issue, so the design choice was
>>easy to make.
> ISTM only inclusion of same-as-initial quote characters at the end would
> be a problem. Otherwise UIAM triple quotes take care of all but sequences
> with embedded triple quotes, which are pretty unusual, and pretty easy to
> spell alternatively (e.g. in tokenizer-concatenated pieces, with adjacent
> string literals separated by optional whitespace).

Just as obviously, it's even easier to "spell alternatively" a string
constant that ends with an odd number of backslashes.  The only issue is,
which use case should be subjected to this minor annoyance (of not being
directly expressible with a raw string literal): the intended one, RE
patterns (for which backslashes can well be actually needed and using
raw string literals thus makes sense), or "DOS filenames" (for which
backslashes can generally be advantageously replaced by plain slashes,
in addition to other "alternative spellings")?
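
For concreteness, here are a few such alternative spellings, offered as
an illustrative sketch rather than an exhaustive list; the variable
names are hypothetical:

```python
# The string we want: a DOS-style path ending in a single backslash.
target = "c:\\python23\\"

# Adjacent string literals are concatenated at compile time, so the
# trailing backslash can come from a separate non-raw literal.
adjacent = r"c:\python23" "\\"

# Explicit concatenation at run time gives the same result.
concatenated = r"c:\python23" + "\\"

# A raw literal ending in backslash-space, with the space sliced off.
sliced = r"c:\python23\ "[:-1]

# Forward slashes need no escaping at all; converting them back shows
# that this spelling names the same path components.
slashed = "c:/python23/"

print(adjacent == target, concatenated == target, sliced == target)
print(slashed.replace("/", "\\") == target)
```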

As I said, this is an EASY design choice to make.  And if you can't see
it (I suspect you see it perfectly well and are just taking an opportunity
to start some useless argument) then there isn't much I can do about it:
the art of making the right tradeoffs is exactly that, an art, and the
main quality of Python is that the many design choices that add up to it
have been made consistently, intelligently, and elegantly.


> Was the design choice made before triple quotes? Otherwise what is the use
> case that would cause real difficulty? Of course, now there is a
> backwards-compatibility constraint, so that r"""xxxx\"""" must mean
> r'xxxx\"' and not induce a syntax error.

You're welcome to dig into the archives to find out exactly when triple
quoting was introduced relative to when raw string literals were.  But
even if they were introduced simultaneously, what does that matter?


>>Of course people who use raw string literals to represent DOS paths might
>>wish otherwise, but as has been pointed out it's not a big problem in
>>any case -- not only, as you note:
>>
>>> Of course
>>> path = "c:\\python23\\"
>>> 
>>> works just fine.
> 
> I wouldn't mind a raw-string format that really did treat backslashes
> as ordinary characters. Perhaps upper case R could introduce that. E.g.,
> 
>    path = R"c:\python23\"

Then write a PEP proposing it.  You know perfectly well that such drastic
changes as additions to Python's syntax don't come about except via the
PEP process.  Thus, if you DON'T write a PEP, I will be confirmed in my
working hypothesis that you're not really looking for such a change, but
just looking for arguments for arguments' sake.


>>but so, almost invariably, does 'c:/python23/' (Microsoft's C runtime
>>libraries accept / interchangeably with \ as part of file path syntax,
>>and Python relies on the C runtime libraries and so does likewise).
>>
> Another alternative would be a chosen-delimiter raw format, e.g.,
> 
>    path = d'|c:\python23\|
> 
> or
> 
>    path = d'$c:\python23\$
> 
> I.e., the first character after d' is the chosen delimiter.
> Even matching-brackets delimiting could be possible
> 
>    d'[c:\python23\] == d'<c:\python23\> == d'{c:\python23\}
> 
> by recognizing [, <, or { delimiters specially. Space as a delimiter would
> be iffy practice.

I think the whole perlish idea stinks to high heaven, but I look
forward to reading your PEP carefully detailing this proposal.


Alex




