[Python-Dev] Raw string syntax inconsistency

Mon Jun 18 17:11:05 CEST 2012

On Sun, Jun 17, 2012 at 10:59 PM, Terry Reedy <tjreedy at udel.edu> wrote:

> On 6/17/2012 9:07 PM, Guido van Rossum wrote:
>
>> On Sun, Jun 17, 2012 at 4:55 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>>>    So, perhaps the answer is to leave this as is, and try to make 2to3
>>>    smart enough to detect such escapes and replace them with their
>>>    properly encoded (according to the source code encoding) Unicode
>>>    equivalent?
>>
>> But the whole point of the reintroduction of u"..." is to support code
>> that isn't run through 2to3.
>>
>
> People writing 2&3 code sometimes use 2to3 once (or a few times) on their
> 2.6/7 version during development to find things they must pay attention to.
> So Nick's idea could be helpful to people who do not want to use 2to3
> routinely either in development or deployment.
>
>
>> Frankly, I don't care how it's done, but
>> I'd say it's important not to silently have different behavior for the
>> same notation in the two versions.
>>
>
> The fundamental problem was giving the 'u' prefix two different meanings
> in 2.x: 'change the storage type from bytes to unicode', and 'change the
> contents by partially cooking the literal even when raw processing is
> requested'*. The only way to silently have the same behavior is to
> re-introduce the second meaning of partial cooking. (But I would rather
> make it unnecessary.) But that would freeze the 'u' prefix, or at least
> 'ur' ('un-raw') forever. It would be better to introduce a new, separate
> 'p' prefix, to mean partially raw, partially cooked. (But I am opposed to
>
> *I think this non-orthogonal interaction effect was a design mistake and
> that it would have been better to have re do all the cooking needed by also
> interpreting \u and \U sequences. I also think we should add this now for
> 3.3 if possible, to make partial cooking at the parsing stage unnecessary.
> Putting the processing in re makes it work for all strings, not just those
> given as literals.
>
>
>> If that means we have to add an extra
>> step to the compiler to reject r"\u03b3", so be it.
>>
>
> I do not get this. Surely you cannot mean to suddenly start rejecting, in
> 3.3, a large set of perfectly legal and sensible 6 and 10 character
> sequences when embedded in literals?
>

Sorry, I meant rejecting ru"...." (and ur"....") if it contains a \u or \U
escape that would be expanded by Python 2.
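
To make the mismatch concrete, here is a small sketch, assuming it is run on
a Python 3 interpreter. The helper name prefix_compiles is made up for this
illustration and is not part of any library. It shows that a plain r"..."
literal leaves \u03b3 as six characters while a cooked literal yields one,
and it probes whether a given prefix is accepted by the compiler at all.

```python
def prefix_compiles(prefix):
    """Return True if the compiler accepts a \\u03b3 literal with this prefix."""
    # Builds source text such as r"\u03b3" or ur"\u03b3" (hypothetical probe).
    source = prefix + '"\\u03b3"'
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


raw = r"\u03b3"    # raw: backslash, 'u', and four hex digits stay as written
cooked = "\u03b3"  # cooked: the escape becomes GREEK SMALL LETTER GAMMA

print(len(raw), len(cooked))
print(prefix_compiles("r"), prefix_compiles("ur"))
```

On the Python 3 releases I know of, the ur prefix is a SyntaxError, so the
same notation cannot silently mean something different there.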

>> Hm. I still encounter enough environments that don't know how to display
>> such characters that I would prefer to have a rock solid \u escape
>> mechanism. I can think of two ways to support "expanded" unicode
>> characters in raw strings a la Python 2;
>>
>> (a) let the re module interpret the escapes (like it does for \r and \n);
>
> As said above, I favor this. The 2.x partial cooking (with 'ur' prefix) was
> primarily a substitute for this.
>
>
>> (b) the user can write r"someblah" "\u03b3" r"moreblah".
>
> This is somewhat orthogonal to (a). Users can do this whenever they want
> partial processing of backslashes without doubling those they want left as
> is. A generic example is r'someraw' 'somecooked' r'moreraw' 'morecooked'.
>
> --
> Terry Jan Reedy
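
The two options (a) and (b) above can be sketched side by side. This is only
an illustration: the pattern and sample text are invented, and option (a)
assumes a re module that itself understands \uXXXX escapes (true of Python
3.3 and later, as far as I know, but not of the versions discussed above).

```python
import re

GAMMA = "\u03b3"

# Option (a): keep the whole pattern raw and let re interpret \u03b3.
# Assumption: the re module in use supports \uXXXX in patterns.
pattern_a = r"\d+\u03b3\s"

# Option (b): adjacent literals are concatenated at compile time, so the
# middle piece is cooked while its neighbours stay raw.
pattern_b = r"\d+" "\u03b3" r"\s"

sample = "42" + GAMMA + " "

match_a = re.fullmatch(pattern_a, sample)
match_b = re.fullmatch(pattern_b, sample)

print(len(pattern_a), len(pattern_b))
print(match_a is not None, match_b is not None)
```

The two pattern strings differ (the first still contains the six-character
escape, the second contains the character itself), yet both are meant to
match the same text. Option (b) works for any consumer of the string, while
option (a) only helps when the string ends up in re.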

-- 
--Guido van Rossum (python.org/~guido)