python3 raw strings and \u escapes

rurpy at yahoo.com
Wed May 30 11:07:28 EDT 2012


On 05/30/2012 05:54 AM, Thomas Rachel wrote:
> Am 30.05.2012 08:52 schrieb rurpy at yahoo.com:
>
>> This breaks a lot of my code because in python 2
>>        re.split (ur'[\u3000]', u'A\u3000A') ==>  [u'A', u'A']
>> but in python 3 (the result of running 2to3),
>>        re.split (r'[\u3000]', 'A\u3000A' ) ==>  ['A\u3000A']
>>
>> I can remove the "r" prefix from the regex string but then
>> if I have other regex backslash symbols in it, I have to
>> double all the other backslashes -- the very thing that
>> the r-prefix was invented to avoid.
>>
>> Or I can leave the "r" prefix and replace something like
>> r'[ \u3000]' with r'[  ]'.  But that is confusing because
>> one can't distinguish between the space character and
>> the ideographic space character.  It is also a problem if a
>> reader of the code doesn't have a font that can display
>> the character.
>>
>> Was there a reason for dropping the lexical processing of
>> \u escapes in strings in python3 (other than to add another
>> annoyance in a long list of python3 annoyances?)
>
> Probably it is more consistent. Alas, it makes the whole stuff
> incompatible with Py2.
>
> But if you think about it: why allow for \u if \r, \n etc. are
> disallowed as well?

Maybe the blame lies elsewhere, then...  The re module already
interprets (in a regex string) the 2-character sequence of a
backslash followed by 'n' as a single newline character.  So
why wasn't re changed for Python 3 to interpret the 6-character
sequence r'\u3000' as a single Unicode character, to make up
for Python's lexer no longer doing that in raw strings (as it
did in Python 2)?
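
A quick way to see which layer handles the escape on a given
interpreter is sketched below.  As far as I know, the re module
gained its own handling of \uXXXX escapes in Python 3.3, so the
result of the split depends on the version; the variable names
are only for illustration.

```python
import re

# In a raw string, Python 3's lexer leaves the backslash-u sequence
# alone, so the pattern holds 8 characters: [ \ u 3 0 0 0 ]
pattern = r'[\u3000]'
pattern_length = len(pattern)

# Whether this splits depends on re decoding \u3000 itself.
# On Python 3.3 and later it does; on 3.2 and earlier the class
# is read as the literal characters 'u', '3' and '0'.
parts = re.split(pattern, 'A\u3000A')
print(pattern_length, parts)
```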

>> And is there no choice for me but to choose between the two
>> poor choices I mention above to deal with this problem?
>
> There is a 3rd one: use   r'[ ' + '\u3000' + ']'. Not very nice to read,
> but should do the trick...

I guess the "+"s could be left out, since adjacent string
literals are concatenated automatically, allowing something
like,

  '[ \u3000]' r'\w+ \d{3}'

but I'll have to try it a little; maybe just doubling
backslashes won't be much worse.  I did that for years
in Perl and lived through it.
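
A minimal sketch comparing the two options, assuming a made-up
test string; it only shows that the juxtaposed literals and the
doubled-backslash version produce the same pattern text.

```python
import re

# Adjacent literals are joined at compile time: the first part is a
# normal string (so \u3000 is decoded by the lexer), the second is
# raw (so \w and \d reach the re module untouched).
pattern = '[ \u3000]' r'\w+ \d{3}'

# The same pattern written as one normal string with doubled backslashes.
doubled = '[ \u3000]\\w+ \\d{3}'
same = (pattern == doubled)

# Hypothetical input: ideographic space, a word, a space, three digits.
m = re.search(pattern, 'x\u3000abc 123')
matched = m.group() if m else None
print(same, repr(matched))
```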



