[Tutor] Puzzled again
Dave Angel
d at davea.name
Wed Aug 3 21:04:49 CEST 2011
On 08/03/2011 01:48 PM, Richard D. Moores wrote:
> On Wed, Aug 3, 2011 at 10:11, Peter Otten <__peter__ at web.de> wrote:
>
>> <SNIP>
>> Dave was close, but Steven hit the nail: the string r"C:\Users\Dick\..." is
>> fine, but when you put it into the docstring it is not a raw string within
>> another string, it becomes just a sequence of characters that is part of the
>> outer string. As such \U marks the beginning of a special way to define a
>> unicode codepoint:
>> <snip>
> Here's from my last post:
>
> ====================================
> Now I edit it back to its original problem form:
>
> def convertPath(path):
> """
> Given a path with backslashes, return that path with forward slashes.
>
> By Steven D'Aprano 07/31/2011 on Tutor list
> >>> path = r'C:\Users\Dick\Desktop\Documents\Notes\College Notes.rtf'
> >>> convertPath(path)
> 'C:/Users/Dick/Desktop/Documents/Notes/College Notes.rtf'
> """<snip>
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "C:\Python32\lib\site-packages\mycalc2.py", line 10
> """
> SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes
> in position 144-146: truncated \UXXXXXXXX escape
>
> Using HxD, I find that the bytes in 144-146 are 20, 54, 75 or the
> <space>, 'T', 'u' of " Tutor" . A screen shot of HxD with this
> version of mycalc2.py open in it is at
> <http://www.rcblue.com/images/HxD.jpg>. You can see that I believe the
> offset integers are base-10 ints. I do hope that's correct, or I've
> done a lot of work for naught.
> ====================================
>
> So have I not used HxD correctly (my first time to use a hex reader)?
> If I have used it correctly, why do the reported problem offsets of
> 144-146 correspond to such innocuous things as 'T', 'u' and <space>,
> and which come BEFORE the problems you and Steven point out?
>
This one is my fault, for pointing you to the hex viewer. Peter is
correct. The offset in the error message is relative to the beginning of
the triple-quoted string, not to the beginning of the file, so the bytes
you found at file offsets 144-146 are not the ones being complained about.
The problem has nothing to do with the encoding of the file itself; it
comes from the backslashes inside the triple-quoted string. A \U in a
non-raw string starts a Unicode escape, and the parser expects exactly
8 hex digits to follow it. The thing that threw me was that this
particular symptom is specific to Python 3.x, which I don't normally use.
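To make the "8 hex digits" rule concrete, here is a small sketch (Python 3; the variable names and the code point U+1F600 are only illustrative choices, not anything from your file). It shows a complete \U escape next to the same characters in a raw string:

```python
# A complete \U escape: backslash, capital U, then exactly eight hex digits.
# In a normal (non-raw) string this is turned into a single code point.
smiley = "\U0001F600"
print(len(smiley))        # 1
print(hex(ord(smiley)))   # 0x1f600

# In a raw string nothing is decoded: the backslash, the U and the
# eight digits all stay as ordinary characters.
raw = r"\U0001F600"
print(len(raw))           # 10
```

Anything shorter than eight hex digits after \U in a non-raw string is what produces the "truncated \UXXXXXXXX escape" message.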
The following line would have the same problem:
mystring = "abc \Unexpected def"
since the eight characters that follow the \U ("nexpecte") are not all
valid hex digits. You would instead want
mystring = r"abc \Unexpected def"
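If you want to see both cases without breaking your own module, something like the following sketch should work on Python 3. It hands the two lines above to compile() as text, so the SyntaxError can be caught instead of stopping the script. The helper name compiles() and the function to_forward_slashes() are made up for this example, and the function body is only my guess at a minimal implementation, not the code from your mycalc2.py:

```python
# The two statements from above, held as source text so that the bad one
# can be compiled inside a try block instead of killing this script.
bad_source = 'mystring = "abc \\Unexpected def"'
good_source = 'mystring = r"abc \\Unexpected def"'


def compiles(source):
    """Return True if source compiles, otherwise print the error and return False."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError as exc:
        print("SyntaxError:", exc.msg)
        return False


bad_ok = compiles(bad_source)    # False: \U is not followed by 8 hex digits
good_ok = compiles(good_source)  # True: the r prefix leaves the backslash alone


# The same cure applies to a docstring: put the r in front of the opening
# triple quote. (Hypothetical function, body assumed for illustration.)
def to_forward_slashes(path):
    r"""
    Given a path with backslashes, return that path with forward slashes.

    >>> to_forward_slashes(r'C:\Users\Dick\Desktop')
    'C:/Users/Dick/Desktop'
    """
    return path.replace("\\", "/")


print(to_forward_slashes(r"C:\Users\Dick\Desktop"))
```

Doubling each backslash inside the docstring (C:\\Users\\Dick) would also get past the parser, but then the doctest text no longer looks like what you would type at the prompt, so the r prefix is usually the easier choice.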
--
DaveA