more on unescaping escapes

bvdp bob at mellowood.ca
Tue Feb 24 16:13:47 EST 2009


Adam Olsen wrote:
> On Feb 23, 7:18 pm, bvdp <b... at mellowood.ca> wrote:
>> Gabriel Genellina wrote:
>>> On Mon, 23 Feb 2009 23:31:20 -0200, bvdp <b... at mellowood.ca> wrote:
>>>> Gabriel Genellina wrote:
>>>>> On Mon, 23 Feb 2009 22:46:34 -0200, bvdp <b... at mellowood.ca> wrote:
>>>>>> Chris Rebert wrote:
>>>>>>> On Mon, Feb 23, 2009 at 4:26 PM, bvdp <b... at mellowood.ca> wrote:
>>>>>>> [problem with Python and Windows paths using backslashes]
>>>>>>>  Is there any particular reason you can't just internally use regular
>>>>>>> forward-slashes for the paths? [...]
>>>>>> you are absolutely right! Just use '/' on both systems and be done
>>>>>> with it. Of course I still need to use \x20 for spaces, but that is
>>>>>> easy.
>>>>> Why is that? "\x20" is exactly the same as " ". It's not like %20 in
>>>>> URLs, that becomes a space only after decoding.
>>>> I need to use the \x20 because of my parser. I'm reading unquoted
>>>> lines from a file. The file creator needs to use the form "foo\x20bar"
>>>> without the quotes in the file so my parser can read it as a single
>>>> token. Later, the string/token needs to be decoded with the \x20
>>>> converted to a space.
>>>> So, in my file "foo bar" (no quotes) is read as 2 tokens; "foo\x20bar"
>>>> is one.
>>>> So, it's not really a problem of what happens when you assign a string
>>>> in the form "foo bar", rather how to convert the \x20 in a string to a
>>>> space. I think the \\ just complicates the entire issue.
>>> Just thinking, if you were reading the string from a file, why were you
>>> worried about \\ and \ in the first place? (Ok, you moved to use / so
>>> this is moot now).
>> Just cruft introduced while I was trying to figure it all out. Having to
>> figure the \\ and \x20 at the same time with file and keyboard input just
>> confused the entire issue :) Having the user set a line like
>> c:\\Program\x20File ... works just fine. I'll suggest he use
>> c:/program\x20files to make it a bit simpler for HIM, not my parser.
>> Unfortunately, due to some bad design decisions on my part about 5 years
>> ago I'm afraid I'm stuck with the \x20.
>>
>> Thanks.
> 
> You're confusing the python source with the actual contents of the
> string.  We already do one pass at decoding, which is why \x20 is
> quite literally no different from a space:
> 
> >>> '\x20'
> ' '
> 
> However, the interactive interpreter uses repr(x), so various
> characters that are considered formatting, such as a tab, get
> reescaped when printing:
> 
> >>> '\t'
> '\t'
> >>> len('\t')
> 1
> 
> It really is a tab that gets stored there, not the escape for one.
> 
> Finally, if you give python an unknown escape it leaves it
> as an escape.  Then, when the interactive interpreter uses repr(x), it
> is the backslash itself that gets reescaped:
> 
> >>> '\P'
> '\\P'
> >>> len('\P')
> 2
> >>> list('\P')
> ['\\', 'P']
> 
> What does this all mean?  If you want to test your parser with python
> literals you need to escape them twice, like so:
> 
> >>> 'c:\\\\Program\\x20Files\\\\test'
> 'c:\\\\Program\\x20Files\\\\test'
> >>> list('c:\\\\Program\\x20Files\\\\test')
> ['c', ':', '\\', '\\', 'P', 'r', 'o', 'g', 'r', 'a', 'm', '\\', 'x',
> '2', '0', 'F', 'i', 'l', 'e', 's', '\\', '\\', 't', 'e', 's', 't']
> >>> 'c:\\\\Program\\x20Files\\\\test'.decode('string-escape')
> 'c:\\Program Files\\test'
> >>> list('c:\\\\Program\\x20Files\\\\test'.decode('string-escape'))
> ['c', ':', '\\', 'P', 'r', 'o', 'g', 'r', 'a', 'm', ' ', 'F', 'i',
> 'l', 'e', 's', '\\', 't', 'e', 's', 't']
> 
> However, there's an easier way: use raw strings, which prevent python
> from unescaping anything:
> 
> >>> r'c:\\Program\x20Files\\test'
> 'c:\\\\Program\\x20Files\\\\test'
> >>> list(r'c:\\Program\x20Files\\test')
> ['c', ':', '\\', '\\', 'P', 'r', 'o', 'g', 'r', 'a', 'm', '\\', 'x',
> '2', '0', 'F', 'i', 'l', 'e', 's', '\\', '\\', 't', 'e', 's', 't']

Thank you. That is very clear. Appreciate your time.
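
For the archive, here is a rough sketch of the split-then-decode approach discussed in this thread: split each unquoted line on whitespace first, then convert escapes such as \x20 inside each token. It is written for Python 3, where str has no .decode() method and the 'string-escape' codec is gone, so it swaps in the 'unicode_escape' codec instead. The names decode_token and parse_line and the sample token "baz" are made up for illustration and are not taken from my actual parser; the sketch also assumes tokens are plain ASCII.

```python
def decode_token(token):
    """Convert backslash escapes in an already-read token to real characters."""
    # Assumption: tokens are plain ASCII. encode() raises UnicodeEncodeError
    # otherwise, which is safer than silently mangling non-ASCII text.
    return token.encode("ascii").decode("unicode_escape")


def parse_line(line):
    """Split on whitespace first, then decode the escapes in each token."""
    return [decode_token(tok) for tok in line.split()]


# Raw strings stand in for text read from a file: the backslashes are
# literally present, exactly as they would be on disk.
print(parse_line(r"foo\x20bar baz"))               # ['foo bar', 'baz']
print(decode_token(r"c:/program\x20files/test"))   # c:/program files/test
```

Splitting before decoding is what keeps foo\x20bar together as one token: the space only comes into existence after the line has already been split.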


