backslash woes........

Martin Franklin martin.franklin at westerngeco.com
Tue Jul 10 10:15:29 EDT 2001


Thanks for your time, Duncan. I think I am beginning to get it. I
did not use re at all when I only needed to support this function
on UNIX, and then when Windows came along I (for some unknown
reason) reached for re as a solution... I am beginning to regret
this now  ;-)

Anyway, as I said in reply to someone else in this thread, I have a
working solution (and a better understanding).

Thanks to all,
Martin.


Duncan Booth wrote:
> 
> Martin Franklin <martin.franklin at westerngeco.com> wrote in
> news:3B4ADD33.CA2836D1 at westerngeco.com:
> 
> >> I think you may misunderstand what raw strings do. Raw strings
> >> simply prevent any backslash character that is present in the string
> >> from being interpreted as an escape sequence. They don't affect the
> >> processing or use of the string in any way. Since none of your literal
> >> strings contain backslashes there is no reason to use raw strings.
> >> In regular expressions backslashes are special, but so are many other
> >> characters that could appear in filenames, even on Unix.
> >
> >
> > You are right, I don't understand... My strings do include backslashes
> > (they are Windows filenames from os.path.walk()). I have indeed changed
> > to using string.replace(), having read the HOWTO on
> > www.python.org, and it seems to work (without using raw strings).
> > This all seems very confusing!
> >
> 
> Let me try to explain. A raw string is a change in notation, not a change
> in the string itself. So r'%s' is exactly the same as '%s' or "%s" or
> '''%s''' or '\x25\x73', but r'\x25\x73' is a string containing 8 characters,
> two of which are backslashes.
> If you write a string containing a backslash, e.g. 'c:\autoexec.bat', the
> backslash may be interpreted as beginning an escape sequence, so in this
> case you get 'c:\x07utoexec.bat' as the \a converts to a bell character.
> Writing r'c:\autoexec.bat' or writing 'c:\\autoexec.bat' gives you an
> identical string containing exactly 15 characters. Both of these are
> strings (there is no separate raw string type), and each of them contains
> exactly one backslash character:
> 
> >>> file1 = r'c:\autoexec.bat'
> >>> file2 = 'c:\\autoexec.bat'
> >>> print file1
> c:\autoexec.bat
> >>> print file2
> c:\autoexec.bat
> >>> print repr(file1)
> 'c:\\autoexec.bat'
> >>> print repr(file2)
> 'c:\\autoexec.bat'
> >>> print len(file1), len(file2)
> 15 15
> >>> print type(file1), type(file2)
> <type 'string'> <type 'string'>
> 
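For readers on a current interpreter, the same checks can be sketched in Python 3 syntax. This is an assumption: the session above is Python 2, where print is a statement. The variable name `mangled` is illustrative only.

```python
# Raw and escaped literals spell the same string; only the notation differs.
file1 = r'c:\autoexec.bat'
file2 = 'c:\\autoexec.bat'
assert file1 == file2
assert len(file1) == 15
assert file1.count('\\') == 1

# Without the r prefix or the doubled backslash, \a becomes a bell character.
mangled = 'c:\autoexec.bat'
assert mangled == 'c:\x07utoexec.bat'
assert len(mangled) == 14

# r'%s' is the same string as '\x25\x73'; the raw form of the latter is not.
assert r'%s' == '%s' == '\x25\x73'
assert len(r'\x25\x73') == 8
assert r'\x25\x73'.count('\\') == 2

# Both names refer to ordinary str objects; there is no raw string type.
assert type(file1) is type(file2) is str
```
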
> In other words, the r prefix on a raw string simply changes the way
> the string literal is read at compile time; it has no further effect on
> the processing of data after Python has compiled your code.
> 
> If your program reads data from a file, or indeed gets it from anywhere
> else, then backslashes in that data have no special meaning. Only string
> literals in source code receive this special interpretation.
> 
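A minimal sketch of the point about data versus literals, written in Python 3 syntax (an assumption; the thread predates it). `io.StringIO` stands in for a real file here, and the names `literal_source`, `fake_file` and `data` are hypothetical.

```python
import io

# Simulate a file whose content is the four characters: a \ n b
# (io.StringIO stands in for a real file on disk).
literal_source = 'a\\nb'          # backslash written as \\ in the literal
fake_file = io.StringIO(literal_source)

data = fake_file.read()

# The data still holds a real backslash followed by 'n'. Nothing turned
# it into a newline, because escape processing applies to literals only.
assert data == r'a\nb'
assert len(data) == 4
assert '\n' not in data
```
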
> The real confusion creeps in because backslash also has a special meaning
> in regular expressions. So to put a backslash into a regular expression you
> must escape it by preceding it with another backslash, and to write two
> backslashes in a literal string you must either use a raw string or write 4
> backslashes. So the string for a regular expression that matches one
> backslash followed by an 'x' could be written as:
>         s = '\\\\x'
>         s = r'\\x'
>         s = re.escape('\\x')
>         s = re.escape(r'\x')
> In all of these s ends up as the same three-character string: two
> backslashes followed by an 'x'.
> 
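The four spellings can be compared directly, and the same idea applies to the Windows filenames that started this thread. This is a sketch in Python 3 syntax (an assumption); the path `c:\temp\new.txt` is a made-up example, not taken from the original program.

```python
import re

# Four spellings of the same three-character pattern: \ \ x
patterns = ['\\\\x', r'\\x', re.escape('\\x'), re.escape(r'\x')]
assert all(p == patterns[0] for p in patterns)
assert len(patterns[0]) == 3

# The pattern matches the two-character text: backslash followed by x.
assert re.search(patterns[0], r'foo\xbar') is not None

# A Windows path used directly as a pattern is read as regex syntax:
# \t means tab and \n means newline, so it fails to match itself.
path = r'c:\temp\new.txt'          # hypothetical example path
assert re.search(path, path) is None

# Escaping the path first makes every character match literally.
assert re.search(re.escape(path), 'found ' + path) is not None

# For plain substitution no regex is needed at all.
forward = path.replace('\\', '/')
assert forward == 'c:/temp/new.txt'
```

When the text to find is a fixed string such as a filename, the plain string methods avoid the second layer of escaping entirely, which is the route Martin ended up taking.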
> Why the 'x'? Because, for reasons that escape me, raw strings cannot end
> with a single backslash (or any odd number of them):
> >>> r'\\'
> '\\\\'
> >>> r'\'
>   File "<stdin>", line 1
>     r'\'
>        ^
> SyntaxError: invalid token
> 
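The restriction, and two common ways around it, can be sketched as follows in Python 3 syntax (an assumption). Even in a raw literal a backslash stops the following quote from ending the string, although the backslash itself stays in the result. The names `bad_source`, `dir1` and `dir2` are illustrative.

```python
# Build the source text r'\' piece by piece and try to compile it.
# The trailing backslash keeps the closing quote from ending the literal.
bad_source = "r'" + "\\" + "'"
try:
    compile(bad_source, '<example>', 'eval')
    compiled = True
except SyntaxError:
    compiled = False
assert not compiled

# A backslash before a quote is allowed in a raw literal, and both
# characters end up in the string.
assert len(r'\'') == 2

# Workarounds for a string that must end in a backslash:
dir1 = 'c:\\temp\\'                # ordinary literal, doubled backslashes
dir2 = r'c:\temp' + '\\'           # raw literal plus one escaped backslash
assert dir1 == dir2
assert dir1.endswith('\\')
assert len(dir1) == 8
```
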
> I hope this makes things a bit clearer.
> --
> Duncan Booth                                             duncan at rcp.co.uk
> int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
> "\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?


