trouble with regex with escaped metachars (URGENT please O:-)

Duncan Booth duncan at NOSPAMrcp.co.uk
Thu Nov 20 11:53:26 EST 2003


Fernando Rodriguez <frr at easyjob.net> wrote in
news:3amprvo6n734ovhg5ekatp0539q738809t at 4ax.com: 

> Here's the code I'm using:
> 
>         def substitute(name, value, cts):
>             """
>             Finds all the occs in  cts of $<name>
>             and replaces them with value
>             """
>             
>             pat = re.compile("\$<" + name + ">", re.IGNORECASE)
> 
>             return pat.sub(val, cts)  # this line causes the error
>             (see below) 
> 
>         def escapeMetachars( s ):
>             """
>             All metacharacters in the user provided substitution must
>             be escaped
>             """
>             meta = r'\.^$+*?{[|()'
>             esc = ''
> 
>             for c in s:
>                 if c in meta:
>                     esc += '\\' + c
>                 else:
>                     esc += c
> 
>             return esc
> 
> cts = """Compression=bzip/9
> OutputBaseFilename=$<OutputFileName>
> OutputDir=$<OutputDir>
> LicenseFile=Z:\apps\easyjob\main\I18N\US\res\license.txt"""

You forgot to double some backslashes here, not that this is relevant to 
your problem.

> 
> name = 'OutputDir'
> value = "c:\\apps\\whatever\\"  # contains the backslash metachar
> 
> print substitute( escapeMetachars(name), value,  cts)
> 
> I get this error:
> Traceback (most recent call last):
>   File "<pyshell#38>", line 1, in -toplevel-
>     pat.sub(s,cts)
Strangely, this line doesn't appear in the code you said you were using.

>   File "C:\ARCHIV~1\python23\Lib\sre.py", line 257, in _subx
>     template = _compile_repl(template, pattern)
>   File "C:\ARCHIV~1\python23\Lib\sre.py", line 244, in _compile_repl
>     raise error, v # invalid expression
> error: bogus escape (end of line)
> 
> What on earth is this? O:-)

Whatever it is, it's not what I get when I copy and paste your code.
I get a "NameError: global name 'val' is not defined" when I use the code 
you posted. So, I deduce that you modified the code before posting it and 
all bets are off.

However, it would appear that your main problem is that pat.sub processes 
backslash escapes in the replacement string, so the lone backslash at the 
end of your value is reported as a bogus escape. You want to double every 
backslash in the replacement before using it.
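
To make that concrete, here is a minimal sketch of the failure and the fix. 
It assumes a current Python 3 interpreter rather than the 2.3 used in this 
thread, and the variable names (template, doubled, failed) are only 
illustrative. The exact error message differs between versions, so the 
sketch only checks that re.error is raised.

```python
import re

template = "OutputDir=$<OutputDir>"
value = "c:\\apps\\whatever\\"  # the text c:\apps\whatever\ (ends in a backslash)

pattern = re.compile(r"\$<OutputDir>", re.IGNORECASE)

# Passing the value as-is: sub() parses backslash escapes in the
# replacement template and rejects it.
try:
    pattern.sub(value, template)
    failed = False
except re.error:
    failed = True

# Doubling every backslash makes each one a literal backslash in the
# replacement template.
doubled = value.replace("\\", "\\\\")
result = pattern.sub(doubled, template)

print(failed)
print(result)
```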

Another minor point is that your escapeMetachars looks to be a poor-man's 
version of re.escape, so you might prefer to use re.escape instead.
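
As a rough illustration of why the name needs escaping at all, the sketch 
below compares an unescaped and an escaped name. The name "Output.Dir" and 
the sample text are hypothetical, chosen only because the dot is a 
metacharacter; the code assumes Python 3.

```python
import re

name = "Output.Dir"  # hypothetical name containing a metacharacter
text = "$<OutputXDir> $<Output.Dir>"

# Unescaped: the dot matches any character, so both placeholders match.
unescaped = re.findall(r"\$<" + name + ">", text)

# Escaped with re.escape: only the literal name matches.
escaped = re.findall(r"\$<" + re.escape(name) + ">", text)

print(unescaped)
print(escaped)
```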

Putting that all together gives:

>>> import re
>>> cts = """Compression=bzip/9
OutputBaseFilename=$<OutputFileName>
OutputDir=$<OutputDir>
LicenseFile=Z:\\apps\\easyjob\\main\\I18N\\US\\res\\license.txt"""
>>> name = 'OutputDir'
>>> value = "c:\\apps\\whatever\\"  # contains the backslash metachar
>>> def substitute(name, value, cts):
            """
            Finds all the occs in  cts of $<name>
            and replaces them with value
            """
            value = value.replace('\\', '\\\\')
            name = re.escape(name)
            pat = re.compile(r"\$<" + name + ">", re.IGNORECASE)
            
            return pat.sub(value, cts)

        
>>> print substitute(name, value,  cts)
Compression=bzip/9
OutputBaseFilename=$<OutputFileName>
OutputDir=c:\apps\whatever\
LicenseFile=Z:\apps\easyjob\main\I18N\US\res\license.txt


That will work fine so long as (a) you only have a few strings to 
substitute, and (b) none of the replacements contain sequences looking like 
substitution strings.
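
Caveat (b) can be seen in a small sketch: when substitutions are applied 
one after another, a placeholder that arrives inside an earlier replacement 
is expanded by a later call. The names "First" and "Second" and the sample 
text are made up for the illustration, and the code assumes Python 3.

```python
import re

def substitute(name, value, cts):
    """Replace every $<name> in cts with value (one name per call)."""
    value = value.replace("\\", "\\\\")  # keep backslashes literal
    pat = re.compile(r"\$<" + re.escape(name) + ">", re.IGNORECASE)
    return pat.sub(value, cts)

cts = "A=$<First>\nB=$<Second>"

# The first replacement happens to contain text that looks like a placeholder.
step1 = substitute("First", "literal $<Second>", cts)

# The second call cannot tell it apart from a real placeholder.
step2 = substitute("Second", "two", step1)

print(step1)
print(step2)
```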

If you have more strings to substitute you might try making a dict of 
replacements and applying them all at the same time:

>>> def substitute(mapping, cts):
            """
            Finds all the occs in cts of $<name> and replaces each
            with mapping[name] (names are looked up in lower case)
            """
            pat = re.compile(r"\$<([^>]+)>", re.IGNORECASE)

            def repl(match):
                key = match.group(1).lower()
                return mapping[key]

            return pat.sub(repl, cts)

        
>>> keys = { 'outputfilename': 'C:\\output.txt', 'outputdir': value }
>>> print substitute(keys, cts)
Compression=bzip/9
OutputBaseFilename=C:\output.txt
OutputDir=c:\apps\whatever\
LicenseFile=Z:\apps\easyjob\main\I18N\US\res\license.txt
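
Two things are worth noting about the function form. The string returned 
by a replacement function is inserted as-is, so no backslash doubling is 
needed, and a name missing from the dict raises KeyError. The sketch below 
is one possible variant, assuming Python 3, that leaves unknown placeholders 
untouched instead; substitute_all and PLACEHOLDER are hypothetical names, 
not part of the code above.

```python
import re

PLACEHOLDER = re.compile(r"\$<([^>]+)>")

def substitute_all(mapping, cts):
    """Replace each $<name> in cts using mapping (keys in lower case)."""
    def repl(match):
        key = match.group(1).lower()
        # Unknown names: return the original placeholder text unchanged.
        return mapping.get(key, match.group(0))
    return PLACEHOLDER.sub(repl, cts)

cts = "OutputDir=$<OutputDir>\nOther=$<Unknown>"
result = substitute_all({"outputdir": "c:\\apps\\whatever\\"}, cts)
print(result)
```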

-- 
Duncan Booth                                             duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?



