Help needed: cryptic perl regular expression in python syntax, Ugly solution

Pekka Niiranen pekka.niiranen at wlanmail.com
Tue Oct 19 15:47:51 EDT 2004


Thanks,

I managed to solve my problem with code like this:
 >>> import re
 >>> line = '   s^\\?AAA\\?01^BBB^g; #Comment '
 >>> r1 = '(^\\s*)(s|tr)(.)(\\\\\\?\\\\??'
 >>> key = r"AAA\?01"
 >>> r2 = '\\\\??)\\3(.*?)\\3(.*)'
 >>> r = r1 + re.escape(key) + r2
 >>> re.compile(r).findall(line)
[('   ', 's', '^', '\\?AAA\\?01', 'BBB', 'g; #Comment ')]

But what an ugly piece of code...

I was hoping to avoid the excess backslashes by using re.escape(),
but to no avail, since the group reference '\3' gets misquoted (among other things):

 >>> r2 = "\??)\3(.*?)\3(.*)/)"
 >>> re.escape(r2)
'\\\\\\?\\?\\)\\\x03\\(\\.\\*\\?\\)\\\x03\\(\\.\\*\\)\\/\\)'
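A short Python 3 sketch of what goes wrong in the session above, assuming Python 3.4 or later (for `re.fullmatch`). Without an `r` prefix, `"\3"` is an octal escape for the single character chr(3), which is why `\x03` appears in the escaped output. Separately, `re.escape()` is meant for quoting literal text, so it also turns the regex syntax of the fragment into plain characters:

```python
import re

# In an ordinary (non-raw) literal, "\3" is an octal escape: the single
# character chr(3), not a backslash followed by "3".
assert "\3" == "\x03" and len("\3") == 1

# In a raw literal the backslash is kept, so the regex engine sees "\3",
# i.e. a backreference to group 3.
assert r"\3" == "\\3" and len(r"\3") == 2

# re.escape() escapes every regex metacharacter, so an escaped pattern
# fragment only matches its own literal text.
escaped = re.escape(r"(.*?)")
print(escaped)                               # \(\.\*\?\)
print(bool(re.fullmatch(escaped, "(.*?)")))  # True
print(bool(re.fullmatch(escaped, "BBB")))    # False
```

In other words, `re.escape()` probably belongs only around the dynamic key, not around the pattern pieces.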


-pekka-



Antoon Pardon wrote:
> On 2004-10-19, pekka niiranen wrote <pekka.niiranen at wlanmail.com>:
> 
>>Hi there,
>>
>>I have a Perl script that uses a dynamically
>>constructed regular expression in this way:
>>
>>------perl code starts ----
>>$result = "";
>>$key = 'AAA\?01';
>>$key = quotemeta $key;
>>$line = '   s^\?AAA\?01^BBB^g; #Comment ';
>>if ($line =~ /(^\s*)(s|tr)(.)(\\?\??$key\??)\3(.*?)\3(.*)/) {
>>	$result = $5;
>>}
>>
>># $result should be "BBB"
>># \3 holds the same value as captured by (.),
>># which in this example is ^. So we are searching for
>># the parameter delimited by the first two ^ signs
>># and returning the one delimited by the second
>># and third ^ signs. Note that using \3 in the regular
>># expression allows delimiters other than the ^ sign.
>>
>>------perl code stops ----
>>
>>How can I construct an equivalent Python regular expression?
>>
>>I have tested with a constant regular expression like this:
>>
>>
>> >>> line = '   s^\\?AAA\\?01^BBB^g; #Comment '
>> >>> r1 = "(^\s*)(s|tr)(.)(\\\\\?\\\??AAA\\\\\?01)"
>> >>> re.compile(r1).findall(line)
>> [('   ', 's', '^', '\\?AAA\\?01')]
>>
>>Which is fine, but is there a way to join three raw strings
>>together into another raw string? Like:
>>
>>r1 = r'''(^\s*)(s|tr)(.)(\\?\??'''
>>r2 = r'''\\?\??)\3(.*?)\3(.*)'''
>>p1 = r1 + key + r2 # p1 should remain a raw string too
>>
> 
> 
> If I understand correctly, there are no raw strings, just raw string
> literals. re.compile() just receives a normal string.
> 
> Raw string literals just make it easier to write the kind of strings
> typically used in regular expressions, but the strings themselves
> are ordinary strings.
> 
> 
> >>> s1="\\b"
> >>> s2=r"\b"
> >>> s1==s2
> 1
> >>> s1
> '\\b'
> >>> s2
> '\\b'
> >>> print s1
> \b
> >>> print s2
> \b
> 
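Combining Antoon's point with the Perl pattern quoted earlier, a minimal Python 3 sketch could look like the following. It assumes the same sample line and key as in the thread; the helper `replacement_for` and the constants `PREFIX` and `SUFFIX` are hypothetical names chosen for illustration. Because raw string literals are just `str`, the fixed pieces concatenate freely with `re.escape(key)`, and `\3` survives as a backreference:

```python
import re

# Raw string literals evaluate to ordinary str objects, so the fixed parts
# of the pattern (hypothetical names) can be concatenated with the escaped key.
# Leading blanks, "s" or "tr", the delimiter, then an optional "\" and "?":
PREFIX = r"(^\s*)(s|tr)(.)(\\?\??"
# Optional "?" after the key, then the captured delimiter (group 3) twice:
SUFFIX = r"\??)\3(.*?)\3(.*)"


def replacement_for(line, key):
    """Return group 5 ($5 in the Perl version), or None if there is no match.

    Hypothetical helper. Only `key` goes through re.escape(), playing the
    role of Perl's quotemeta; the pattern syntax around it stays unescaped.
    """
    pattern = PREFIX + re.escape(key) + SUFFIX
    match = re.search(pattern, line)
    return match.group(5) if match else None


line = r"   s^\?AAA\?01^BBB^g; #Comment "
print(replacement_for(line, r"AAA\?01"))      # BBB
print(replacement_for("s|AAA|CCC|g", "AAA"))  # CCC
```

The second call uses `|` as the delimiter; the same pattern handles it unchanged because the delimiter is matched through the backreference rather than hard-coded.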
> 


