splitting perl-style find/replace regexp using python

James Stroud jstroud at mbi.ucla.edu
Thu Mar 1 05:17:33 EST 2007


James Stroud wrote:
> John Pye wrote:
>> Hi all
>>
>> I have a file with a bunch of perl regular expressions like so:
>>
>> /(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/ # bold
>> /(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold
>> /(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/ # italic
>>
>> These are all find/replace expressions delimited as '/search/replace/
>> # comment' where 'search' is the regular expression we're searching
>> for and 'replace' is the replacement expression.
>>
>> Is there an easy and general way that I can split these perl-style
>> find-and-replace expressions into something I can use with Python, eg
>> re.sub('search','replace',str) ?
>>
>> I thought generally it would be good enough to split on '/' but as you
>> see the <\/b> messes that up. I really don't want to learn perl
>> here :-)
>>
>> Cheers
>> JP
>>
> 
> This could be more general: in principle a perl regex could end with a 
> "\", e.g. "\\/", but I'm guessing that won't happen here.
> 
> py> for p in perlish:
> ...   print p
> ...
> /(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/
> /(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/
> /(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/
> py> import re
> py> splitter = re.compile(r'[^\\]/')
> py> for p in perlish:
> ...   print splitter.split(p)
> ...
> ['/(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$', 
> "$1'''$2'''$", '']
> ['/(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$', 
> "$1''<b>$2<\\/b>''$", '']
> ['/(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$', 
> "$1''$2''$", '']
> 
> (I'm hoping this doesn't wrap!)
> 
> James

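I realized that this threw away the character before each slash (the 
closing parenthesis and the "3"), because [^\\] consumes the character 
it matches. This is the corrected version, which uses a negative 
lookbehind so that nothing before the slash is consumed: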

py> splitter = re.compile(r'(?<!\\)/')
py> for p in perlish:
...   print splitter.split(p)
...
['', '(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', 
"$1'''$2'''$3", '']
['', '(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', 
"$1''<b>$2<\\/b>''$3", '']
['', '(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)', 
"$1''$2''$3", '']
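
To get from these pieces to the re.sub call John asked about, the 
replacement half still needs translating: Python writes backreferences 
as \1 or \g<1> rather than $1, and the escaped delimiter \/ can become 
a plain /. The following is only a minimal sketch in current Python 3 
syntax. The name perl_to_sub is a hypothetical helper, not part of the 
re module, and it assumes the simple form seen in this thread: numbered 
groups only, no trailing flags, no special variables such as $&.

```python
import re

# Split on "/" unless it is preceded by a backslash (same splitter as above).
splitter = re.compile(r'(?<!\\)/')

def perl_to_sub(line):
    """Turn '/search/replace/ # comment' into a (pattern, replacement) pair.

    Hypothetical helper: handles only numbered groups ($1, $2, ...) and
    the escaped delimiter; Perl flags and special variables are ignored.
    """
    # The pieces are '', search, replace, and the rest (comment, if any).
    _, search, replace, _rest = splitter.split(line, maxsplit=3)
    # The delimiter no longer needs escaping once the pieces are separated.
    search = search.replace(r'\/', '/')
    replace = replace.replace(r'\/', '/')
    # Perl's $1 becomes Python's \g<1>, which stays unambiguous before digits.
    replace = re.sub(r'\$(\d+)', r'\\g<\1>', replace)
    return search, replace

bold = r"/(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/ # bold"
pattern, repl = perl_to_sub(bold)
print(re.sub(pattern, repl, "some *bold* text"))  # some '''bold''' text
```

Using \g<1> rather than \1 is a deliberate choice here: a Perl 
replacement like "$10" or "$1" followed by a literal digit would 
otherwise be ambiguous once translated.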

James
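
P.S. The caveat from the first reply still applies to the lookbehind: a 
field that ends in an escaped backslash, such as "\\/", is not split, 
because the slash is preceded by a backslash even though that backslash 
is itself escaped. One possible way around it is to scan each field as 
a run of escaped pairs and ordinary characters instead of splitting. 
This is a sketch only; split_fields is a hypothetical name, and it 
assumes every expression starts with '/' and that each field is closed 
by an unescaped '/'.

```python
import re

# One field: any run of escaped pairs (backslash + any character) or
# characters that are neither a backslash nor the "/" delimiter.
field = re.compile(r'(?:\\.|[^/\\])*')

def split_fields(expr, count=2):
    """Return the first `count` '/'-delimited fields of expr.

    Hypothetical helper; raises ValueError if a delimiter is missing.
    """
    fields = []
    pos = 0
    for _ in range(count):
        if expr[pos:pos + 1] != '/':
            raise ValueError('expected "/" at position %d' % pos)
        # The pattern can match the empty string, so match() never fails.
        m = field.match(expr, pos + 1)
        fields.append(m.group())
        pos = m.end()
    if expr[pos:pos + 1] != '/':
        raise ValueError('missing closing "/" at position %d' % pos)
    return fields

# The search field ends in an escaped backslash; the replace field
# contains an escaped delimiter.
print(split_fields(r'/a\\/b\/c/ # comment'))  # ['a\\\\', 'b\\/c']
```

On the three expressions in this thread it gives the same search and 
replace strings as the lookbehind splitter.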


