splitting perl-style find/replace regexp using python

Peter Otten __peter__ at web.de
Thu Mar 1 05:25:48 EST 2007


James Stroud wrote:

> James Stroud wrote:
>> John Pye wrote:
>>> Hi all
>>>
>>> I have a file with a bunch of perl regular expressions like so:
>>>
>>> /(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/ # bold
>>> /(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/ # italic bold
>>> /(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/ # italic
>>>
>>> These are all find/replace expressions delimited as '/search/replace/
>>> # comment' where 'search' is the regular expression we're searching
>>> for and 'replace' is the replacement expression.
>>>
>>> Is there an easy and general way that I can split these perl-style
>>> find-and-replace expressions into something I can use with Python, eg
>>> re.sub('search','replace',str) ?
>>>
>>> I thought generally it would be good enough to split on '/' but as you
>>> see the <\/b> messes that up. I really don't want to learn perl
>>> here :-)
>>>
>>> Cheers
>>> JP
>>>
>> 
>> This could be more general; in principle a perl regex could end with a
>> "\", e.g. "\\/", but I'm guessing that won't happen here.
>> 
>> py> for p in perlish:
>> ...   print p
>> ...
>> /(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)/$1'''$2'''$3/
>> /(^|[\s\(])\_\_([^ ].*?[^ ])\_\_([\s\)\.\,\:\;\!\?]|$)/$1''<b>$2<\/b>''$3/
>> /(^|[\s\(])\_([^ ].*?[^ ])\_([\s\)\.\,\:\;\!\?]|$)/$1''$2''$3/
>> py> import re
>> py> splitter = re.compile(r'[^\\]/')
>> py> for p in perlish:
>> ...   print splitter.split(p)
>> ...
>> ['/(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$',
>> "$1'''$2'''$", '']
>> ['/(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$',
>> "$1''<b>$2<\\/b>''$", '']
>> ['/(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$',
>> "$1''$2''$", '']
>> 
>> (I'm hoping this doesn't wrap!)
>> 
>> James
> 
> I realized that threw away the closing parentheses. This is the correct
> version:
> 
> py> splitter = re.compile(r'(?<!\\)/')
> py> for p in perlish:
> ...   print splitter.split(p)
> ...
> ['', '(^|[\\s\\(])\\*([^ ].*?[^ ])\\*([\\s\\)\\.\\,\\:\\;\\!\\?]|$)',
> "$1'''$2'''$3", '']
> ['', '(^|[\\s\\(])\\_\\_([^ ].*?[^ ])\\_\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)',
> "$1''<b>$2<\\/b>''$3", '']
> ['', '(^|[\\s\\(])\\_([^ ].*?[^ ])\\_([\\s\\)\\.\\,\\:\\;\\!\\?]|$)',
> "$1''$2''$3", '']

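There is another problem with escaped backslashes: when the search pattern
ends in an escaped backslash, the lookbehind treats the closing delimiter as
escaped and does not split there: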

>>> re.compile(r'(?<!\\)/').split(r"/abc\\/def/")
['', 'abc\\\\/def', '']
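
One way around this is to consume each backslash together with the character it escapes, so that "\\" is read as a unit and the "/" after it counts as a real delimiter. The following is only a sketch under some assumptions: the names split_rule and to_python_replacement are made up for illustration, every rule is assumed to have the form /search/replace/ with an optional trailing comment, and the only Perl-specific pieces translated in the replacement are $N group references and the escaped delimiter \/. Other Perl syntax (modifiers, different delimiters, and so on) is not handled.

```python
import re

# A field is a run of "backslash + any character" pairs or characters that are
# neither a slash nor a backslash. Because a backslash is always consumed
# together with the character after it, "\\/" is read as an escaped backslash
# followed by an unescaped delimiter.
_FIELD = r'((?:\\.|[^/\\])*)'
_RULE = re.compile(r'/' + _FIELD + r'/' + _FIELD + r'/')


def split_rule(rule):
    """Return (search, replace) for a rule written as /search/replace/.

    Anything after the closing slash (e.g. a comment) is ignored.
    Hypothetical helper, not part of the re module.
    """
    match = _RULE.match(rule)
    if match is None:
        raise ValueError("not a /search/replace/ rule: %r" % (rule,))
    return match.group(1), match.group(2)


def to_python_replacement(repl):
    """Translate a Perl-style replacement into a re.sub() template.

    Assumption: only "\\/" (escaped delimiter) and $1, $2, ... need translating.
    """
    repl = repl.replace(r'\/', '/')                # unescape the delimiter
    return re.sub(r'\$(\d+)', r'\\g<\1>', repl)    # $1 -> \g<1>


if __name__ == "__main__":
    # The case the lookbehind version gets wrong.
    print(split_rule(r"/abc\\/def/"))

    # One of the rules from the original post.
    rule = (r"/(^|[\s\(])\*([^ ].*?[^ ])\*([\s\)\.\,\:\;\!\?]|$)"
            r"/$1'''$2'''$3/ # bold")
    search, repl = split_rule(rule)
    print(re.sub(search, to_python_replacement(repl), "a *bold* word"))
```

The search part is passed to re.sub() unchanged, since Python's re accepts escapes such as \_ and \/ as literal characters. The replacement does need translating, because re.sub() does not understand $1.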

Peter


