matching multiple regexs to a single line...
Alexander Sendzimir
lists at battleface.com
Tue Nov 19 09:45:38 EST 2002
# # # SOLUTION BASED ON PREVIOUS POSTS # # #
# For consistency, I've written all the notes as Python comments. So
# anything that's not a comment is code. The original problem is to
# match multiple regular expressions to a single line until one
# regular expression matches. The first solution was a brute force
# approach which entailed a (possibly long) series of match-if-break
# statements; see previous posts in this thread for an example. After
# some consultation and experimentation, I've devised the following
# two solutions based on various input from Trent Mick and John
# Hunter. Thanks to both of them.
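# For readers without the earlier posts, here is one plausible shape of
# that brute-force approach. It is a sketch, not the thread's actual
# code. The patterns and the function name classify_brute_force are
# hypothetical, and early returns stand in for the match-if-break
# statements.

```python
import re

# Hypothetical patterns; the thread's real expressions are not shown here.
date_re = re.compile(r"\d{4}-\d{2}-\d{2}")
word_re = re.compile(r"[A-Za-z]+")

def classify_brute_force(line):
    # Try each regex in turn; the first one that matches wins.
    mo = date_re.match(line)
    if mo:
        return "date", mo.group()
    mo = word_re.match(line)
    if mo:
        return "word", mo.group()
    # Every new pattern means another match/if/return stanza here.
    return None

for line in ["2002-11-19 posted", "hello world", "???"]:
    print(classify_brute_force(line))
```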
# Both solutions share the same design. The second solution is an
# optimization for the case where there are many regular expressions,
# each with a corresponding action to take: it uses a dictionary that
# maps names to actions.
# The basic design is a list of tuples of the form (name, regex), where
# name is an arbitrary string identifying the compiled regular
# expression regex. In the inner for-loop, each regular expression is
# matched against the current input line. If the match succeeds,
# match() returns a match object (mo), which is true in a boolean
# context, so control passes to the nested if/elif chain that selects
# an action by name. If it fails, match() returns None and the next
# regular expression is tried. The second solution replaces that
# if/elif chain with a dictionary lookup, so each 'do something'
# comment becomes a call to a single handler function.
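# The if-test works because match() returns a match object on success
# and None on failure. The following is a quick illustration with a
# hypothetical \d+ pattern. Note that match() only tries the start of
# the string, while search() scans the whole line.

```python
import re

pattern = re.compile(r"\d+")

# A successful match returns a match object, which is true in an if-test.
mo = pattern.match("42 apples")
print(bool(mo), mo.group())                 # True 42

# match() only tries position 0, so a number later in the line is missed.
print(pattern.match("apples 42"))           # None

# search() scans forward and finds it.
print(pattern.search("apples 42").group())  # 42
```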
#
# F I R S T S O L U T I O N
#
import re

# Each entry pairs an arbitrary name with a compiled regular expression.
regexs = [
    ('regex_id1', re.compile(r'regex1')),
    ('regex_id2', re.compile(r'regex2')),
    ('regex_id3', re.compile(r'regex3')),
    # ... more (name, regex) pairs ...
    ('regex_idN', re.compile(r'regexN')),
]

# lines: any iterable of input strings, such as an open file.
for line in lines:
    for name, regex in regexs:
        mo = regex.match(line)
        if mo:
            if name == 'regex_id1':
                pass  # do something with mo
            elif name == 'regex_id2':
                pass  # do something with mo
            elif name == 'regex_id3':
                pass  # do something with mo
            # ... more elif branches ...
            elif name == 'regex_idN':
                pass  # do something with mo
            break  # stop at the first regex that matches
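# The following is a concrete, runnable version of the first solution.
# It uses three hypothetical patterns (date, number, word) in place of
# regex1..regexN, and the hypothetical function classify() collects
# results instead of "doing something".

```python
import re

# Hypothetical (name, compiled regex) pairs standing in for regex_id1..N.
regexs = [
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("number", re.compile(r"\d+")),
    ("word", re.compile(r"[A-Za-z]+")),
]

def classify(lines):
    results = []
    for line in lines:
        for name, regex in regexs:
            mo = regex.match(line)
            if mo:
                # if/elif chain keyed on the regex name, as in the first solution
                if name == "date":
                    results.append(("date", mo.group()))
                elif name == "number":
                    results.append(("number", int(mo.group())))
                elif name == "word":
                    results.append(("word", mo.group().lower()))
                break  # first matching regex wins
        else:
            # for-else: runs only when no regex matched this line
            results.append(("unmatched", line))
    return results

print(classify(["2002-11-19", "42 apples", "Hello", "???"]))
```

# Order matters here. "2002-11-19" must be tried against the date
# pattern before the number pattern, or r'\d+' would claim "2002"
# first. The for-else clause runs only when the inner loop finishes
# without a break, which gives a natural place to handle lines that no
# regex matched.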
#
# S E C O N D S O L U T I O N
#
import re

#
# define the handlers
#
# Each handler receives the match object; give the handlers whatever
# arguments your actions need.
def handler_id1(mo):
    pass

def handler_id2(mo):
    pass

def handler_id3(mo):
    pass

# ... more handlers ...

def handler_idN(mo):
    pass

regexs = [
    ('regex_id1', re.compile(r'regex1')),
    ('regex_id2', re.compile(r'regex2')),
    ('regex_id3', re.compile(r'regex3')),
    # ... more (name, regex) pairs ...
    ('regex_idN', re.compile(r'regexN')),
]
#
# define the dictionary of ids --> handlers
#
regex_actions = {
    'regex_id1': handler_id1,
    'regex_id2': handler_id2,
    'regex_id3': handler_id3,
    # ... more name: handler entries ...
    'regex_idN': handler_idN,
}

# lines: any iterable of input strings, such as an open file.
for line in lines:
    for name, regex in regexs:
        mo = regex.match(line)
        if mo:
            regex_actions[name](mo)
            break  # stop at the first regex that matches
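# The following is the same kind of example in the style of the second
# solution. The handler functions and dispatch() are hypothetical, and
# each handler receives the match object. Returning from dispatch()
# plays the role of break.

```python
import re

# Hypothetical handlers; each receives the match object for its regex.
def handle_date(mo):
    year, month, day = (int(g) for g in mo.groups())
    return ("date", year, month, day)

def handle_number(mo):
    return ("number", int(mo.group()))

def handle_word(mo):
    return ("word", mo.group().lower())

regexs = [
    ("date", re.compile(r"(\d{4})-(\d{2})-(\d{2})")),
    ("number", re.compile(r"\d+")),
    ("word", re.compile(r"[A-Za-z]+")),
]

# Name -> handler lookup replaces the if/elif chain of the first solution.
regex_actions = {
    "date": handle_date,
    "number": handle_number,
    "word": handle_word,
}

def dispatch(line):
    for name, regex in regexs:
        mo = regex.match(line)
        if mo:
            return regex_actions[name](mo)
    return None  # no regex matched

for line in ["2002-11-19", "42 apples", "Hello", "???"]:
    print(dispatch(line))
```

# Since each name maps to exactly one handler, the handler could also
# be stored directly in the list as (name, regex, handler). That would
# remove the dictionary lookup altogether.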
# All that said, would it be useful to be able to assign a delegate
# method to a regular expression object? If the expression matched,
# the delegate would be called. If no delegate were assigned, no
# action would be taken, which is the usual behavior.
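# The re module offers no such hook, but a small wrapper class can
# approximate one. DelegatingRegex is a hypothetical sketch. Its
# match() returns what the compiled pattern's match() returns, and it
# also calls the delegate when the match succeeds.

```python
import re

class DelegatingRegex:
    """Hypothetical wrapper pairing a compiled regex with an optional delegate."""

    def __init__(self, pattern, delegate=None):
        self.regex = re.compile(pattern)
        self.delegate = delegate

    def match(self, line):
        mo = self.regex.match(line)
        # Call the delegate only on a successful match, if one is assigned.
        if mo is not None and self.delegate is not None:
            self.delegate(mo)
        return mo  # same return value as a plain compiled pattern

seen = []
patterns = [
    DelegatingRegex(r"\d{4}-\d{2}-\d{2}", lambda mo: seen.append(("date", mo.group()))),
    DelegatingRegex(r"\d+", lambda mo: seen.append(("number", mo.group()))),
    DelegatingRegex(r"#"),  # no delegate: behaves like a plain match
]

for line in ["2002-11-19", "42 apples", "# comment", "???"]:
    for p in patterns:
        if p.match(line):
            break  # first match wins; its delegate (if any) has already run

print(seen)
```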