matching multiple regexs to a single line...

Alexander Sendzimir lists at battleface.com
Tue Nov 19 09:45:38 EST 2002


# # # SOLUTION BASED ON PREVIOUS POSTS # # #


# For consistency, I've written all the notes as Python comments, so
# anything that's not a comment is code. The original problem is to
# match a single line against multiple regular expressions, stopping at
# the first one that matches. The first solution was a brute-force
# approach that required a (possibly long) series of match-if-break
# statements; see previous posts in this thread for an example. After
# some consultation and experimentation, I've devised the following
# two solutions based on input from Trent Mick and John Hunter. Thanks
# to both of them.

# Both solutions share the same design. The second is an optimization
# for the case where there are many regular expressions, each with a
# corresponding action: it uses a dictionary that maps each regex name
# to its handler.

# The basic design is a list of (name, regex) tuples, where name is an
# arbitrary string identifying the compiled regular expression regex.
# For each input line, the inner for-loop tries the regular expressions
# in list order. match() returns a match object (mo) on success and
# None on failure, so the if-statement is true only when the regex
# matches; control then passes to the if/elif chain that checks which
# name matched, and the break ends the search for that line. Note that
# match() only matches at the start of the line; use search() to match
# anywhere in it. The second solution replaces the if/elif chain with a
# dictionary lookup, so each 'do something' comment becomes a call to a
# single handler.

#
#   F I R S T   S O L U T I O N
#

import re  # re is the documented module; sre is its internal engine

regexs = [
    ('regex_id1', re.compile(r'regex1')),
    ('regex_id2', re.compile(r'regex2')),
    ('regex_id3', re.compile(r'regex3')),
    # ... more (id, compiled regex) pairs ...
    ('regex_idN', re.compile(r'regexN')),
]


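# lines: any iterable of strings, e.g. an open file
for line in lines:
    for regex_id, regex in regexs:
        mo = regex.match(line)  # None if regex doesn't match at line start
        if mo:
            if regex_id == 'regex_id1':
                pass  # do something with mo
            elif regex_id == 'regex_id2':
                pass  # do something with mo
            elif regex_id == 'regex_id3':
                pass  # do something with mo
            # ... more elif branches ...
            elif regex_id == 'regex_idN':
                pass  # do something with mo
            break  # first match wins: skip the remaining regexes
        # no match: go on to the next regex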
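# A runnable instance of the first solution, as a sketch only: the ids,
# patterns and actions below are hypothetical stand-ins for regex_id1 ..
# regex_idN, and the for/else on the inner loop (not in the original)
# records lines that no regex matched.

```python
import re

# Hypothetical ids and patterns, for illustration only.
regexs = [
    ('date',    re.compile(r'(\d{4})-(\d{2})-(\d{2})')),
    ('comment', re.compile(r'#\s*(.*)')),
    ('assign',  re.compile(r'(\w+)\s*=\s*(.*)')),
]

def classify(lines):
    """Try the regexes in order and stop at the first match on each line."""
    results = []
    for line in lines:
        for regex_id, regex in regexs:
            mo = regex.match(line)  # match() anchors at the start of the line
            if mo:
                if regex_id == 'date':
                    results.append(('date', tuple(int(g) for g in mo.groups())))
                elif regex_id == 'comment':
                    results.append(('comment', mo.group(1)))
                elif regex_id == 'assign':
                    results.append(('assign', (mo.group(1), mo.group(2))))
                break  # first match wins
        else:
            # The inner loop finished without a break: no regex matched.
            results.append((None, line))
    return results

for result in classify(['2002-11-19', '# a note', 'x = 42', 'plain text']):
    print(result)
```

# The for/else clause runs only when the inner loop was not ended by a
# break, which makes it a natural place for "no regex matched" handling.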


#
#   S E C O N D   S O L U T I O N
#

#
# define the handlers
#

def handler_id1(mo):
    pass  # act on the match object mo

def handler_id2(mo):
    pass

def handler_id3(mo):
    pass

# ... more handlers ...

def handler_idN(mo):
    pass


import re

regexs = [
    ('regex_id1', re.compile(r'regex1')),
    ('regex_id2', re.compile(r'regex2')),
    ('regex_id3', re.compile(r'regex3')),
    # ... more (id, compiled regex) pairs ...
    ('regex_idN', re.compile(r'regexN')),
]


#
# define the dictionary of ids --> handlers
#

regex_actions = {
    'regex_id1': handler_id1,
    'regex_id2': handler_id2,
    'regex_id3': handler_id3,
    # ... more id: handler entries ...
    'regex_idN': handler_idN,
}


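for line in lines:
    for regex_id, regex in regexs:
        mo = regex.match(line)
        if mo:
            # pass the match object, plus any other arguments the handlers need
            regex_actions[regex_id](mo)
            break  # stop at the first matching regex, as in the first solution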
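# A variant, not the one proposed above: store each handler right next
# to its regex, so the separate id-to-handler dictionary disappears while
# the list order still sets the matching priority. The handlers and
# patterns are hypothetical, and each handler is assumed to take the
# match object as its only argument.

```python
import re

# Hypothetical handlers: each receives the match object so it can read groups.
def handle_date(mo):
    year, month, day = (int(g) for g in mo.groups())
    return ('date', year, month, day)

def handle_comment(mo):
    return ('comment', mo.group(1))

def handle_assign(mo):
    return ('assign', mo.group('name'), mo.group('value'))

# Each entry pairs a compiled regex with its handler; earlier entries win.
dispatch = [
    (re.compile(r'(\d{4})-(\d{2})-(\d{2})'), handle_date),
    (re.compile(r'#\s*(.*)'), handle_comment),
    (re.compile(r'(?P<name>\w+)\s*=\s*(?P<value>.*)'), handle_assign),
]

def process(lines):
    """Call the handler of the first matching regex on each line."""
    results = []
    for line in lines:
        for regex, handler in dispatch:
            mo = regex.match(line)
            if mo:
                results.append(handler(mo))
                break  # stop at the first regex that matches
        else:
            results.append(None)  # no regex matched this line
    return results

print(process(['2002-11-19', '# a note', 'x = 42', 'plain text']))
```

# The string ids are still handy for logging or debugging; if they are
# not needed, pairing regex and handler directly keeps the two in step.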


# All this said and done, I wonder whether it would be useful to be
# able to assign a delegate method to a regular expression object. If
# the expression matches, the delegate is called; if no delegate is
# assigned, no action is taken, which is the usual behavior.
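# One way to approximate the delegate idea, sketched under assumptions:
# compiled pattern objects in current CPython do not accept new
# attributes, so the hypothetical DelegatingRegex class below wraps one
# instead. Its match() returns whatever re's match() returns, and also
# calls the delegate (if any) with the match object on success.

```python
import re

class DelegatingRegex:
    """Hypothetical wrapper (not part of the re module): a compiled regex
    plus an optional delegate called with the match object on success."""

    def __init__(self, pattern, delegate=None, flags=0):
        self.regex = re.compile(pattern, flags)
        self.delegate = delegate

    def match(self, line):
        mo = self.regex.match(line)
        if mo and self.delegate is not None:
            self.delegate(mo)
        return mo  # plain match behavior when no delegate is assigned

log = []
regexes = [
    DelegatingRegex(r'\d+', lambda mo: log.append(('number', mo.group()))),
    DelegatingRegex(r'\w+', lambda mo: log.append(('word', mo.group()))),
    DelegatingRegex(r'\s+'),  # no delegate: it matches, but no action is taken
]

for line in ['42 answers', 'hello world', '  indented', '!?']:
    for r in regexes:
        if r.match(line):  # the delegate, if any, has already run
            break

print(log)
```

# With the action attached to the regex, the matching loop no longer
# needs an if/elif chain or a dictionary lookup at all.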



