Regular Expression AND match

Robert Brewer fumanchu at amor.org
Sat Mar 20 13:30:03 EST 2004


Fuzzyman wrote:
> Jeff Epler <jepler at unpythonic.net> wrote in message 
> news:<mailman.161.1079716498.742.python-list at python.org>...
> > Regular expressions are not a good tool for this purpose.
> 
> Hmm... I'm not sure if I've been helped or not :-)
> Thanks anyway.....
> 
> Odd that you can't do this easily with regular expressions - I suppose
> it doesn't compile down to a neat test.... but then it's hardly a
> complex search... OTOH I have *no idea* how regular expressions
> actually work (and no need to find out)...

From one Fu.*man to another ;) you do have a need to find out, even if
you don't recognize it. Start with A.M. Kuchling's excellent,
Python-based tutorial at: http://www.amk.ca/python/howto/regex/

At the least, you should understand why a regex is not an all-in-one
solution to your issue. It basically comes down to the fact that a regex
is geared to do its analysis in a single pass over your text. As it
finds partial matches, it may backtrack to try to find a complete match,
but in general, it moves forward. Therefore, if you want to find three
words in a *declared* order in your text, a single regex can do it
easily. If you want to find three words in *any* order, the simplest
solution using regexes is to perform three separate searches. There are
ways to get around this within a single regex (lookahead assertions, for
example), but they're neither as simple nor as maintainable as letting
Python do the iteration:

>>> import re
>>> text = 'Some aa text cc with bb search terms.'
>>> search_terms = ['aa', 'bb', 'cc']
>>> [re.findall(re.escape(word), text) for word in search_terms]
[['aa'], ['bb'], ['cc']]

or, for your case:

>>> def has_all_terms(content, terms):
...     for word in terms:
...         if not re.search(re.escape(word), content):
...             return False
...     return True
... 
>>> has_all_terms(text, search_terms)
True
>>> has_all_terms('A is for aardvark.', search_terms)
False
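For comparison, here is a rough sketch of the in-regex workaround mentioned above, next to a plain-Python version. Both function names are made up for illustration. The regex version builds one zero-width lookahead, `(?=.*term)`, per term and anchors them all at the start of the text, so the terms can appear in any order. It still rescans the text once per term, so it saves no work over separate searches and is harder to read. Because these terms are literal strings, the substring test in the second function is probably all you need.

```python
import re

def has_all_terms_lookahead(content, terms):
    # Hypothetical helper: one lookahead per term, e.g. (?=.*aa)(?=.*bb)(?=.*cc).
    # Each lookahead is tried from position 0 without consuming text,
    # so the order of the terms in `content` doesn't matter.
    pattern = ''.join('(?=.*' + re.escape(word) + ')' for word in terms)
    # re.DOTALL lets '.' cross newlines in multi-line content.
    return re.match(pattern, content, re.DOTALL) is not None

def has_all_terms_plain(content, terms):
    # Hypothetical helper: for literal terms, substring tests are enough.
    return all(word in content for word in terms)

text = 'Some aa text cc with bb search terms.'
search_terms = ['aa', 'bb', 'cc']
print(has_all_terms_lookahead(text, search_terms))                  # True
print(has_all_terms_lookahead('A is for aardvark.', search_terms))  # False
print(has_all_terms_plain(text, search_terms))                      # True
```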


HTCYTIRABM!

Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org



