[Pythonmac-SIG] a RegEx question

Charles Hartman charles.hartman at conncoll.edu
Tue Jan 18 22:40:54 CET 2005


Sorry, I know this isn't CompSci 101, but if anyone can help I'll 
appreciate it.

My question about regular expressions (any language, I suppose) has to 
do with finding *overlapping* and (relatedly) *longest* matches. Two 
examples:

	find 'abca' in string 'abcabca'

	find longest '(a[ab])+' in string 'baabaaabba'

Every *single* RE I can think of misses the second instance (beginning 
at position 3, counting from 0) in the first example, because it has 
already consumed the beginning of it in finding the first instance. And 
the same problem means that in the second example, while an RE finds 
matches at positions 1 and 4 ('aa' and 'aaab' respectively), it can't 
find the longest, at position 2 ('abaaab'), because the match at 
position 1 steps on it.
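
A minimal sketch of the behaviour described above, assuming Python's 
standard re module. The helper name non_overlapping is made up for 
illustration; it only collects what a plain left-to-right scan reports.

```python
import re

def non_overlapping(pattern, text):
    """Return (start, matched_text) pairs from an ordinary left-to-right scan.

    re.finditer resumes searching where the previous match ended, so any
    match that begins inside an earlier match is never reported.
    """
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, text)]

# First example: only the instance at position 0 is reported;
# the one beginning at position 3 is skipped.
first = non_overlapping(r'abca', 'abcabca')
print(first)

# Second example: matches are reported at positions 1 and 4, but not
# the longer one that begins at position 2.
second = non_overlapping(r'(a[ab])+', 'baabaaabba')
print(second)
```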

I know I can solve this by compiling the expression and using the <pos> 
argument that becomes available then. It's just awkward. And I keep 
thinking I must be missing something about RE syntax that would let me 
do these searches in a single step, not a loop (or, often, two nested 
loops).
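
The loop-based workaround mentioned above might look roughly like the 
following sketch. It assumes Python's re module, where the match method 
of a compiled pattern accepts a starting position. The function names 
overlapping_matches and longest_match are hypothetical, and the greedy 
match found at each start is simply taken as that start's candidate.

```python
import re

def overlapping_matches(pattern, text):
    """Try the pattern at every start position, so matches may overlap."""
    compiled = re.compile(pattern)
    found = []
    for start in range(len(text) + 1):
        # match() anchors the attempt at `start` instead of scanning ahead.
        m = compiled.match(text, start)
        if m and m.end() > m.start():  # ignore empty matches
            found.append((m.start(), m.group(0)))
    return found

def longest_match(pattern, text):
    """Return the (start, matched_text) pair with the longest text, or None."""
    found = overlapping_matches(pattern, text)
    if not found:
        return None
    return max(found, key=lambda item: len(item[1]))

print(overlapping_matches(r'abca', 'abcabca'))
print(longest_match(r'(a[ab])+', 'baabaaabba'))
```

This still uses an explicit loop over start positions, so it shows the 
awkwardness rather than removing it.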

Charles Hartman
Professor of English, Poet in Residence
http://cherry.conncoll.edu/cohar
http://villex.blogspot.com


