[Pythonmac-SIG] a RegEx question
Charles Hartman
charles.hartman at conncoll.edu
Tue Jan 18 22:40:54 CET 2005
Sorry, I know this isn't CompSci 101, but if anyone can help I'll
appreciate it.
My question about regular expressions (any language, I suppose) has to
do with finding *overlapping* and (relatedly) *longest* matches. Two
examples:
find 'abca' in string 'abcabca'
find longest '(a[ab])+' in string 'baabaaabba'
Every *single* RE I can think of misses the second instance (beginning
in position 3) in the first example, because it's eaten the beginning
of it in finding the first instance. And the same problem means that in
the second example, while an RE finds matches at positions 1 and 4
('aa' and 'aaab' respectively), it can't find the longest, at position
2 ('abaaab'), because the find at position 1 steps on it.
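To make the problem concrete, here is a minimal sketch of what I'm seeing, assuming Python's standard re module. The variable names are only for illustration. Scanning resumes after the end of each match, so anything that overlaps an earlier match is never reported.

```python
import re

# Example 1: 'abca' occurs at offsets 0 and 3 in 'abcabca', but the two
# occurrences share the 'a' at offset 3, so only the first is reported.
first_example = [m.start() for m in re.finditer('abca', 'abcabca')]
print(first_example)   # [0]

# Example 2: the scan consumes 'aa' at offset 1 and resumes at offset 3,
# so the longer match that begins at offset 2 is never attempted.
second_example = [(m.start(), m.group())
                  for m in re.finditer('(a[ab])+', 'baabaaabba')]
print(second_example)  # [(1, 'aa'), (4, 'aaab')]
```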
I know I can solve this by compiling the expression and using the <pos>
argument that the compiled pattern's methods accept. It's just awkward. And I keep
thinking I must be missing something about RE syntax that would let me
do these searches in a single step, not a loop (or, often, two nested
loops).
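For reference, the loop I have in mind looks roughly like the sketch below. It assumes Python's re module, and the helper names overlapping_matches and longest_match are hypothetical ones made up for this illustration. It tries a match anchored at every offset through the pos argument of the compiled pattern's match() method, then picks the longest result.

```python
import re

def overlapping_matches(pattern, text):
    """Return (start, matched_text) for a match attempted at every offset."""
    compiled = re.compile(pattern)
    found = []
    for pos in range(len(text)):
        # match() with pos anchors the attempt at exactly this offset,
        # so earlier matches cannot "eat" the start of later ones.
        m = compiled.match(text, pos)
        if m:
            found.append((m.start(), m.group()))
    return found

def longest_match(pattern, text):
    """Return the longest of the per-offset matches, or None if there are none."""
    candidates = overlapping_matches(pattern, text)
    return max(candidates, key=lambda c: len(c[1])) if candidates else None

print(overlapping_matches('abca', 'abcabca'))   # [(0, 'abca'), (3, 'abca')]
print(longest_match('(a[ab])+', 'baabaaabba'))  # (2, 'abaaab')
```

This only compares the greedy match found at each starting offset. For patterns where the greedy match is not the longest possible one at a given offset, a second, nested loop over the end position would presumably be needed as well.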
Charles Hartman
Professor of English, Poet in Residence
http://cherry.conncoll.edu/cohar
http://villex.blogspot.com