reusing parts of a string in RE matches?

Mirco Wahab peace.is.our.profession at gmx.de
Thu May 11 06:18:38 EDT 2006


Hi mpeters42 & John

> With a more complex pattern (like 'a.a': match any character between
> two 'a' characters) this will get the length, but not what character is
> between the a's.

Let's take this as a starting point for another example
that comes to mind. You have a string of characters
interspersed with numbers: tx = 'a1a2a3A4a35a6b7b8c9c'

Now you try to find all _numbers_ that sit between
two identical characters (like a<-2->a), including
those whose matches overlap and so do not fall into
consecutive 3/3/3... character groups.

This can easily be done in P(ython|erl) etc. with a
positive lookahead (the same pattern even works in both):
Py:
  import re
  tx = 'a1a2a3A4a35a6b7b8c9c'
  rg = r'(\w)(?=(.)\1)'
  print(re.findall(rg, tx))
Pe:
  $_ = 'a1a2a3A4a35a6b7b8c9c';
  print /(\w)(?=(.)\1)/g;

(This should find 1, 2, 7 and 9 only; the Python
regex is stored in a variable to avoid
clunky lines ;-)
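
To see why the lookahead matters here, the following is a minimal sketch that runs the pattern next to a plain consuming variant. It is written for Python 3 rather than the 2.4.3 used above, and the names `overlapping` and `consuming` are only illustrative:

```python
import re

tx = 'a1a2a3A4a35a6b7b8c9c'

# Lookahead version: only the first character is consumed, so the
# closing character of one match can open the next one.
overlapping = re.findall(r'(\w)(?=(.)\1)', tx)

# Consuming version: all three characters are eaten, so scanning
# resumes after the closing character and a<-2->a is skipped.
consuming = re.findall(r'(\w)(.)\1', tx)

print([mid for _, mid in overlapping])  # ['1', '2', '7', '9']
print([mid for _, mid in consuming])    # ['1', '7', '9']
```

The consuming pattern loses the 2 because the 'a' that would open that match has already been used to close the a<-1->a match.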

BTW, the Py regex engine seems to
be very close to the Perl one:
naive (!) matching of a pattern
with 14 o*'s (followed by a
non-word character) against a string of
16 o's takes about the same
time here in Py (2.4.3) and Pe (5.8.7):

   tl = 'oooooooooooooooo'
   rg = r'o*o*o*o*o*o*o*o*o*o*o*o*o*o*[\W]'
   print(re.search(rg, tl))

Py: 101 sec
Pe: 109 sec

(There is no match, because the string contains
no \W character; the engine has to backtrack
through every way of splitting the o's before
giving up.)
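
The growth in running time can be watched at a smaller scale. This is only a sketch in Python 3, assuming the same 16-character string but far fewer `o*` terms than the 14 used above, so that it finishes quickly; the helper `time_search` is made up for illustration and the timings will differ per machine:

```python
import re
import time

tl = 'o' * 16  # same 16-character subject as in the post


def time_search(k):
    # Build k 'o*' terms followed by a non-word class, then time the
    # (failing) search against tl.
    rg = 'o*' * k + r'[\W]'
    start = time.perf_counter()
    result = re.search(rg, tl)
    return result, time.perf_counter() - start


results = []
for k in range(1, 8):
    result, seconds = time_search(k)
    results.append(result)
    print(k, result, '%.4f s' % seconds)
```

Every search returns None, and the time should climb steeply with each extra `o*`. Since `o*o*` matches exactly what a single `o*` does, collapsing the pattern to `o*[\W]` gives the same answer without the blow-up.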

Regards

Mirco


