`re' difficulty?

François Pinard pinard at IRO.UMontreal.CA
Mon Oct 18 21:57:42 EDT 1999


Hi, people.  I ran into an oddity here, on this machine running Python 1.5.1.
(This is the machine where the TP repository is kept; I'm not really the
one who installs software on it.)  Here is what I got:

>>> import re
>>> entry = 'parted-0.0.8-pre1/po/parted.pot'
>>> re.match('parted-([.0-9]+[a-z]?|[.0-9]+-b[0-9]+|[.0-9]+-pre[0-9]+)', entry).group(1)
'0.0.8'
>>> re.match('parted-([.0-9]+-b[0-9]+|[.0-9]+-pre[0-9]+|[.0-9]+[a-z]?)', entry).group(1)
'0.0.8-pre1'

As you may see, between the parentheses, the first re.match call has A|B|C,
while the second has B|C|A.  Since the results differ, I presume the
longest-match rule does not apply here, although so far it has been the
usual rule for me wherever regular expressions are concerned.
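A minimal sketch in modern Python 3 (not the 1.5.1 from the post) that reproduces the order dependence with the same three alternatives.  The helper longest_alternative is hypothetical and not part of the re module; it approximates a longest-match rule by trying each alternative on its own, which only holds up for a simple pattern like this one, where the group ends the pattern.

```python
import re

entry = 'parted-0.0.8-pre1/po/parted.pot'

# The three alternatives from the post, named A, B and C as in the text.
A = r'[.0-9]+[a-z]?'
B = r'[.0-9]+-b[0-9]+'
C = r'[.0-9]+-pre[0-9]+'

# Order A|B|C: A already matches '0.0.8', so B and C are never tried.
abc = re.match(r'parted-(%s|%s|%s)' % (A, B, C), entry).group(1)
# Order B|C|A: B fails, then C matches the longer '0.0.8-pre1'.
bca = re.match(r'parted-(%s|%s|%s)' % (B, C, A), entry).group(1)


def longest_alternative(prefix, alternatives, text):
    """Hypothetical helper: emulate a longest-match rule for a single
    group by trying each alternative separately and keeping the longest
    captured text.  Only sound for simple cases like this one."""
    best = None
    for alt in alternatives:
        m = re.match(prefix + '(' + alt + ')', text)
        if m and (best is None or len(m.group(1)) > len(best)):
            best = m.group(1)
    return best


print(abc)  # 0.0.8
print(bca)  # 0.0.8-pre1
print(longest_alternative('parted-', [A, B, C], entry))  # 0.0.8-pre1
```

With the helper, the result no longer depends on the order in which the alternatives are listed.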

Am I right to guess that this is all implemented with backtracking, with the
first matching alternative shadowing the remaining ones?  Doesn't that
commit Python to a behaviour that prohibits later optimisations?  Or is
the exact behaviour simply undefined?  What's the story? :-)
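For what it is worth, the current Python re documentation describes the | operator as trying its alternatives from left to right, taking the first one that lets the whole pattern match, and notes that | is never greedy.  The sketch below assumes Python 3 and shows that the first alternative only shadows the others while the rest of the pattern succeeds.  The trailing '/' is an addition for illustration that forces the engine to backtrack into the later alternatives.

```python
import re

entry = 'parted-0.0.8-pre1/po/parted.pot'

# Same A|B|C order as the first call in the post, but the group must now
# be followed by '/'.
pattern = r'parted-([.0-9]+[a-z]?|[.0-9]+-b[0-9]+|[.0-9]+-pre[0-9]+)/'

# A still matches '0.0.8' first, but '/' then fails against '-', so the
# engine backtracks: no shorter match of A succeeds, B fails, and C
# finally yields '0.0.8-pre1' followed by '/'.
m = re.match(pattern, entry)
print(m.group(1))  # 0.0.8-pre1

# Without the trailing '/', nothing forces a retry, so A wins.
print(re.match(pattern[:-1], entry).group(1))  # 0.0.8
```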

-- 
François Pinard   http://www.iro.umontreal.ca/~pinard



