[issue12325] regex matches incorrectly on literal dot (99.9% confirmed)

Robert Lehmann report at bugs.python.org
Mon Jun 13 14:42:59 CEST 2011


Robert Lehmann <lehmannro at gmail.com> added the comment:

I cannot reproduce either of your findings. Could you provide us with your version information? I am running re version 2.2.1, _sre 2.2.2, Python 2.6.6 on Debian sid. I also tested with Python 2.7.2rc1 (same RE).

>>> import re
>>> re.compile(r"\.co\.uk", re.DEBUG)
literal 46
literal 99
literal 111
literal 46
literal 117
literal 107
<_sre.SRE_Pattern object at 0xb73b0860>
>>> re.compile(r"(^| )((?:[\w\-]{2,}?\.|)(?:[\w\-]{2,}?)(?:\.com|\.net|\.org|\.co\.uk|\.tv|\.ly))", flags = re.IGNORECASE | re.MULTILINE | re.DEBUG).sub("\\1http://\\2", """me and a buddy and his girlfriend were watching tv once and this blabbering idiot starts talking about this scientific study she heard about where they built a fake city and only one guy didn't know that it was a fake. we all paused for a second and i said "the truman show?" and she says "yeah! that was the name of it!" me my buddy and his girlfriend all catch eyes and are baffled at how stupid she was""")
subpattern 1
...
'me and a buddy and his girlfriend were watching tv once...'
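
For anyone who wants to repeat the check on their own interpreter, here is a minimal sketch of the same test without re.DEBUG. The pattern is copied from the session above; the names URL_RE and linkify and the two sample strings are my own assumptions for illustration and do not come from the original report. It is written so that it should also run on Python 3.

```python
import re

# Pattern copied from the session above. URL_RE and linkify are
# illustrative names, not part of the original report.
URL_RE = re.compile(
    r"(^| )((?:[\w\-]{2,}?\.|)(?:[\w\-]{2,}?)"
    r"(?:\.com|\.net|\.org|\.co\.uk|\.tv|\.ly))",
    flags=re.IGNORECASE | re.MULTILINE,
)


def linkify(text):
    """Prefix anything that looks like a bare domain with http://."""
    return URL_RE.sub(r"\1http://\2", text)


# Made-up sample inputs (assumptions, not taken from the report).
plain = "were watching tv once"      # "tv" with no dot in front of it
domain = "visit example.co.uk now"   # a real literal ".co.uk"

print(linkify(plain) == plain)       # is the dot-free text left alone?
print(linkify(domain))               # is the real domain rewritten?

# The escaped dots should not match arbitrary characters.
print(re.search(r"\.co\.uk", "exampleXcoXuk"))
```

If the escaped dot behaves as a literal, the first string should come back unchanged and only the second should gain an http:// prefix. A different result would be worth posting here together with the version information.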

----------
nosy: +lehmannro

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue12325>
_______________________________________

