[Python-bugs-list] [Bug #116251] SRE miscompiles character class containing -
noreply@sourceforge.net
Fri, 6 Oct 2000 10:56:27 -0700
Bug #116251, was updated on 2000-Oct-06 09:12
Here is a current snapshot of the bug.
Project: Python
Category: Library
Status: Open
Resolution: None
Bug Group: None
Priority: 6
Summary: SRE miscompiles character class containing -
Details: (Found by Neil Schemenauer) Consider this test program:
import re
p = re.compile('[\w]+')
m = p.match('laser_beam')
print m and m.span()
p = re.compile('[\w-]+')
m = p.match('laser_beam')
print m and m.span()
This prints (0, 10) and None, but the second pattern only adds a - inside
the character class, so it should still match and print (0, 10) as well.
Printing the code generated for the two patterns shows that they're
compiled differently.
(Is there a disassembler for SRE byte code hiding somewhere?
I'd have dug further if there was...)
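For reference, the expected behaviour can be written as a small regression check. This is only a sketch, in present-day Python 3 syntax rather than the Python 2.0 syntax of the test program above; the helper name full_span is made up for illustration. On an interpreter without this bug, all three patterns should match the whole string.

```python
import re

def full_span(pattern, text):
    """Return the span of a match anchored at the start of text, or None."""
    m = re.compile(pattern).match(text)
    return m.span() if m else None

TEXT = 'laser_beam'

# A trailing - inside a character class is a literal hyphen, so adding it
# (or a +) to the class must not stop \w from matching letters and _.
for pattern in (r'[\w]+', r'[\w-]+', r'[\w+]+'):
    assert full_span(pattern, TEXT) == (0, len(TEXT)), pattern
```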
Follow-Ups:
Date: 2000-Oct-06 09:22
By: akuchling
Comment:
Found the .dump() method; it seems to me that the pattern is
being tokenized and compiled to a sequence all right.
Incidentally, [\w+]+ matches correctly, even though
[\w-]+ doesn't.
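In current Python 3 releases the parse tree can also be printed by compiling with the re.DEBUG flag, which is one way to compare how the two patterns are parsed. The sketch below assumes the flag writes its dump to standard output; the helper name dump_pattern is hypothetical, and the exact output format differs between Python versions, so nothing here depends on it.

```python
import contextlib
import io
import re

def dump_pattern(pattern):
    """Compile pattern with re.DEBUG and return whatever was printed."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()

# Print the two dumps one after the other for a side-by-side comparison.
for pattern in (r'[\w+]+', r'[\w-]+'):
    print(pattern)
    print(dump_pattern(pattern))
```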
-------------------------------------------------------
Date: 2000-Oct-06 10:56
By: akuchling
Comment:
The bug is in sre_parse._parse(); it produces a bad
parse tree when a category such as \w is followed by -.
A class in which something else precedes the -, such as \w0-, works
fine. Not sure what the fix is...
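The intended semantics can be sketched as follows: a - that ends a character class is a literal hyphen, whether it follows a category such as \w or an ordinary character such as 0. This is an illustration in Python 3 syntax of what a correct parser should accept, not the fix itself; the helper name matches_all is hypothetical.

```python
import re

def matches_all(pattern, text):
    """Return True if pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

# The hyphen is a member of the class in both spellings.
assert matches_all(r'[\w-]+', 'laser-beam')
assert matches_all(r'[\w0-]+', 'laser-beam')

# Without the hyphen in the class, the same text must not match in full,
# because \w on its own does not include -.
assert not matches_all(r'[\w]+', 'laser-beam')
```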
-------------------------------------------------------
For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=116251&group_id=5470