possible bug in re expression?

Chris Angelico rosuav at gmail.com
Fri Apr 25 12:55:09 EDT 2014


On Sat, Apr 26, 2014 at 2:30 AM, Robin Becker <robin at reportlab.com> wrote:
> Whilst translating some javascript code I find that this
>
> A=re.compile('.{1,+3}').findall(p)
>
> doesn't give any error, but doesn't manage to find the strings in p that I
> want len(A)==>0, the correct translation should have been
>
> A=re.compile('.{1,3}').findall(p)
>
> which works fine.
>
> should
>
> re.compile('.{1,+3}')
>
> raise an error? It doesn't on python 2.7 or 3.3.

I would say the surprising part is that your js code doesn't mind an
extraneous character in the regex. Inside braces like that, negative
numbers have no meaning, so I would expect the definition of the regex
to look for digits, not "anything that can be parsed as a number". So
you've uncovered a bug in your code that just happened to work in js.

Should it raise an error? Good question. Quite possibly it should,
unless that has some other meaning that I'm not familiar with. Do you
know how it's being interpreted? I'm not entirely sure what you mean
by "len(A)==>0", as ==> isn't an operator in Python or JS. The best way
to continue, I think, would be to use regular expression matching
(rather than findall) with something other than a dot, and to tabulate
the input strings, the expected result (match or no match), what JS
does, and what Python does. For instance:

Regex: "^a{1,3}$"

Input     Expected   Python
""        no match   no match
"a"       match      match
"aa"      match      match
"aaa"     match      match
"aaaa"    no match   no match
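The Python column of that table can be scripted so it's easy to rerun with other patterns. This is a minimal sketch; the names PATTERN, CASES and tabulate are purely illustrative, and it only fills in the Python column (the JS column would need a separate run in a browser or Node).

```python
import re

# Pattern from the table; the ^...$ anchors make re.match a whole-string test.
PATTERN = re.compile(r"^a{1,3}$")

# Hypothetical test cases: (input string, whether we expect a match).
CASES = [("", False), ("a", True), ("aa", True), ("aaa", True), ("aaaa", False)]


def tabulate(pattern, cases):
    """Return (input, expected, python_result) rows for each case."""
    rows = []
    for text, expected in cases:
        got = pattern.match(text) is not None
        rows.append((text, expected, got))
    return rows


for text, expected, got in tabulate(PATTERN, CASES):
    # Single-argument print so this runs on both Python 2.7 and 3.x.
    print("{0!r:<8} expected={1!s:<5} python={2}".format(text, expected, got))
```

Swapping in a different pattern only means changing PATTERN, which makes it easy to compare variants side by side.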

Just what we'd expect. Now try the same thing with the plus in there
("^a{1,+3}$"). I'm finding that none of the above strings yields a
match. Maybe there's something else being matched?
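The plus version can be probed the same way. The sketch below was reasoned through against CPython's re module only; other engines, including the JS one, may parse the pattern differently. In CPython, "{" only starts a repeat when it forms something like {3}, {1,} or {1,3} with plain digits and a closing "}". Otherwise the "{" is taken as a literal character, and in "{1,+3}" the "+" then ends up quantifying the comma.

```python
import re

# "{1,+3}" is not a valid repeat, so CPython reads this pattern as:
# "a", literal "{", "1", one or more ",", "3", literal "}".
plus = re.compile(r"^a{1,+3}$")

# None of the inputs from the table match...
for text in ["", "a", "aa", "aaa", "aaaa"]:
    print("{0!r}: {1}".format(text, plus.match(text) is not None))

# ...but strings containing the literal braces do.
print(plus.match("a{1,3}") is not None)    # True
print(plus.match("a{1,,,3}") is not None)  # True
```

Applied to the original ".{1,+3}", findall would therefore only pick up text containing something like "x{1,3}", which would account for the empty result on ordinary input.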

ChrisA



More information about the Python-list mailing list