+ in regular expression
Cameron Simpson
cs at zip.com.au
Fri Oct 5 19:37:42 EDT 2012
On 05Oct2012 10:27, Evan Driscoll <driscoll at cs.wisc.edu> wrote:
| I can understand that you can create a grammar that excludes it. [...]
| Was it because such patterns often reveal a mistake?
For myself, I would consider that sufficient reason.
I've seen plenty of languages (C and shell, for example, though they
are not alone or egregious) where a compiler can emit a syntax complaint
many lines from the actual coding mistake (in shell, an unclosed quote
or control construct is a common example; Python has the same issue
but mitigated by the indentation requirements, which cut the occurrence
down a lot).
Forbidding a common error by requiring a wordier workaround isn't
unreasonable.
| Because "\s{6}+"
| has other meanings in different regex syntaxes and the designers didn't
| want confusion?
I think Python REs are supposed to be Perl compatible; ISTR an opening
sentence to that effect...
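To make the "wordier workaround" concrete, here is a minimal sketch using
only the standard re module. The helper try_compile is a made-up name for
illustration. One caveat I am sure of: from Python 3.11 onwards a trailing
"+" after a quantifier is accepted as a possessive quantifier, so what
"\s{6}+" does depends on the interpreter version; the sketch prints
whatever happens rather than assuming either outcome.

```python
import re

def try_compile(pattern):
    """Hypothetical helper: return the compiled pattern, or the re.error raised."""
    try:
        return re.compile(pattern)
    except re.error as exc:
        return exc

# A quantifier applied directly to another quantifier is refused at compile
# time; the error message mentions "multiple repeat".
print(try_compile(r"a**"))

# Version dependent: rejected the same way by the Pythons current in 2012,
# accepted as a possessive quantifier from Python 3.11 onwards.
print(try_compile(r"\s{6}+"))

# The wordier workaround: group first, then repeat the group.
grouped = re.compile(r"(?:\s{6})+")
print(grouped.match(" " * 12).end())   # 12: two whole runs of six spaces
```

The grouped form states the intent (one or more runs of exactly six
whitespace characters) without relying on how any particular engine reads
a doubled quantifier.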
| Because it was simpler to parse that way? Because the
| "hey you recognize regular expressions by converting it to a finite
| automaton" story is a lie in most real-world regex implementations (in
| part because they're not actually regular expressions) and repeated
| quantifiers cause problems with the parsing techniques that actually get
| used?
There are certainly constructs that can cause an exponential amount
of backtracking if misused. One could make a case for discouraging them
(though not a case for forbidding them).
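As a rough illustration of the backtracking problem, the sketch below times
a nested-quantifier pattern against input that can never match. The pattern
"(a+)+$" and the helper time_failed_match are my own choices for the
example, not something from the thread, and the timings will vary by
machine; the point is only that the time grows quickly as the input gets
longer.

```python
import re
import time

# Nested quantifiers: "a+" inside "(...)+" lets a backtracking engine split
# a run of "a"s in very many ways before concluding there is no match.
NESTED = re.compile(r"(a+)+$")

def time_failed_match(n):
    """Time NESTED against n 'a's followed by a 'b', which can never match."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = NESTED.match(text)
    return result, time.perf_counter() - start

# Small sizes only; larger values of n get slow very quickly.
for n in (10, 14, 18):
    result, elapsed = time_failed_match(n)
    print(n, result, f"{elapsed:.4f}s")
```

Note that this pattern compiles without complaint: the repeated quantifier
is hidden behind a group, which is why a purely syntactic ban on "multiple
repeat" cannot catch this sort of misuse.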
Just my 2c,
--
Cameron Simpson <cs at zip.com.au>
The most annoying thing about being without my files after our disc crash was
discovering once again how widespread BLINK was on the web.