+ in regular expression

Cameron Simpson cs at zip.com.au
Fri Oct 5 19:37:42 EDT 2012


On 05Oct2012 10:27, Evan Driscoll <driscoll at cs.wisc.edu> wrote:
| I can understand that you can create a grammar that excludes it. [...]
| Was it because such patterns often reveal a mistake?

For myself, I would consider that sufficient reason.

I've seen plenty of languages (C and shell, for example, though they
are not alone or egregious) where a compiler can emit a syntax complaint
many lines from the actual coding mistake (in shell, an unclosed quote
or control construct is a common example; Python has the same issue,
but mitigated by the indentation requirements, which cut the occurrence
down a lot).

Forbidding a common error by requiring a wordier workaround isn't
unreasonable.
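
To make the "wordier workaround" concrete, here is a small sketch. The
helper name try_compile is mine, purely for illustration. Whether the
bare form compiles depends on your Python version, as far as I know, so
the sketch only reports that rather than assuming an answer; the grouped
form is the part I'd rely on.

```python
import re

def try_compile(pattern):
    """Return the compiled pattern, or None if re rejects it.

    Hypothetical helper, just for this illustration.
    """
    try:
        return re.compile(pattern)
    except re.error:
        return None

# The bare stacked quantifier. Older Pythons reject this with a
# "multiple repeat" error; to my knowledge 3.11 and later instead read
# "{6}+" as a possessive quantifier, so the result is version dependent.
bare = try_compile(r"\s{6}+")

# The wordier form: an explicit non-capturing group says unambiguously
# "one or more runs of exactly six whitespace characters".
grouped = try_compile(r"(?:\s{6})+")

print("bare form compiles:", bare is not None)
print("grouped matches 12 spaces:", grouped.fullmatch(" " * 12) is not None)
print("grouped matches 7 spaces:", grouped.fullmatch(" " * 7) is not None)
```

The group costs three extra characters and leaves no doubt about what
the trailing + applies to.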

| Because "\s{6}+" 
| has other meanings in different regex syntaxes and the designers didn't 
| want confusion?

I think Python REs are supposed to be Perl compatible; ISTR an opening
sentence to that effect...

| Because it was simpler to parse that way? Because the 
| "hey you recognize regular expressions by converting it to a finite 
| automaton" story is a lie in most real-world regex implementations (in 
| part because they're not actually regular expressions) and repeated 
| quantifiers cause problems with the parsing techniques that actually get 
| used?

There are certainly constructs that can cause an exponential amount
of backtracking if misused. One could make a case for discouragement
(though not a case for forbidding them).
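
A rough sketch of the sort of thing I mean, using the classic nested
quantifier (a+)+$ against a string that almost matches. The pattern and
the function name time_failed_match are my own choices for illustration,
and the timings will vary by machine and Python version; on a
backtracking engine I'd expect the time to climb steeply as n grows, but
treat that as an expectation rather than a measurement.

```python
import re
import time

# Nested quantifiers: a run of "a"s can be split between the inner and
# outer repeats in many ways, and the trailing "b" makes every split
# fail, so a backtracking engine may try them all before giving up.
pathological = re.compile(r"(a+)+$")

def time_failed_match(n):
    """Match against n "a"s followed by "b"; return (result, seconds)."""
    subject = "a" * n + "b"
    start = time.perf_counter()
    result = pathological.match(subject)
    return result, time.perf_counter() - start

# Kept deliberately small so the demonstration finishes quickly.
for n in (10, 14, 18):
    result, elapsed = time_failed_match(n)
    print(n, result, "%.4f s" % elapsed)
```

The pattern itself is perfectly legal, and harmless on input that
matches; it is only the near-miss input that hurts.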

Just my 2c,
-- 
Cameron Simpson <cs at zip.com.au>

The most annoying thing about being without my files after our disc crash was
discovering once again how widespread BLINK was on the web.
