[Python-ideas] New pattern-matching library (was: str.split with multiple individual split characters)

geremy condra debatem1 at gmail.com
Tue Mar 1 20:53:26 CET 2011


On Tue, Mar 1, 2011 at 10:30 AM, Guido van Rossum <guido at python.org> wrote:
> On Tue, Mar 1, 2011 at 9:05 AM, Mike Meyer <mwm at mired.org> wrote:
>> On Tue, 1 Mar 2011 19:50:44 +1000
>> Nick Coghlan <ncoghlan at gmail.com> wrote:
>>
>>> On Tue, Mar 1, 2011 at 9:19 AM, Mike Meyer <mwm at mired.org> wrote:
>>> > I disagree. Fully general string pattern matching has a few
>>> > fundamental operations: sequence, alternation, and repetition.
>>>
>>> I agree that the fundamental operations are simple in principle.
>>>
>>> However, I still believe that the elaboration of those operations into
>>> fully general pattern matching is a complex combinatorial operation
>>> that is difficult to master. Regexes certainly make it harder than it
>>> needs to be, but anything with similar expressive power is still going
>>> to be tricky to completely wrap your head around.
>>
>> True. But I think that the problem - if properly expressed - is like
>> the game of Go: a few simple rules that combine to produce a complex
>> system that is difficult to master. With regexp notation, what we've
>> got is more like 3D chess: multiple complex (just slightly different)
>> sets of operations that do more to obscure the underlying simple rules
>> than to help master the system.
>
> I'm not sure those are the right analogies (though they may not be all
> that wrong either). If you ask me there are two problems with regexps:
>
> (a) The notation is cryptic and error-prone, its use of \ conflicts
> with Python strings (using r'...' helps but is yet another gotcha),
> and the parser is primitive. Until your brain has learned to parse
> regexps, it will have a hard time understanding examples, which are
> often the key to solving programming problems. Somehow the regexp
> syntax is not "natural" for the text parsers we have in our brain --
> contrast this with Python's syntax, which was explicitly designed to
> go with the flow. Perhaps another problem is with composability -- if
> you know how to solve two simple problems using regexps, that doesn't
> mean your solutions can be combined to solve a combination of those
> problems.
>
> (b) There often isn't all that great of a match between the high-level
> goals of the user (e.g. "extract a list of email addresses from a
> file") and the available primitive operations. It's like writing an
> operating system for a Turing machine -- we have mathematical proof
> that it's possible, but that doesn't make it easy. The additional
> operations provided by modern, Perl-derived (which includes Python's
> re module) regexp notation are meant to help, but they just extend the
> basic premises of regexp notation, rather than providing a new,
> higher-level abstraction layer that is better matched to the way the
> typical user thinks about the problem.
>
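
To make point (a) concrete, here is a minimal sketch using only the
standard re module. The constant names and the matches() helper are
made up for illustration and are not from the thread. It spells out
the three fundamental operations mentioned earlier, then shows the
backslash gotcha and one way that two working patterns can fail to
compose.

```python
import re

# Sequence, alternation, and repetition: the three fundamental
# operations named earlier in the thread, in regexp notation.
SEQUENCE = r"ab"         # "a" followed by "b"
ALTERNATION = r"a|b"     # either "a" or "b"
REPETITION = r"(?:ab)*"  # zero or more copies of "ab"

# The backslash gotcha: in a plain string literal "\b" is a backspace
# character, while the raw string r"\b" reaches re as a word boundary.
PLAIN_WORD = "\bcat\b"
RAW_WORD = r"\bcat\b"

# The composability gotcha: each piece works on its own, but gluing
# them together changes the meaning, because "|" binds more loosely
# than concatenation.
ANIMAL = r"cat|dog"
NUMBER = r"\d+"
NAIVE = ANIMAL + NUMBER                  # parsed as "cat" | "dog\d+"
GROUPED = "(?:" + ANIMAL + ")" + NUMBER  # what was probably intended


def matches(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None
```

With these definitions, re.findall(PLAIN_WORD, "a cat sat") finds
nothing while re.findall(RAW_WORD, "a cat sat") finds "cat", and
matches(NAIVE, "cat7") is False while matches(GROUPED, "cat7") is
True.
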
> All in all I think it would be a good use of somebody's time to try
> and come up with something better. But it won't be easy.
>
> --
> --Guido van Rossum (python.org/~guido)

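As an illustration of the gap described in point (b), here is a rough
sketch of the "extract a list of email addresses" goal using the re
module. The names EMAIL and extract_emails are hypothetical, and the
pattern is a deliberate simplification: it assumes addresses of the
form local@domain.tld and makes no attempt to follow the full address
grammar.

```python
import re

# Deliberately simplified: real address syntax is far more permissive
# than this pattern, so treat it as an approximation only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_emails(text):
    """Return every substring of text that looks like an email address."""
    return EMAIL.findall(text)
```

Even this short pattern takes some effort to read back, and nothing in
it says "email address"; the intent lives only in the name wrapped
around it.
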
It's unfortunate that there isn't a good way to do this kind of
long-range work under the auspices of Python. I can imagine a number
of projects like this that fail to attract interest because of low
perceived chances of success and a dearth of community feedback.

Geremy Condra


