How to print all expressions that match a regular expression

Nobody nobody at nowhere.com
Sun Feb 7 00:09:40 EST 2010


On Sun, 07 Feb 2010 00:26:36 +0000, Steven D'Aprano wrote:

>> So there isn't such a routine just because some of the regular
>> expressions cannot be enumerated.

No. There isn't a routine because no-one has yet felt any need to write
one.

>> However, some of them can be
>> enumerated. I guess I have to write a function myself.
> 
> How do you expect to tell the ones that can be enumerated apart from 
> those that can't be?

The ones which use the '+', '*' and '{m,}' operators match an infinite
number of strings; the rest can only match a finite number (assuming POSIX
REs; Python also has +? and *?).

["Enumerate" isn't the correct word here. You can *enumerate* an
infinite set, in the sense that you could write a Python generator for
which any member will eventually be generated.]

The obvious implementation is to construct the NFA then "run" it.

If you know that the RE can only match finite strings (i.e. the graph is
acyclic), then you can enumerate them using depth-first traversal. If it
can match infinite strings (i.e. if there are any cycles in the graph),
then you would need to use either breadth-first traversal or
incrementally-bounded depth-first traversal.

> [Aside: Python regexes aren't Turing Complete. I'm not sure about Perl 
> regexes. Either way, this might actually be less difficult than the 
> Halting Problem, as in "amazingly difficult" rather than "impossible".]

"Regular expressions" aren't Turing complete; this is implicit in the
definition of "regular". The Chomsky hierarchy has four levels, with
higher levels require a more capable system to decide whether a string is
a member of the language defined by the grammar:

	grammar			decidable by

	regular			finite automaton
	context-free		pushdown automaton[1]
	context-sensitive	linear-bounded automaton[2]
	recursively-enumerable	Turing machine

However, any "regular expression" syntax which allows backreferences
(including the POSIX specification) isn't actually "regular" in the formal
sense (as it requires an infinite number of states), but context-free.

[1] pushdown automaton = finite automaton with a stack

[2] linear-bounded automaton = Turing machine, except that it's "tape" is
finite and proportional to the size of the input.

	http://en.wikipedia.org/wiki/Chomsky_hierarchy




More information about the Python-list mailing list