re bug

Klaus Neuner klaus_neuner82 at yahoo.de
Wed Oct 6 03:41:22 EDT 2004


Thanks to you all. I will try to answer some of the questions raised in your
replies.

(Michael Hoffman:)

> It might help if you told us what this is supposed to do in real terms...

I invented a kind of linguistic metalanguage, and my program is an
interpreter for it. It translates statements of the metalanguage into
cascades of transducers. Because the regular expressions are generated
automatically, I cannot answer questions like the following one:

(Thomas Rast:)

>   (?:                 # why a group?

There is a group because the translation algorithm introduced it. It is
hard to tell why the algorithm decided to do so. The group may be
pointless here, but I think it makes sense in some similar case. Even if
I re-read my whole program now to re-understand the translation
algorithm, it would not make much sense to try to explain why I designed
it this way and not another; that would take dozens of pages.

I have been testing my program for months now, on gigabytes of data. It
has not failed for a long time, and it has never produced ill-formed
regular expressions. I was naive enough to think that it would be
sufficient to make sure that all regular expressions are well-formed
(although I was also concerned with efficiency, of course). Then I saw
that well-formedness was not sufficient, and I was afraid that I would
have to change my algorithm.

Luckily, I can safely disregard any pair of a regex and a string for
which the match does not finish within a few seconds. Therefore the
signal-module solution is perfect for me.

(Gustavo Niemeyer:)

> Btw, the only reason that the OP's expression didn't got stuck
> in the first two test cases is because the expression matched.

I don't really understand what you mean. You can easily change str3 so
that rx matches. Nevertheless, you will still have to wait a long time
for a result.
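The point that a successful match is no guarantee of a quick one can be illustrated with a made-up pattern (not the rx or str3 from this thread). In this sketch the string does match, but only through the second alternative, which the engine tries after the first alternative has backtracked through the possible ways of splitting the run of a's and failed. The function name `match_after_backtracking` is hypothetical.

```python
import re

# Made-up pattern for illustration: on "a" * n + "b" the first alternative
# backtracks through many ways of splitting the a's and then fails; only
# afterwards does the engine try the second alternative, which matches.
PATTERN = re.compile(r"(?:(a+)+$|a*b)")


def match_after_backtracking(n):
    """Match "a" * n + "b"; the result is a match, but the cost grows roughly like 2**n."""
    return PATTERN.match("a" * n + "b")


# Small n finishes instantly; each extra "a" roughly doubles the time spent
# in the failing first alternative before the successful match is found.
m = match_after_backtracking(12)
print(m.group(0))
```

With n around 30 the same call would already be very slow, even though the final answer is a match.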
