Is the regular expression module written in C or Python?

Ulli Stein mennosimons at gmx.net
Tue Oct 8 08:28:53 EDT 2002


Richie Hindle wrote:

> Hi Ulli,
> 
>> >>> import re
>> >>> re.findall("\[(.*?)\]", "["+"x"*10000+"]")
>> Traceback (most recent call last):
>> 
>> If the part which .*? will match exceeds 9996 bytes python throws the
>> above exception. Having this bug, re renders itself unusable.
> 
> 'Unusable' is putting it a bit strong:
> 
>>>> import re
>>>> re.findall(r"\[([^\]]*)\]", "["+"x"*10000+"]")
> ['xxxxxxxxxx...
> 
> I could be wrong, but I believe the latter is more efficient - I've a
> feeling that the lookahead construct makes the RE potentially very slow
> (it may be an implementation issue).  Hopefully a passing RE expert
> will be along to support/correct me...?
> 

This way of replacing the lookahead only works when there is a single 
character to look ahead for.
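A small sketch of the limitation, assuming current Python 3 (not the 2002 sre engine). The negated class `[^\]]*` handles the 10000-character case because the closing delimiter is a single character. With a two-character terminator such as "-]", it stops at the first "]" inside the body. The sample string `text` is made up for illustration.

```python
import re

# Richie's single-character trick: match everything that is not "]".
single = re.compile(r"\[([^\]]*)\]")

# Works when the closing delimiter is one character.
print(single.findall("[" + "x" * 10000 + "]")[0][:5])  # prints xxxxx

# Breaks for a two-character terminator such as "-]" when the body
# itself contains "]": the class stops at the first "]" it sees.
text = "[- a[0] + b -]"
print(single.findall(text))  # prints ['- a[0']
```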

I have tried for a long time, without success, to replace the (.*?) part of 
an RE that looks for "[- ... -]", "[+ ... +]", "[$ ... $]", and "[# ... #]". 
How would you replace the (.*?) in this RE?
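One possible answer, sketched for current Python 3 and not tested against the 2002 sre engine: the "unrolling the loop" technique from Friedl's Mastering Regular Expressions, swapped in here rather than anything specific to re. For each marker c, the body consumes runs of non-c characters with a negated class and steps over a lone c only when a negative lookahead confirms it is not followed by "]". The helper names `block_pattern`, `BLOCK` and `extract_blocks` are hypothetical; the marker set "-+$#" comes from the question above.

```python
import re

def block_pattern(c):
    """Hypothetical helper: unrolled-loop pattern for "[c ... c]"."""
    m = re.escape(c)  # "-", "+", "$" and "#" are escaped, safe inside [...]
    # [^c]*                runs of characters that cannot start "c]"
    # (?:c(?!\])[^c]*)*    a lone c not followed by "]", then another run
    body = "[^" + m + "]*(?:" + m + r"(?!\])[^" + m + "]*)*"
    return r"\[" + m + "(" + body + ")" + m + r"\]"

# One alternative per marker; each alternative has exactly one group.
BLOCK = re.compile("|".join(block_pattern(c) for c in "-+$#"))

def extract_blocks(text):
    """Return (marker, body) pairs for every block found in text."""
    # Only one alternative participates in a match, so lastindex is
    # the number of the group that actually captured the body.
    return [(m.group(0)[1], m.group(m.lastindex)) for m in BLOCK.finditer(text)]

sample = "[- a[0] -] and [+ x + y +] and [$ cost $] and [# note #]"
print(extract_blocks(sample))
# prints [('-', ' a[0] '), ('+', ' x + y '), ('$', ' cost '), ('#', ' note ')]
```

Because [^c] and c can never match the same character, each position in the body is consumed in exactly one way, which keeps backtracking limited to the final check for the terminator.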

Ulli


