regex searching hangs

Rob Renaud rpgnmets at aol.com
Sun Feb 8 16:30:13 EST 2004


fortepianissimo at yahoo.com.tw (Fortepianissimo) wrote in message news:<ef08387c.0402080729.621632ca at posting.google.com>...
> Could someone explains why the following code hangs (Python 2.3.3)?
> 
> --- CODE STARTS ---
> import re
> 
> p=re.compile(r'\S+(?:\.\S+)+\.com')
> 
> t='......................................'
> p.search(t)
> --- CODE ENDS ---
> 
> From the syntax and semantics of regex, the entire t should be
> consumed by the pattern '\S+(?:\.\S+)+' and the search() call should
> return None. Is search() doing a depth-first search and can't pulling
> itself out? It doesn't seem right to me, or I'm missing something?
> 
> BTW, I don't think any regex matching should ever hang in any
> circumstances - correct me if I'm wrong: is halting problem a problem
> here?
> 
> Thanks!


It doesn't halt, it just seems you found an exponential time search,
searching at length i seems to take about 1.6 times the time it takes
to search at length i-1.

import re
import time
p=re.compile(r'\S+(?:\.\S+)+\.com')
t = '.'
for i in range(100):
	start = time.time()
	p.search(t*i)
	print i, time.time() - start

How should you improve it?  I don't know, I can't even read the regexp
:)



More information about the Python-list mailing list