union in Python

Gustavo Niemeyer niemeyer at conectiva.com
Mon Aug 18 17:54:20 EDT 2003


> When looking for a particular string, rather than a pattern, you are
> probably better off to use the find or index method of that string
> rather than re.
> 
> >>> 'abcdefg'.find('d')
> 3

As a side note, preferring the find() method over SRE is not as
clear-cut as it may seem. SRE has an internal optimization which handles
literals separately, avoiding some of the overhead of the engine.
Additionally, it includes an overlap algorithm which speeds up the
linear search in many cases, especially with large data.
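
To give a feel for what an overlap algorithm buys, here is a minimal
sketch in pure Python, assuming the overlap table works in the style of
Knuth-Morris-Pratt. It is not SRE's actual code (that lives in C), and
the names overlap_table and overlap_find are made up for this
illustration.

```python
def overlap_table(pattern):
    """table[i] is the length of the longest proper prefix of
    pattern[:i+1] that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0  # length of the prefix currently matched
    for i in range(1, len(pattern)):
        # On a mismatch, fall back to the next shorter overlapping prefix.
        while k and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def overlap_find(text, pattern):
    """Return the index of the first match, or -1 (like str.find)."""
    if not pattern:
        return 0
    table = overlap_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        # Reuse the overlap instead of restarting the comparison.
        while k and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
            if k == len(pattern):
                return i - k + 1
    return -1


print(overlap_find("abcdefg", "d"))
print(overlap_find("x" * 100 + "a", "x" * 30 + "a"))
```

After a mismatch the table says how much of the pattern is still known
to match, so the search never steps back in the text and the cost stays
linear in the length of the text, even across a long run of repeated
characters.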

Here is a contrived benchmark in which SRE comes out ahead of find():

-----------------------------
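import time
import re

# Needle: 30 "x" characters followed by "a".
s1 = "x"*30+"a"
# Haystack: one million "x" characters followed by "a".
s2 = "x"*1000000+"a"
p1 = re.compile(s1)

# Time 1000 searches with the compiled pattern.
start = time.time()
for i in range(1000):
    p1.search(s2)
print(time.time()-start)

# Time 1000 searches with the string method.
start = time.time()
for i in range(1000):
    s2.find(s1)
print(time.time()-start)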
-----------------------------

And here is a sample output:

% python test.py
45.2645740509
233.810258031
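
These figures depend heavily on the interpreter version and the machine,
so they should not be expected to reproduce elsewhere. Below is a hedged
sketch for repeating the measurement with the standard timeit module;
the smaller haystack and the repeat count of 20 are arbitrary choices
made here so the script finishes quickly, not part of the original test.

```python
import re
import timeit

needle = "x" * 30 + "a"
haystack = "x" * 100000 + "a"  # smaller than the original one million
pattern = re.compile(needle)

# Both approaches must agree on where the match starts.
assert pattern.search(haystack).start() == haystack.find(needle)

# Time 20 searches with each approach.
re_seconds = timeit.timeit(lambda: pattern.search(haystack), number=20)
find_seconds = timeit.timeit(lambda: haystack.find(needle), number=20)

print("re.search: %.4f s" % re_seconds)
print("str.find:  %.4f s" % find_seconds)
```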

-- 
Gustavo Niemeyer
http://niemeyer.net




