Python speed and `pcre'

Tue Aug 31 11:51:04 EDT 1999

François Pinard writes:

> After having translated some code (not big, but not small) from Perl
> to Python, I discover it runs ten times slower.  I did not learn to
> profile yet, and I am not sure it would help, after having read that
> profiling gives call counts, but no elapsed time.  Is that right?

No. The profiler reports the time spent in each function as well as 
the call counts.
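
To make that concrete, here is a minimal sketch of a profiling run whose report includes timing columns. It assumes a current Python, where the cProfile and pstats modules are available (the older pure-Python profile module has the same interface). The function count_matches and its input are hypothetical stand-ins for the translated code, not anything from the original program.

```python
import cProfile
import io
import pstats
import re

def count_matches(lines):
    # Hypothetical workload: count the lines that contain a number.
    pattern = re.compile(r'\d+')
    return sum(1 for line in lines if pattern.search(line))

# Profile one call of the workload.
profiler = cProfile.Profile()
profiler.enable()
count_matches(['line %d' % i for i in range(10000)])
profiler.disable()

# Write the report into a string buffer, sorted by cumulative time.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats('cumulative').print_stats(5)
print(report.getvalue())
```

The report has one row per function, with call counts (ncalls) next to the time spent in the function itself (tottime) and in the function plus everything it calls (cumtime).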

> My intuition tells me that the amount of regular expression matching
> might provide an explanation.  Here is my set of _hypotheses_ (I'm
> not sure):
> 
> * Perl keeps compiled all /REGEXP/ not using string interpolation,
> * Python cache for compiled REGEXP is less virtuous than I imagined.

Just the last one used, I think.

[wants to cache the re.compiled regex, but keep the regex string near 
its use]

Here's one way of caching (note that I don't do anything like this - 
I just put all my patterns at the top and compile them right away. 
The intent of a regex is often better communicated through giving it 
a meaningful name, rather than staring at the regex itself):
---------------------------------------
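import re

# Cache of compiled patterns, keyed by the regex source string.
ccache = {}

def ccompile(regex):
  # Compile each distinct regex only once.
  if regex not in ccache:
    ccache[regex] = re.compile(regex)
  return ccache[regex]

def cmatch(regex, s):
  return ccompile(regex).match(s)

def csearch(regex, s):
  return ccompile(regex).search(s)

s = "This little piggie had Spam, This little piggie had None"

mo = csearch('piggie', s)
print(mo.group(), 'at', mo.start())
# Search the rest of the string. The slice makes a copy, and start()
# is now relative to the slice rather than to s.
mo = csearch('piggie', s[mo.end():])
print(mo.group(), 'at', mo.start())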
--------------------------------------------
Now had I been using explicitly compiled regexes, I would not have 
had to slice s, since the search and match methods of a compiled regex 
object take optional start and end positions (without a copy being 
made). Yes, I could add that to this code, but why bother?
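
For comparison, a minimal sketch of the explicitly compiled approach: the pattern is compiled once at the top under a meaningful name, and the second search passes a start position instead of slicing. The name PIGGIE is hypothetical; the optional position argument to search is part of the standard compiled-pattern interface.

```python
import re

# Hypothetical name; the pattern is compiled once, at import time.
PIGGIE = re.compile('piggie')

s = "This little piggie had Spam, This little piggie had None"

first = PIGGIE.search(s)
# The second argument is the position to start searching from.
# s is not sliced, so no copy is made and the reported offsets
# stay relative to the whole string.
second = PIGGIE.search(s, first.end())

print(first.group(), 'at', first.start())
print(second.group(), 'at', second.start())
```

Unlike the sliced version, the second offset printed here is an index into s itself, so it can be used directly for further slicing or searching.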

- Gordon