What is built-in method sub

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Mon Jan 11 15:34:22 EST 2010


On Mon, 11 Jan 2010 11:20:34 -0800, Jeremy wrote:

> I just profiled one of my Python scripts 

Well done! I'm not being sarcastic, or condescending, but you'd be AMAZED 
(or possibly not...) at how many people try to optimize their scripts 
*without* profiling, and end up trying to speed up parts of the code that 
don't matter while ignoring the actual bottlenecks.


> and discovered that >99% of the time was spent in
> 
> {built-in method sub}
> 
> What is this function

You don't give us enough information to answer with anything more than a 
guess. You know what is in your scripts; we don't. I can do this:


>>> sub
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'sub' is not defined


So it's not a built-in function. Nor do strings have a sub method. So I'm 
reduced to guessing. Based on your previous post, you're probably using 
regexes, so:

>>> import re
>>> type(re.sub)
<type 'function'>

Getting closer, but that's a function, not a method.

>>> type(re.compile("x").sub)
<type 'builtin_function_or_method'>


That's probably the best candidate: you're probably calling the sub 
method on a pre-compiled regular expression object.
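
If you want to check that guess yourself, here is a rough sketch that 
profiles a small stand-in function and prints the report. The names 
clean_lines, profile_report and WHITESPACE are hypothetical, not taken 
from your script. The exact label depends on the Python version: on recent 
Python 3 releases I'd expect something like 
{method 'sub' of 're.Pattern' objects} rather than {built-in method sub}.

```python
import cProfile
import io
import pstats
import re

# Hypothetical pattern and function, standing in for the real script.
WHITESPACE = re.compile(r"\s+")

def clean_lines(lines):
    # Collapse each run of whitespace in every line to a single space.
    return [WHITESPACE.sub(" ", line) for line in lines]

def profile_report(lines):
    # Run clean_lines under the profiler and return the report as text.
    profiler = cProfile.Profile()
    profiler.enable()
    clean_lines(lines)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats()
    return buffer.getvalue()

report = profile_report(["a   b\t\tc"] * 1000)
print(report)
```

The row that mentions sub should account for most of the time, and it 
points at the compiled pattern's method rather than at anything in the 
builtins.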


As for the second part of your question:

> and is there a way to optimize it?


I think you'll find that Python's regex engine is pretty much optimised 
as well as it can be, short of a major re-write. But to quote Jamie 
Zawinski:

    Some people, when confronted with a problem, think "I know, 
    I'll use regular expressions." Now they have two problems.


The best way to optimize regexes is to use them only when necessary. They 
are inherently expensive: regexes are a mini-programming language of 
their own. Naturally some regexes are more expensive than others: some can 
be *really* expensive, some are not.

If you can avoid regexes in favour of ordinary string methods, do so. In 
general, something like:

source.replace(target, new)

will potentially be much faster than:

regex = re.compile(target)
regex.sub(new, source)
# equivalent to re.sub(target, new, source)

(assuming of course that target is just a plain string with no regex 
specialness). If you're just cracking a peanut, you probably don't need 
the 30 lb sledgehammer of regular expressions.
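
Don't take my word for it: measure. The following is a minimal sketch 
using timeit, assuming a plain literal target. The sample strings and the 
helper names with_replace and with_regex are made up for illustration, and 
the actual timings will vary with your data and Python version.

```python
import re
import timeit

# Made-up sample data; substitute your own strings.
source = "spam and eggs, " * 1000
target = "eggs"   # plain literal, no regex metacharacters
new = "ham"

# re.escape leaves "eggs" unchanged, but guards against a target that
# happens to contain characters with special meaning in a regex.
regex = re.compile(re.escape(target))

def with_replace():
    return source.replace(target, new)

def with_regex():
    return regex.sub(new, source)

# Both approaches must agree before a speed comparison means anything.
assert with_replace() == with_regex()

t_replace = timeit.timeit(with_replace, number=200)
t_regex = timeit.timeit(with_regex, number=200)
print("str.replace: %.4f s" % t_replace)
print("regex.sub:   %.4f s" % t_regex)
```

Note that the regex is compiled once, outside the timed function, so the 
comparison covers only the substitution itself.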

Otherwise, we'd need to see the actual regexes that you are using in 
order to comment on how you might optimize them.



-- 
Steven


