What is built-in method sub

Diez B. Roggisch deets at nospam.web.de
Mon Jan 11 16:08:19 EST 2010


Philip Semanchuk wrote:
> 
> On Jan 11, 2010, at 3:30 PM, Jeremy wrote:
> 
>> On Jan 11, 1:15 pm, "Diez B. Roggisch" <de... at nospam.web.de> wrote:
>>> Jeremy wrote:
>>>
>>>> On Jan 11, 12:54 pm, Carl Banks <pavlovevide... at gmail.com> wrote:
>>>>> On Jan 11, 11:20 am, Jeremy <jlcon... at gmail.com> wrote:
>>>
>>>>>> I just profiled one of my Python scripts and discovered that >99% of
>>>>>> the time was spent in
>>>>>> {built-in method sub}
>>>>>> What is this function and is there a way to optimize it?
>>>>> I'm guessing this is re.sub (or, more likely, a method sub of an
>>>>> internal object that is called by re.sub).
>>>
>>>>> If all your script does is to make a bunch of regexp substitutions,
>>>>> then spending 99% of the time in this function might be reasonable.
>>>>> Optimize your regexps to improve performance.  (We can help you if you
>>>>> care to share any.)
>>>
>>>>> If my guess is wrong, you'll have to be more specific about what your
>>>>> script does, and maybe share the profile printout or something.
>>>
>>>>> Carl Banks
>>>
>>>> Your guess is correct.  I had forgotten that I was using that
>>>> function.
>>>
>>>> I am using the re.sub command to remove trailing whitespace from lines
>>>> in a text file.  The commands I use are copied below.  If you have any
>>>> suggestions on how they could be improved, I would love to know.
>>>
>>>> Thanks,
>>>> Jeremy
>>>
>>>> lines = self._outfile.readlines()
>>>> self._outfile.close()
>>>
>>>> line = string.join(lines)
>>>
>>>> if self.removeWS:
>>>>     # Remove trailing white space on each line
>>>>     trailingPattern = r'(\S*)\ +?\n'
>>>>     line = re.sub(trailingPattern, '\\1\n', line)
>>>
>>> line = line.rstrip()?
>>>
>>> Diez
>>
>> Yep.  I was trying to reinvent the wheel.  I just remove the trailing
>> whitespace before joining the lines.
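
A minimal sketch of that approach, assuming `lines` is the list returned by readlines(). The function name and the sample data are hypothetical, not taken from the original script. Note that rstrip() with no argument strips all trailing whitespace (tabs as well as spaces, plus the newline), which is slightly broader than the regex above, and that this version appends a newline to every line, including a last line that had none.

```python
def strip_trailing_whitespace(lines):
    """Join lines into one string, dropping trailing whitespace from each line."""
    # rstrip() also removes the trailing newline, so it is added back per line.
    return "".join(line.rstrip() + "\n" for line in lines)


# Hypothetical sample input standing in for self._outfile.readlines()
sample = ["first line   \n", "second line\t\n", "third line\n"]
cleaned = strip_trailing_whitespace(sample)
print(repr(cleaned))
```

Each line is handled with a single string method call, so no regular expression (and none of its backtracking) is involved.
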
> 
> I second the suggestion to use rstrip(), but for future reference you 
> should also check out the compile() function in the re module. You might 
> want to time the code above against a version using a compiled regex to 
> see how much difference it makes.

For his use case, none. The re module has a built-in cache of compiled 
patterns, so calling re.sub() repeatedly with the same pattern string 
does not recompile it each time.
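
If you want to check this yourself, here is a rough timing sketch. The pattern, the sample text and the repeat count are made up for illustration and are not from the original script; absolute numbers will vary by machine and Python version, so no expected timings are given.

```python
import re
import timeit

# Hypothetical pattern: trailing spaces or tabs at the end of each line.
PATTERN = r"[ \t]+$"
text = "some text   \n" * 1000

compiled = re.compile(PATTERN, re.MULTILINE)


def with_module_function():
    # re.sub() looks the pattern up in the module's internal cache
    # after the first call instead of compiling it again.
    return re.sub(PATTERN, "", text, flags=re.MULTILINE)


def with_compiled_pattern():
    # Calling the method on a precompiled pattern skips the cache lookup.
    return compiled.sub("", text)


t_module = timeit.timeit(with_module_function, number=50)
t_compiled = timeit.timeit(with_compiled_pattern, number=50)
print("re.sub():       %.4f s" % t_module)
print("compiled.sub(): %.4f s" % t_compiled)
```

Both functions produce the same result. Any difference between the two timings is the cost of the cache lookup, which should be small next to the substitution work on a text of this size.
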

Diez


