doing hundreds of re.subs efficiently on large strings

Alex Martelli aleax at aleax.it
Thu Mar 27 10:29:12 EST 2003


Anders J. Munch wrote:

> "Alex Martelli" <aleax at aleax.it> wrote:
>> Yeah, but having to keep track of "one after the latest .end" in
>> the loop to accumulate the fragments is fussy and may be a bit
>> error-prone.
> 
> Well, now I know a better way.  That replacements could be functions
> had escaped my attention; your solution is clearly better.  No
> measurements needed to realise that.
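
For concreteness, the .finditer version we're both talking about looks
roughly like this sketch (illustrative table and names, not the OP's
actual patterns) -- note the bookkeeping of the position one past each
match's .end(), which is exactly the fussy bit:

    import re

    # illustrative stand-in table, not the OP's real patterns
    replacements = {'foo': 'bar', 'spam': 'eggs'}
    composite = re.compile('|'.join(re.escape(k) for k in replacements))

    def sub_with_finditer(text):
        fragments = []
        pos = 0   # one past the .end() of the latest match
        for match in composite.finditer(text):
            fragments.append(text[pos:match.start()])
            fragments.append(replacements[match.group(0)])
            pos = match.end()
        fragments.append(text[pos:])   # the tail -- easy to forget
        return ''.join(fragments)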

It's simpler/cleaner, and yes, that's more an issue of aesthetics than
of measurements.  I'm not sure what you mean by the .sub method
"escaping your attention" when it's right there in the subject line
(I'm just using it with the composite re, just as you're proposing to
use .finditer).  But if you're speaking more generally, sure -- one
can't measure an approach one hasn't even thought of; that's why
posting here is so useful, as people often WILL propose lots of
different approaches.
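
In sketch form, what I mean (same illustrative stand-in table as
above; the real patterns would be the OP's):

    import re

    # illustrative stand-in table, not the OP's real patterns
    replacements = {'foo': 'bar', 'spam': 'eggs'}
    composite = re.compile('|'.join(re.escape(k) for k in replacements))

    def substitute(match):
        # .sub accepts a function as the replacement: it's called with
        # each match object and must return the replacement string
        return replacements[match.group(0)]

    result = composite.sub(substitute, 'foo and spam and foo')
    # -> 'bar and eggs and bar'

All the between-fragments bookkeeping happens inside .sub, which is
the whole point.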


> And both solutions may or may not be better than the OP's original
> one-sub-at-a-time approach.  I'll leave it to the OP to find that out.
> After all, the OP can do the testing with the right regexps on the
> right data.  Synthetic benchmarks give synthetic results.
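
For concreteness, the OP's one-sub-at-a-time shape is presumably
something like this (hypothetical stand-in pairs again); each .sub
call rescans the entire string, which is why the composite-pattern
approaches may win:

    import re

    # hypothetical stand-in pairs for the OP's pattern/replacement list
    pairs = [('foo', 'bar'), ('spam', 'eggs')]
    compiled = [(re.compile(re.escape(k)), v) for k, v in pairs]

    def sub_one_at_a_time(text):
        # each .sub scans the whole string: N patterns, N full passes
        for pattern, replacement in compiled:
            text = pattern.sub(replacement, text)
        return text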

Definitely -- only measurements on the real data can settle it.
Moreover, one should start comparative measurements of different
approaches only when one KNOWS that the simplest, cleanest one doesn't
yield satisfactory performance AND that the area in question is the
application's performance bottleneck, as shown by profiling.
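
When that point comes, a crude harness along these lines does the
comparison (using the sketch functions above, with big_string standing
in for the OP's real data):

    import time

    def measure(func, text, repetitions=10):
        # wall-clock comparison on the *real* data, not synthetic input
        start = time.time()
        for i in range(repetitions):
            func(text)
        return time.time() - start

    # e.g.:
    # for candidate in (sub_one_at_a_time, sub_with_finditer):
    #     print(candidate.__name__, measure(candidate, big_string))
    # ...and the standard profile module (profile.run('main()')) can
    # confirm this area is the hotspot in the first place.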


Alex
