[Python-ideas] str.find() and friends support a list of inputs

Fri Apr 18 01:05:05 CEST 2014

On Fri, Apr 18, 2014 at 4:52 AM, Alex Rodrigues <lemiant at hotmail.com> wrote:
> Below is a quick test in IPython, intentionally bypassing the cache:
>
> In [1]: a = "a"*100+"b"
>
> In [2]: %timeit -n 1 -r 1 a.find('b')
> 1 loops, best of 1: 3.31 µs per loop
>
> In [3]: import re
>
> In [4]: %%timeit -n 1 -r 1 re.purge()
>    ...: re.search('[b]', 'a')
>    ...:
> 1 loops, best of 1: 132 µs per loop
>

I'm always dubious of micro-benchmarks, especially when caches have to
be deliberately bypassed. How does the time compare if you *don't*
purge the cache? After all, compiling an RE once and using it lots of
times is exactly how they're meant to be used. Yes, it would
potentially be cleaner to pass a list of strings to .find(), but maybe
reaching for a regex is the right thing to do. Last night I wanted to
rename a whole bunch of files thus: "DoYouWannaBuildASnowman.mkv" ->
"Frozen - Do You Wanna Build A Snowman.mkv". Constant text at the
beginning, then add a space before every capital letter. Heretical or
not, I went regex. :)
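
For what it's worth, here is a minimal sketch of both ideas in Python: a
multi-string search built on a regex that is compiled once and reused,
and the rename described above. The names make_finder and spaced_title
are hypothetical helpers made up for illustration, not existing str or
re APIs, and the "Frozen -" default prefix is just the one example from
this message.

```python
import os
import re


def make_finder(needles):
    """Return a function that finds the first occurrence of any needle.

    Hypothetical helper: the regex is compiled once here and reused on
    every call of the returned function.
    """
    if not needles:
        raise ValueError("need at least one string to search for")
    # re.escape makes each needle match literally, not as regex syntax.
    pattern = re.compile("|".join(re.escape(n) for n in needles))

    def find(haystack):
        # Mirror str.find(): lowest matching index, or -1 if nothing matches.
        match = pattern.search(haystack)
        return match.start() if match else -1

    return find


def spaced_title(filename, prefix="Frozen -"):
    """Constant text at the beginning, then a space before every capital."""
    stem, ext = os.path.splitext(filename)
    # The space inserted before the first capital also separates the prefix.
    return prefix + re.sub(r"([A-Z])", r" \1", stem) + ext


find_b_or_c = make_finder(["b", "c"])
print(find_b_or_c("a" * 100 + "b"))
print(spaced_title("DoYouWannaBuildASnowman.mkv"))
```

Building the pattern inside make_finder means the compile cost is paid
once per set of search strings, which is how regexes are meant to be
used.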

ChrisA