doubling the number of tests, but not taking twice as long

Wed Jul 18 17:40:44 EDT 2018

On Tue, Jul 17, 2018 at 11:43 AM, Neil Cerutti <neilc at norwich.edu> wrote:
> On 2018-07-16, Larry Martell <larry.martell at gmail.com> wrote:
>> I had some code that did this:
>>
>> import re
>>
>> meas_regex = r'_M\d+_'
>> meas_re = re.compile(meas_regex)
>>
>> if meas_re.search(filename):
>>     stuff1()
>> else:
>>     stuff2()
>>
>> I then had to change it to this:
>>
>> if meas_re.search(filename):
>>     if 'MeasDisplay' in filename:
>>         stuff1a()
>>     else:
>>         stuff1()
>> else:
>>     if 'PatternFov' in filename:
>>         stuff2a()
>>     else:
>>         stuff2()
>>
>> This code needs to process tens of thousands of files, and it
>> runs often, so it needs to run very fast. Needless to say, my
>> change has made it take 2x as long. Can anyone see a way to
>> improve that?
>
> Can you expand/improve the regex pattern so you don't have to rescan
> the string to check for the presence of MeasDisplay and
> PatternFov? In other words, since you're already using the giant,
> Swiss Army sledgehammer of the re module, go ahead and use enough
> features to cover your use case.

Yeah, that was my first thought, but I haven't been able to come up
with a regex that works.

There are 4 cases I need to detect:

case1 = 'spam_M123_eggs_MeasDisplay_sausage'
case2 = 'spam_M123_eggs_sausage_and_spam'
case3 = 'spam_spam_spam_PatternFov_eggs_sausage_and_spam'
case4 = 'spam_spam_spam_eggs_sausage_and_spam'

I thought this regex would work:

'(_M\d+_){0,1}.*?(MeasDisplay|PatternFOV){0,1}'

And then I could look at the match objects and see which of the 4
cases it was. But try as I might, I could not get it to work. Any
regex gurus want to tell me what I am doing wrong here?
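
A minimal sketch of what appears to be going wrong, plus one pattern that
might do the job. In the pattern above every piece is optional and `.*?` is
lazy, so search() is satisfied by an empty match at position 0 and both
groups come back as None for all four cases. The pattern also spells
'PatternFOV' while the sample names contain 'PatternFov'. The names
`combined_re` and `classify` and the returned labels are hypothetical, and
the sketch assumes that the MeasDisplay/PatternFov marker, when present,
comes after the _M<digits>_ part and that at most one marker appears.

```python
import re

# The pattern from the message, unchanged apart from the raw-string prefix.
original_re = re.compile(r'(_M\d+_){0,1}.*?(MeasDisplay|PatternFOV){0,1}')

# Hypothetical alternative: make each "skip ahead, then capture" step
# optional as a unit, so the lazy .*? has to reach the capture to succeed.
combined_re = re.compile(r'(?:.*?(_M\d+_))?(?:.*?(MeasDisplay|PatternFov))?')


def classify(filename):
    """Return the name of the branch the original if/else would take."""
    # match() anchors at the start; the pattern can always match (possibly
    # empty), so the result is never None.
    meas, marker = combined_re.match(filename).groups()
    if meas is not None:
        return 'stuff1a' if marker == 'MeasDisplay' else 'stuff1'
    return 'stuff2a' if marker == 'PatternFov' else 'stuff2'


case1 = 'spam_M123_eggs_MeasDisplay_sausage'
case2 = 'spam_M123_eggs_sausage_and_spam'
case3 = 'spam_spam_spam_PatternFov_eggs_sausage_and_spam'
case4 = 'spam_spam_spam_eggs_sausage_and_spam'

for name in (case1, case2, case3, case4):
    # The original pattern yields (None, None) every time.
    print(original_re.search(name).groups(), classify(name))
```

Whether a single pattern like this is actually faster than the regex
search plus a plain `in` test is not obvious and would need to be timed on
real filenames.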