doubling the number of tests, but not taking twice as long

Larry Martell larry.martell at gmail.com
Wed Jul 18 20:09:36 EDT 2018


On Wed, Jul 18, 2018 at 7:59 PM, MRAB <python at mrabarnett.plus.com> wrote:
> On 2018-07-18 22:40, Larry Martell wrote:
>>
>> On Tue, Jul 17, 2018 at 11:43 AM, Neil Cerutti <neilc at norwich.edu> wrote:
>>>
>>> On 2018-07-16, Larry Martell <larry.martell at gmail.com> wrote:
>>>>
>>>> I had some code that did this:
>>>>
>>>> import re
>>>> meas_regex = r'_M\d+_'
>>>> meas_re = re.compile(meas_regex)
>>>>
>>>> if meas_re.search(filename):
>>>>     stuff1()
>>>> else:
>>>>     stuff2()
>>>>
>>>> I then had to change it to this:
>>>>
>>>> if meas_re.search(filename):
>>>>     if 'MeasDisplay' in filename:
>>>>         stuff1a()
>>>>     else:
>>>>         stuff1()
>>>> else:
>>>>     if 'PatternFov' in filename:
>>>>         stuff2a()
>>>>     else:
>>>>         stuff2()
>>>>
>>>> This code needs to process many tens of thousands of files, and it
>>>> runs often, so it needs to run very fast. Needless to say, my
>>>> change has made it take 2x as long. Can anyone see a way to
>>>> improve that?
>>>
>>>
>>> Can you expand/improve the regex pattern so you don't have to rescan
>>> the string to check for the presence of MeasDisplay and
>>> PatternFov? In other words, since you're already using the giant,
>>> Swiss Army sledgehammer of the re module, go ahead and use enough
>>> features to cover your use case.
>>
>>
>> Yeah, that was my first thought, but I haven't been able to come up
>> with a regex that works.
>>
>> There are 4 cases I need to detect:
>>
>> case1 = 'spam_M123_eggs_MeasDisplay_sausage'
>> case2 = 'spam_M123_eggs_sausage_and_spam'
>> case3 = 'spam_spam_spam_PatternFov_eggs_sausage_and_spam'
>> case4 = 'spam_spam_spam_eggs_sausage_and_spam'
>>
>> I thought this regex would work:
>>
>> '(_M\d+_){0,1}.*?(MeasDisplay|PatternFOV){0,1}'
>>
>> And then I could look at the match objects and see which of the 4
>> cases it was. But try as I might, I could not get it to work. Any
>> regex gurus want to tell me what I am doing wrong here?
>>
> The trick to capturing both of the parts when they are both optional is to
> use a lookahead and make it optional:
>
> r'(?=.*?(_M\d+_))?(?=.*?(MeasDisplay|PatternFov))?'

Wow! Thanks so much. This works perfectly. I don't understand it, but
I will spend some time dissecting it and I will add another tool to my
arsenal.
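For anyone else dissecting it: my first attempt fails because every part of it is optional, so the pattern succeeds with an empty match at position 0 before it ever looks further into the string. (It also spells `PatternFOV`, which would not match `PatternFov` without `re.IGNORECASE`.) MRAB's version puts each search inside a lookahead `(?=...)`, which scans ahead from position 0 without consuming anything and records what it finds in its group; the trailing `?` lets each lookahead fail without failing the whole match. Below is a minimal sketch of using it to pick one of the four branches in a single `match()` call. The `classify` function and the returned labels (mirroring stuff1/stuff1a/stuff2/stuff2a) are hypothetical, and it is worth timing with `timeit` against the two-step version before relying on it for speed.

```python
import re

# MRAB's pattern from the thread. Each optional lookahead scans ahead
# from the current position without consuming input; group 1 gets the
# _M<digits>_ marker, group 2 gets the first MeasDisplay/PatternFov seen.
CLASSIFY_RE = re.compile(r'(?=.*?(_M\d+_))?(?=.*?(MeasDisplay|PatternFov))?')

# The first attempt, for comparison (raw string added). Everything in it
# is optional, so it matches the empty string at position 0.
FIRST_ATTEMPT_RE = re.compile(r'(_M\d+_){0,1}.*?(MeasDisplay|PatternFOV){0,1}')


def classify(filename):
    """Return a label for the branch the original nested ifs would take.

    The labels are hypothetical stand-ins for the stuff*() calls.
    """
    # match() anchors at position 0; the pattern always succeeds there,
    # possibly with both groups left as None.
    m = CLASSIFY_RE.match(filename)
    meas, keyword = m.group(1), m.group(2)
    if meas:
        return 'stuff1a' if keyword == 'MeasDisplay' else 'stuff1'
    return 'stuff2a' if keyword == 'PatternFov' else 'stuff2'


if __name__ == '__main__':
    cases = [
        'spam_M123_eggs_MeasDisplay_sausage',
        'spam_M123_eggs_sausage_and_spam',
        'spam_spam_spam_PatternFov_eggs_sausage_and_spam',
        'spam_spam_spam_eggs_sausage_and_spam',
    ]
    for name in cases:
        # Compare the new dispatch with what the first attempt captured.
        print(name, classify(name), FIRST_ATTEMPT_RE.search(name).groups())
```

If a filename could contain both `MeasDisplay` and `PatternFov`, group 2 holds whichever occurs first, so this can disagree with the original `'MeasDisplay' in filename` check.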


